This digest proposes a reinforcement learning (RL) approach to online tuning of the loop compensation in multiphase Buck converters. Unlike fixed-gain controllers, the RL agent adjusts the control parameters during operation, using the real-time output-voltage error, phase-current imbalance, and transient metrics as its observations. A multi-objective reward function drives the agent to improve regulation accuracy and current sharing at the same time. Simulations on a 4-phase Buck converter model show that the RL-tuned PI controller outperforms manual tuning, reducing the output-voltage dynamic fluctuation from 30 mV to 10 mV.
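The digest does not state the exact form of its multi-objective reward. As an illustration only, the sketch below assumes a weighted sum of normalized penalties for regulation error, current imbalance, and transient deviation; the names `RewardWeights`, `current_imbalance`, and `reward`, the weight values, and the normalization choices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the digest does not specify values.
    w_v: float = 1.0   # weight on steady-state output-voltage error
    w_i: float = 0.5   # weight on phase-current imbalance
    w_t: float = 0.2   # weight on transient (dynamic) voltage deviation


def current_imbalance(phase_currents: Sequence[float]) -> float:
    """Assumed metric: largest deviation from the mean phase current, relative to the mean."""
    mean = sum(phase_currents) / len(phase_currents)
    if mean == 0:
        return 0.0
    return max(abs(i - mean) for i in phase_currents) / abs(mean)


def reward(v_out: float, v_ref: float, phase_currents: Sequence[float],
           v_dev_peak: float, weights: RewardWeights = RewardWeights()) -> float:
    """Weighted-sum reward (higher is better); v_dev_peak is the peak transient deviation in volts."""
    e_v = abs(v_out - v_ref) / v_ref          # relative regulation error
    e_i = current_imbalance(phase_currents)   # relative current-sharing error
    e_t = v_dev_peak / v_ref                  # relative transient deviation
    return -(weights.w_v * e_v + weights.w_i * e_i + weights.w_t * e_t)


# Usage: with a 1 V reference and balanced 4-phase currents, a 10 mV transient
# scores higher than a 30 mV one, so the agent is pushed toward smaller fluctuations.
r_10mv = reward(1.0, 1.0, [10.0, 10.0, 10.0, 10.0], 0.010)
r_30mv = reward(1.0, 1.0, [10.0, 10.0, 10.0, 10.0], 0.030)
```

A weighted sum is only one way to scalarize several objectives; the relative weights set the trade-off between tight regulation and even current sharing and would need tuning for a given converter.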