Existing peak current mode (PCM) control for primary-side regulation (PSR) double-clamp zero-voltage switching (DCZVS) flyback converters suffers from poor dynamic performance. The underlying cause is the PSR process, in which secondary-side output errors cannot be promptly reflected to the primary side. This paper proposes an action-aware control method that embeds past control decisions in the observed states to shorten the PSR process. The proposed controller determines control actions according to the trend of the system states under past actions. The method is developed from reinforcement learning (RL), which shapes control policies through trial-and-error interaction with an environment. Training is conducted in simulation environments and guided by reward shaping techniques. The trained agent is implemented on an FPGA platform using high-level synthesis (HLS) tools. In experiments on a 280 W prototype, the proposed method reduces the settling time by over 50% compared with previous studies.
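
As a rough illustration of the action-aware idea described above, the following minimal sketch shows one way an observation could combine recent measurements with the actions that were applied alongside them. The class name `ActionAwareObserver`, the choice of a sampled output-voltage error as the measurement, and the history length are all assumptions made for illustration; they are not taken from the paper and do not represent its implementation.

```python
from collections import deque


class ActionAwareObserver:
    """Hypothetical helper: keeps short histories of errors and past actions."""

    def __init__(self, history_len=3):
        # Assumed history length; both buffers start zero-filled.
        self.errors = deque([0.0] * history_len, maxlen=history_len)
        self.actions = deque([0.0] * history_len, maxlen=history_len)

    def update(self, error, last_action):
        # Record the newest sampled error and the action applied with it.
        # The oldest entries are dropped automatically by the bounded deques.
        self.errors.append(error)
        self.actions.append(last_action)

    def observation(self):
        # Concatenate error history and action history into one state vector,
        # so a policy can relate the state trend to the actions that caused it.
        return list(self.errors) + list(self.actions)
```

A policy fed with such a vector sees not only how the measured error evolved but also which control decisions preceded that evolution, which is the intuition behind embedding past actions in the observed state.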