This paper introduced the first deep learning model to learn control policies directly from high-dimensional sensory input using reinforcement learning. The authors demonstrated that a convolutional neural network trained with a variant of Q-learning could achieve human-level performance on multiple Atari 2600 games from raw pixel inputs. Key innovations included experience replay buffers, target network stabilization, and end-to-end training without handcrafted features. The work laid foundational principles for modern deep reinforcement learning (DRL) and inspired subsequent advances in algorithmic stability and sample efficiency.
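Of the innovations listed above, experience replay is the simplest to illustrate: transitions are stored in a fixed-size buffer and sampled uniformly at random, which decorrelates consecutive frames before gradient updates. A minimal sketch (class and method names are illustrative, not from the paper's code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

During training, each minibatch for the Q-learning update is drawn from this buffer rather than from the most recent experience alone.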


The DQN architecture processed four stacked 84×84 grayscale frames through three convolutional layers (32 8×8 filters with stride 4, then 64 4×4 filters with stride 2, then 64 3×3 filters with stride 1), a 512-unit fully connected layer, and a linear output layer with one unit per action. This hierarchy enabled automatic feature extraction from pixels, eliminating manual feature engineering.
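The spatial dimensions shrink predictably through that stack. A short worked calculation, assuming valid (no-padding) convolutions with the layer sizes quoted above:

```python
def conv_out(size, kernel, stride):
    # Valid convolution: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

h = 84                   # input height (and width) of each frame
h = conv_out(h, 8, 4)    # after conv1: 20
h = conv_out(h, 4, 2)    # after conv2: 9
h = conv_out(h, 3, 1)    # after conv3: 7
flat = h * h * 64        # 64 channels flattened before the FC layer
```

The flattened 7×7×64 = 3136-dimensional vector feeds the fully connected layer.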

An ϵ-greedy policy (ϵ annealed from 1.0 to 0.1) balanced exploration and exploitation:
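The annealed ϵ-greedy rule can be sketched as follows; the linear schedule and the `anneal_steps` horizon are illustrative parameters, not values fixed by this summary:

```python
import random

def epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    # Linearly anneal from `start` to `end`, then hold at `end`
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)

def select_action(q_values, step, rng=random):
    # With probability epsilon: explore with a uniformly random action;
    # otherwise: exploit by taking the action with the highest Q-value
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training ϵ is near 1.0, so actions are almost entirely random; as ϵ decays, the agent increasingly trusts its learned Q-values.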