This paper tackles a fundamental limitation of traditional multi-agent reinforcement learning: real-world agents are not risk-neutral rational actors, but exhibit systematic psychological biases when making decisions under uncertainty. The authors bring Cumulative Prospect Theory (CPT), the Nobel Prize-winning framework from behavioral economics, into the multi-agent reinforcement learning setting to model how agents actually behave when facing risky outcomes. CPT captures three well-documented effects: loss aversion (losses are felt more intensely than equivalent gains), reference-point dependence (outcomes are evaluated relative to a baseline rather than in absolute terms), and probability distortion (small probabilities are overweighted while moderate ones are underweighted).

The technical contribution centers on Network Aggregative Markov Games (NAMGs), a specialized class of multi-agent systems in which each agent's reward depends not on every other agent's individual action but on an aggregated summary statistic of its neighbors' actions in a network structure, which dramatically reduces computational complexity compared to general-sum games. The core algorithm, "Distributed Nested CPT-AC," is a two-timescale actor-critic method: the critic learns risk-sensitive value functions that incorporate CPT's probability weighting functions and S-shaped utility curves, while the actor updates policy parameters using gradients of the CPT-transformed objectives. The authors provide mathematical guarantees of convergence to a "Subjective Markov Perfect Nash Equilibrium" that accounts for each agent's individual risk preferences.
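The two CPT ingredients mentioned above, the S-shaped utility curve and the inverse-S probability weighting function, can be sketched in a few lines. This is a minimal illustration using the standard Tversky-Kahneman functional forms and their commonly cited parameter estimates (alpha = beta = 0.88, lambda = 2.25, gamma = 0.61); the paper's exact parameterization may differ.

```python
def cpt_utility(x, ref=0.0, alpha=0.88, beta=0.88, lam=2.25):
    """S-shaped utility: concave for gains, convex and steeper for losses.

    Outcomes are evaluated relative to the reference point `ref`
    (reference-point dependence); lam > 1 encodes loss aversion.
    """
    d = x - ref
    if d >= 0:
        return d ** alpha
    return -lam * ((-d) ** beta)


def cpt_weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small probabilities,
    underweights moderate-to-large ones (probability distortion)."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
```

With these defaults, `cpt_weight(0.01)` exceeds 0.01 while `cpt_weight(0.5)` falls below 0.5, and a loss of 10 weighs more than twice as heavily as a gain of 10, which is exactly the bias pattern the paper builds into its agents' objectives.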
The practical implications are significant. In social-network scenarios, agents with high loss-aversion parameters become markedly more conservative in their investment decisions, which can lead to social isolation and reduced overall system performance. Probability distortion, meanwhile, makes agents risk-seeking for low-probability, high-reward events and risk-averse in moderate-probability scenarios, fundamentally changing the equilibrium dynamics compared to traditional risk-neutral multi-agent systems.
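A small numeric illustration of both effects, again using the standard Tversky-Kahneman forms with hypothetical gamble values (the specific payoffs and parameters here are illustrative, not from the paper): the subjective CPT value of a binary gamble drops as the loss-aversion parameter grows, and a rare-jackpot gamble with near-zero expected value still looks attractive because its 1% win probability is overweighted.

```python
def cpt_value(gain, loss, p_gain, lam, alpha=0.88, gamma=0.61):
    """Subjective CPT value of a two-outcome gamble relative to a
    reference point of 0 (gain > 0 with prob p_gain, loss < 0 otherwise)."""
    w = lambda p: p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
    u_gain = gain ** alpha
    u_loss = -lam * ((-loss) ** alpha)
    return w(p_gain) * u_gain + w(1.0 - p_gain) * u_loss


# Loss aversion: the same 50/50 gamble looks worse to a more loss-averse agent,
# so that agent is more likely to decline the investment.
mild = cpt_value(gain=10, loss=-10, p_gain=0.5, lam=1.0)
averse = cpt_value(gain=10, loss=-10, p_gain=0.5, lam=2.25)

# Probability distortion: a 1% shot at 100 against a 99% loss of 1 has an
# expected value of only 0.01, yet its CPT value is positive because the
# small win probability is overweighted (risk-seeking for rare jackpots).
jackpot = cpt_value(gain=100, loss=-1, p_gain=0.01, lam=2.25)
```

Here `averse < mild` and `jackpot > 0`, mirroring the equilibrium shifts described above: loss-averse agents withdraw from symmetric risks, while distorted probability weights pull agents toward long-shot bets.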