Safe Multi-Agent Reinforcement Learning (MARL) for UAV Swarm Communications
Problem Statement
This research aims to explore the application of safe Multi-Agent Reinforcement Learning (MARL) in enhancing the performance and security of UAV swarm communications. UAV swarms are increasingly used in various applications such as surveillance, disaster management, and environmental monitoring. However, these systems are vulnerable to adversarial communications, where attackers can send fake learning messages to degrade performance. The study will investigate how safe MARL can be used to select cooperative agents, authenticate learning messages, and optimize communication strategies to improve the robustness and efficiency of UAV swarm communications.
Tasks:
Explore one of the following solutions (a minimal, illustrative code sketch for each appears after the list):
- Multi-Agent Deep Q-Network (DQN):
  - Description: Multi-Agent DQN extends the traditional DQN algorithm to multi-agent settings, where each agent learns to optimize its policy based on shared observations and Q-values from neighboring agents.
  - Application: This algorithm can be used to optimize power allocation and channel selection in UAV swarm communications, enhancing the overall network performance.
- Soft Actor-Critic (SAC):
  - Description: SAC is an off-policy algorithm that maximizes the entropy of the policy to encourage exploration. It uses both policy and value networks to learn optimal actions.
  - Application: SAC can be applied to UAV swarm communications to improve the robustness and efficiency of anti-jamming strategies by enabling UAVs to explore diverse transmission policies.
- Proximal Policy Optimization (PPO):
  - Description: PPO is a policy gradient method that uses a clipped objective function to ensure stable and efficient policy updates. It is suitable for environments with continuous action spaces.
  - Application: PPO can be used to optimize UAV trajectories and communication strategies, ensuring stable learning and adaptation to dynamic network conditions.
- Multi-Agent Advantage Actor-Critic (A2C):
  - Description: A2C extends actor-critic methods to multi-agent settings, where each agent learns both a policy (actor) and a value function (critic) based on shared information.
  - Application: This algorithm can enhance cooperative decision-making in UAV swarms, improving the overall network performance and resilience against adversarial communications.
- Multi-Agent Reinforcement Learning via Shielding:
  - Description: Shielding approaches synthesize shields that monitor agents' actions and correct unsafe ones. Centralized shielding monitors all agents jointly, while factored shielding uses multiple shields for subsets of agents.
  - Application: Shielding can be used to ensure safety in UAV swarm communications by preventing unsafe actions and enhancing the robustness of MARL algorithms.
- Scalable Safe MARL (SS-MARL):
  - Description: SS-MARL leverages the graph structure of multi-agent systems to aggregate local observations and communications. It uses constrained joint policy optimization to improve safety and scalability.
  - Application: SS-MARL can be applied to large-scale UAV swarms to enhance communication efficiency and safety, ensuring optimal performance in complex environments.
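For orientation, here is a minimal sketch of per-UAV DQN action selection in PyTorch. The swarm size, observation dimension, and the channel/power action space are illustrative assumptions, and training machinery (replay buffer, target network) is omitted.

```python
# Minimal multi-agent DQN sketch (PyTorch). Swarm size, observation size,
# and the channel/power action space are assumptions for illustration.
import random
import torch
import torch.nn as nn

N_AGENTS = 4                        # UAVs in the swarm (assumed)
OBS_DIM = 8                         # local observation size (assumed)
N_CHANNELS, N_POWERS = 5, 3
N_ACTIONS = N_CHANNELS * N_POWERS   # joint channel/power choice per UAV

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))

    def forward(self, obs):
        return self.net(obs)

# One Q-network per UAV; observations can embed neighbours' shared info.
q_nets = [QNet() for _ in range(N_AGENTS)]

def select_actions(observations, eps=0.1):
    """Epsilon-greedy channel/power selection for every UAV."""
    actions = []
    for q, obs in zip(q_nets, observations):
        if random.random() < eps:
            actions.append(random.randrange(N_ACTIONS))
        else:
            with torch.no_grad():
                actions.append(int(q(obs).argmax()))
    return actions

obs = [torch.randn(OBS_DIM) for _ in range(N_AGENTS)]
print(select_actions(obs))
```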
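A minimal sketch of the entropy-regularized SAC actor update in PyTorch. The network sizes, the temperature alpha, and the continuous transmission-action space (e.g., power and beam angle) are assumptions; the critic and temperature updates are omitted.

```python
# Minimal SAC actor-update sketch (PyTorch). Network sizes, alpha, and the
# continuous transmission-action space are assumptions for illustration.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, ALPHA = 8, 2, 0.2   # e.g. continuous (power, beam angle)

actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 2 * ACT_DIM))          # mean and log_std
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def actor_loss(obs):
    mean, log_std = actor(obs).chunk(2, dim=-1)
    std = log_std.clamp(-5, 2).exp()
    dist = torch.distributions.Normal(mean, std)
    raw = dist.rsample()                    # reparameterised sample
    action = torch.tanh(raw)                # squash to bounded actions
    # Log-probability with the tanh change-of-variables correction.
    log_prob = (dist.log_prob(raw)
                - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
    q = critic(torch.cat([obs, action], dim=-1)).squeeze(-1)
    # Maximise Q plus entropy  <=>  minimise alpha * log_pi - Q.
    return (ALPHA * log_prob - q).mean()

obs = torch.randn(32, OBS_DIM)              # a batch of UAV observations
loss = actor_loss(obs)
opt.zero_grad(); loss.backward(); opt.step()
```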
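A minimal sketch of PPO's clipped surrogate objective in PyTorch. The batch data is random dummy input, and advantage estimation (e.g., GAE) is omitted.

```python
# Minimal PPO clipped-objective sketch (PyTorch). Batch shapes are dummies;
# advantage estimation and the value/entropy loss terms are omitted.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss: discourages updates that move the policy
    ratio outside [1 - eps, 1 + eps]."""
    ratio = (new_log_probs - old_log_probs).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Example with dummy batch data:
new_lp = torch.randn(64, requires_grad=True)
old_lp = new_lp.detach() + 0.05 * torch.randn(64)
adv = torch.randn(64)
print(ppo_clip_loss(new_lp, old_lp, adv))
```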
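A minimal sketch of a per-agent A2C update in PyTorch. The dimensions and the one-step bootstrapped return target are illustrative assumptions; in the multi-agent setting each UAV would hold one such actor-critic over local plus shared observations.

```python
# Minimal A2C loss sketch (PyTorch). Dimensions and the one-step return
# target are assumptions; one such actor-critic per UAV in the swarm.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, GAMMA = 8, 6, 0.99

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU())
        self.actor = nn.Linear(64, N_ACTIONS)   # policy head
        self.critic = nn.Linear(64, 1)          # value head

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Categorical(logits=self.actor(h))
        return dist, self.critic(h).squeeze(-1)

def a2c_loss(model, obs, actions, rewards, next_obs, done):
    dist, value = model(obs)
    with torch.no_grad():
        _, next_value = model(next_obs)
        target = rewards + GAMMA * next_value * (1 - done)
    advantage = target - value
    policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    return policy_loss + 0.5 * value_loss

model = ActorCritic()
loss = a2c_loss(model, torch.randn(16, OBS_DIM),
                torch.randint(N_ACTIONS, (16,)), torch.randn(16),
                torch.randn(16, OBS_DIM), torch.zeros(16))
loss.backward()
```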
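A minimal sketch of action shielding in plain Python. The safety predicate here (a transmit-power cap tied to UAV separation) is a hypothetical example; in practice shields are synthesized from a formal safety specification rather than hand-written.

```python
# Minimal shielding sketch. The safety predicate is a hypothetical example;
# real shields are synthesised from a formal safety specification.
import random

MAX_POWER = 2          # highest allowed power index near other UAVs (assumed)
SAFE_FALLBACK = 0      # lowest-power action, assumed always safe

def is_safe(action, state):
    """Illustrative predicate: reject high-power actions when another
    UAV is within the minimum separation distance."""
    return action <= MAX_POWER or state["nearest_uav_dist"] > 10.0

def shield(action, state):
    """Shield step for one agent: pass safe actions through, replace
    unsafe ones with a known-safe fallback."""
    return action if is_safe(action, state) else SAFE_FALLBACK

state = {"nearest_uav_dist": 4.0}
proposed = random.randrange(5)      # action proposed by the MARL policy
print(proposed, "->", shield(proposed, state))
```

A centralized shield would apply this check to the joint action of all agents at once; factored shielding would run one such filter per subset of agents.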
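A minimal sketch of the two ingredients SS-MARL combines: graph-based aggregation of neighbours' observations and a Lagrangian penalty on a safety cost. The graph, dimensions, cost limit, and fixed multiplier are assumptions for illustration, not the paper's implementation.

```python
# Minimal SS-MARL-flavoured sketch (PyTorch): mean-aggregate neighbour
# observations over the swarm graph, then penalise the expected safety cost
# with a Lagrange multiplier. Graph, sizes, and limits are assumptions.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, COST_LIMIT = 6, 8, 0.1
LAMBDA = 1.0   # fixed multiplier for the sketch; learned/tuned in practice

# adjacency[i, j] = 1 if UAV j is a communication neighbour of UAV i
adjacency = (torch.rand(N_AGENTS, N_AGENTS) < 0.4).float()
adjacency.fill_diagonal_(1.0)

def aggregate(obs):
    """Average each UAV's observation with its neighbours' (one GNN layer)."""
    weights = adjacency / adjacency.sum(dim=1, keepdim=True)
    return weights @ obs                     # (N_AGENTS, OBS_DIM)

encoder = nn.Linear(OBS_DIM, 32)

obs = torch.randn(N_AGENTS, OBS_DIM)
features = torch.relu(encoder(aggregate(obs)))

reward = features.mean()                     # stand-in for the policy objective
cost = features.abs().mean()                 # stand-in for a safety cost
# Constrained objective: maximise reward - lambda * (cost - limit).
loss = -(reward - LAMBDA * (cost - COST_LIMIT))
loss.backward()
```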
Work:
10% Theory, 70% Simulations, 20% Writing
Contact:
Alessandro Chiumento (a.chiumento@utwente.nl)