Joint Optimization of Bandwidth and Power Allocation in Uplink Systems with Deep Reinforcement Learning

Bibliographic Details
Published in:Sensors (Basel, Switzerland) Vol. 23; no. 15; p. 6822
Main Authors: Zhang, Chongli, Lv, Tiejun, Huang, Pingmu, Lin, Zhipeng, Zeng, Jie, Ren, Yuan
Format: Journal Article
Language:English
Published: Switzerland MDPI AG 31-07-2023
Subjects:
Description
Summary:Wireless resource utilization is a central concern for future communication systems, since efficient use of resources is needed to alleviate the degradation in communication quality caused by rapidly growing interference as the number of users increases, especially inter-cell interference in multi-cell multi-user systems. To tackle this interference and improve the resource utilization rate, we propose a joint-priority-based reinforcement learning (JPRL) approach that jointly optimizes bandwidth and transmit power allocation. The method aims to maximize the average throughput of the system while suppressing co-channel interference and guaranteeing the quality of service (QoS) constraint. Specifically, we decouple the joint problem into two sub-problems, i.e., bandwidth assignment and power allocation. A multi-agent double deep Q network (MADDQN) is developed to solve the bandwidth allocation sub-problem for each user, and a prioritized multi-agent deep deterministic policy gradient (P-MADDPG) algorithm, which deploys a prioritized replay buffer, is designed to handle the transmit power allocation sub-problem. Numerical results show that the proposed JPRL method accelerates model training and outperforms the alternative methods in terms of throughput. For example, the average throughput was approximately 10.4-15.5% higher than the homogeneous-learning-based benchmarks, and about 17.3% higher than the genetic algorithm.
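As a rough illustration of the prioritized replay mechanism that the abstract attributes to P-MADDPG, a minimal proportional prioritized experience replay buffer is sketched below. The class name, hyper-parameters (alpha, beta, eps), and priority definition are assumptions made for this sketch and are not taken from the paper itself.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch, not the paper's code).

    Transitions with larger TD error are sampled more often; importance-sampling
    weights correct the bias this introduces in the gradient estimates.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities shape the sampling distribution
        self.beta = beta        # strength of the importance-sampling correction
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions (state, action, reward, next_state, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights, normalized by their maximum for numerical stability.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is proportional to the magnitude of the TD error of the sampled transitions.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a P-MADDPG-style training loop, each critic update would sample a batch from such a buffer, weight the TD losses by the returned importance-sampling weights, and then feed the new TD errors back via update_priorities; the exact integration used in the paper is not described in this record.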
ISSN:1424-8220
DOI:10.3390/s23156822