OPTIMIZING UAV-BASED IOT DATA COLLECTION USING MULTI-AGENT DEEP REINFORCEMENT LEARNING WITH A GAUSSIAN TIME DISTRIBUTION

Hoàng Trọng Nghĩa

doi:10.59266/houjs.2025.741

Keywords:

UAV, IoT, multi-agent deep reinforcement learning (MADRL), MADDPG, task assignment, data collection, edge computing, Gaussian distribution

Abstract

This study proposes a new method based on multi-agent deep reinforcement learning (MADRL), specifically the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, to solve the task assignment and flight trajectory optimization problem for unmanned aerial vehicles (UAVs) collecting data from distributed Internet of Things (IoT) devices. The model accounts for data collection deadlines at the IoT nodes that follow a Gaussian distribution, and for the UAVs' ability to process data at the edge (edge computing). The main objectives are to minimize the total energy consumed by the UAV system, maximize the amount of data collected, and ensure that tasks are distributed evenly among the UAVs. Simulation results show that MADDPG performs markedly better at reducing energy consumption and balancing load, and indicate strong potential when the algorithm is trained and tuned well.
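
The abstract names three objectives (energy, collected data, even task distribution) and Gaussian-distributed deadlines without giving formulas. As a rough illustration only, the sketch below samples one deadline per IoT node from a Gaussian and combines the three objectives into a single team reward. Everything in it is an assumption rather than the paper's formulation: the function names, the weights, the weighted-sum form, the clipping of deadlines to positive values, and the use of Jain's fairness index as the measure of even task distribution are all hypothetical choices.

```python
import random


def sample_deadlines(num_nodes, mean, std, seed=0):
    """Draw one collection deadline per IoT node from a Gaussian.

    Assumption: deadlines are clipped to a small positive value so that
    a negative draw never produces an invalid deadline.
    """
    rng = random.Random(seed)
    return [max(1e-3, rng.gauss(mean, std)) for _ in range(num_nodes)]


def jain_fairness(loads):
    """Jain's index: 1.0 when all loads are equal, 1/n when one UAV holds all."""
    total = sum(loads)
    if total == 0:
        return 1.0  # no tasks assigned yet, treated as perfectly balanced
    return total ** 2 / (len(loads) * sum(x * x for x in loads))


def team_reward(energy_per_uav, data_per_uav, tasks_per_uav,
                w_energy=1.0, w_data=1.0, w_balance=1.0):
    """Hypothetical weighted sum: penalize energy, reward data and balance."""
    return (-w_energy * sum(energy_per_uav)
            + w_data * sum(data_per_uav)
            + w_balance * jain_fairness(tasks_per_uav))
```

In a MADDPG-style setup, a shared scalar of this kind could serve as the cooperative reward for all UAV agents; the relative weights would need tuning, since energy and data volume are typically on very different scales.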
