ENRICHING CONTENT REPRESENTATIONS WITH GENRE PRIORS: THE GAGE MODEL FOR RECOMMENDER SYSTEMS

Dương Tấn Nghĩa, Trần Tiến Dũng

doi:10.59266/houjs.2026.1165

Keywords:

recommender systems, content representation, genre priors, cosine similarity, denoising

Abstract

Dense content representations of movies underpin many modern recommender systems, especially when the goal is not only to predict preferences but also to explain why an item is recommended. The MovieLens Tag Genome provides a matrix of continuous relevance scores between movies and descriptive tags, which allows content similarity to be measured at a finer granularity than traditional discrete tagging. However, these relevance scores are still affected by noise, by a lack of observed evidence, and by inconsistencies in community knowledge. This paper proposes Genre-Aware Genome Enrichment (GAGE), a mechanism that adjusts relevance scores using genre-based priors in order to denoise and enrich content representations before they are fed into a similarity-based recommendation model. The method has four stages: determining an adaptive threshold for each tag, estimating the prior probability P(tag|genre), aggregating multi-genre priors for each movie, and applying a nonlinear update to the relevance scores with a boundary-protection mechanism that avoids distorting values that already have high certainty. On the MovieLens 20M dataset, which contains 10,133 movies and 916 tags after preprocessing, experiments using k-nearest neighbors, cosine similarity, and root mean square error (RMSE) show that prediction error tends to increase relative to the baseline as the strength of the genre-based intervention increases. This result indicates that bidirectional genre-based amplification can homogenize content vectors and inject false positives. The paper therefore proposes a more cautious direction: prioritize one-directional denoising over simultaneously rewarding and penalizing, and evaluate models against multiple recommendation objectives rather than relying on a single metric.
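The four stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so every concrete choice below is an assumption made for illustration and should not be read as the authors' implementation: the threshold is taken as the per-tag mean, P(tag|genre) as the share of a genre's movies above that threshold, the multi-genre prior as a plain average, and the boundary-protected update as a pull toward the prior weighted by 4r(1-r), which vanishes at r = 0 and r = 1. The function names, the `alpha` strength, the `allow_boost` switch, and the toy data are all hypothetical.

```python
from statistics import mean

def tag_thresholds(relevance):
    """Stage 1 (assumed form): one threshold per tag, here the column mean."""
    n_tags = len(relevance[0])
    return [mean(row[t] for row in relevance) for t in range(n_tags)]

def genre_priors(relevance, genres, thresholds):
    """Stage 2 (assumed form): P(tag | genre) as the share of the genre's
    movies whose relevance for the tag exceeds the tag's threshold."""
    priors = {}
    for g in sorted({g for gs in genres for g in gs}):
        rows = [row for row, gs in zip(relevance, genres) if g in gs]
        priors[g] = [
            sum(row[t] > thr for row in rows) / len(rows)
            for t, thr in enumerate(thresholds)
        ]
    return priors

def movie_prior(movie_genres, priors):
    """Stage 3 (assumed form): average the per-genre priors over the movie's
    genres. Each movie is expected to have at least one genre."""
    n_tags = len(next(iter(priors.values())))
    return [mean(priors[g][t] for g in movie_genres) for t in range(n_tags)]

def update(r, prior, alpha=0.5, allow_boost=True):
    """Stage 4 (assumed form): move r toward the prior. The weight 4*r*(1-r)
    is 0 at r=0 and r=1, so near-certain scores are left almost unchanged.
    With 0 <= alpha <= 1 the result stays between r and the prior."""
    step = alpha * 4.0 * r * (1.0 - r) * (prior - r)
    if not allow_boost:
        step = min(step, 0.0)  # penalty-only variant: never raise a score
    return r + step

def enrich(relevance, genres, alpha=0.5, allow_boost=True):
    """Run the four stages over a movies-by-tags relevance matrix."""
    thresholds = tag_thresholds(relevance)
    priors = genre_priors(relevance, genres, thresholds)
    return [
        [update(r, p, alpha, allow_boost)
         for r, p in zip(row, movie_prior(gs, priors))]
        for row, gs in zip(relevance, genres)
    ]

# Toy data (hypothetical): 3 movies, 2 tags, relevance scores in [0, 1].
relevance = [[0.9, 0.1], [0.8, 0.5], [0.2, 0.7]]
genres = [{"Action"}, {"Action", "Comedy"}, {"Comedy"}]
enriched = enrich(relevance, genres)
```

Calling `enrich(relevance, genres, allow_boost=False)` gives a penalty-only variant in which scores can only decrease. This is one possible reading of the one-directional denoising the abstract recommends, not a confirmed description of it.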
