BUILDING A QUESTION-ANSWERING CHATBOT SYSTEM OVER TECHNICAL DOCUMENTS USING RETRIEVAL-AUGMENTED GENERATION

Đặng Khánh Hòa, Phạm Thị Thu Trang; Phạm Minh Việt, Nguyễn Thanh Nghị

doi:10.59266/houjs.2026.1173

Keywords:

retrieval-augmented generation, dense retrieval, reranking, technical documents, large language models

Abstract

This paper presents the design and implementation of a question-answering chatbot over technical documents based on the Retrieval-Augmented Generation (RAG) architecture, with the aim of reducing hallucination and improving the traceability of information sources. The system follows a two-phase architecture: an offline phase covering data ingestion, cleaning, text chunking, embedding generation, and indexing; and an online phase that performs semantic retrieval, reranking, context construction, and answer generation with a large language model. The study implements and compares two chunking strategies (character-based and markdown-structure-based) and three retrieval modes (dense, keyword, hybrid), using the multilingual embedding model BAAI/bge-m3 and the reranking model BAAI/bge-reranker-v2-m3, and supports two generation backends (Qwen2.5-7B-Instruct running locally via Ollama and Gemini 1.5 Flash in the cloud). On a test set of 25 labeled questions about 5G telecommunications documents, the dense retrieval configuration achieves Hit@1 = 84% and MRR = 0.913; adding a reranking step raises Hit@1 to 92% and MRR to 0.960, while Hit@3 remains at 100%. The results indicate that combining dense semantic retrieval with Cross-Encoder reranking is an effective approach for domain-specific question answering; in addition, file-based caching and a fallback backend make the system practical to run on commodity hardware.
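The retrieval metrics quoted in the abstract can be made concrete with a minimal sketch. It assumes the standard definitions of Hit@k and Mean Reciprocal Rank and exactly one relevant chunk per question, which the abstract does not state. The function names, the chunk identifiers, and the rank distributions below are hypothetical and are not the authors' data. They were chosen only because, under those assumptions, they reproduce the reported figures.

```python
from typing import Dict, List, Sequence, Tuple

Run = Tuple[List[str], str]  # (ranked chunk ids, id of the single relevant chunk)


def hit_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the relevant chunk is among the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank (1-based) of the relevant chunk, or 0.0 if it is absent."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Sequence[Run], k_values: Sequence[int] = (1, 3)) -> Dict[str, float]:
    """Average Hit@k and reciprocal rank over all questions."""
    n = len(runs)
    report = {
        f"hit@{k}": sum(hit_at_k(ranked, gold, k) for ranked, gold in runs) / n
        for k in k_values
    }
    report["mrr"] = sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / n
    return report


def runs_from_ranks(gold_ranks: Sequence[int], depth: int = 3) -> List[Run]:
    """Build synthetic result lists that place the relevant chunk at a given rank."""
    runs = []
    for q, rank in enumerate(gold_ranks):
        ranked = [f"q{q}-other{i}" for i in range(depth)]
        ranked[rank - 1] = f"q{q}-gold"
        runs.append((ranked, f"q{q}-gold"))
    return runs


# Hypothetical rank distributions over 25 questions (an assumption, not source data).
dense_ranks = [1] * 21 + [2] * 3 + [3] * 1
reranked_ranks = [1] * 23 + [2] * 2

dense_report = evaluate(runs_from_ranks(dense_ranks))
reranked_report = evaluate(runs_from_ranks(reranked_ranks))

print({name: round(value, 3) for name, value in dense_report.items()})
print({name: round(value, 3) for name, value in reranked_report.items()})
```

With 25 questions, each question moves Hit@1 by 4 percentage points, so the reported gain from 84% to 92% corresponds to two additional questions answered at rank 1. This is worth keeping in mind when judging the size of the improvement.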
