Acta Informatica Pragensia 2025, 14(3), 365-392 | DOI: 10.18267/j.aip.263

Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology

Yasir Lutfan bin Yusuf, Suhaila binti Saee
Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia

Background: The Sarawak Gazette is a critical repository of information pertaining to Sarawak’s history. It has received much attention over the last two decades, with prior studies focusing on digitizing and extracting the gazette’s ontologies to increase the gazette’s accessibility. However, the creation of a question answering system for the Sarawak Gazette, another avenue that could improve accessibility, has been overlooked.

Objective: This study developed a new system that generates answers to user questions about the gazette using chatbot technology.

Methods: This system sends user queries to a context retrieval system, then generates an answer from the retrieved contexts using a Large Language Model. A question answering dataset was also created using a Large Language Model to evaluate this system, with dataset quality assessed by 10 annotators.
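The pipeline described above (embed passages, retrieve the contexts most similar to the question, then prompt an LLM with them) can be sketched as follows. This is a minimal illustration of the retrieval-augmented generation pattern, not the authors' implementation: the bag-of-words `embed` and the prompt template are toy stand-ins for the real embedding model and LLM used in the study.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern:
# embed gazette passages, rank them by cosine similarity to the user
# question, and prepend the top matches to the LLM prompt as context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system would use a
    # neural embedding model instead.
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank all passages by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, contexts: list[str]) -> str:
    # The retrieved contexts are placed before the question so the LLM
    # answers from the gazette text rather than from memory alone.
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"

passages = [
    "The Sarawak Gazette was first published in 1870.",
    "Kuching is the capital of Sarawak.",
    "Rice cultivation was widely reported in early gazette issues.",
]
question = "When was the Sarawak Gazette first published?"
print(build_prompt(question, retrieve(question, passages)))
```

In the study itself, the prompt built this way would be passed to a Large Language Model to generate the final answer; the toy similarity function here also mirrors the cosine-similarity measure the paper uses for evaluation.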

Results: The system achieved 55% higher precision and 42% higher recall than the previous state of the art in historical document question answering, while sacrificing only 11% in cosine similarity. The annotators rated the dataset 2.9 out of 3 overall.

Conclusion: The system could answer the general public's questions about the Sarawak Gazette in a more direct and friendly manner than traditional information retrieval methods. The methods developed in this study are also applicable to other Malaysian historical texts written in English. All code used in this study has been released on GitHub.

Keywords: Historical documents; Old newspapers; Accessibility; Question answering; Artificial intelligence; Retrieval augmented generation; LangChain.

Received: November 20, 2024; Revised: March 5, 2025; Accepted: March 6, 2025; Prepublished online: March 10, 2025; Published: August 19, 2025

Lutfan bin Yusuf, Y., & Saee, S.B. (2025). Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology. Acta Informatica Pragensia, 14(3), 365-392. doi: 10.18267/j.aip.263


This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.