This research aims to develop an intelligent Sinhala question-and-answer system to enhance technological resources for Sinhala language processing. This AI-driven system provides an interactive platform for college students to engage in conversational, object-based learning in Sinhala. Questions are asked by voice, and responses are delivered as Sinhala audio, enhancing immersion and usability. The system manages a local MongoDB database for storing and retrieving the knowledge it needs to answer questions. The Q&A component uses the BERT model architecture and is trained on a Sinhala dataset. The overall system adapts to the objects available in its field of vision and generates responses based on object-specific context. Object identification is performed by a YOLO model. In addition to the Sinhala BERT NLP model, a Sinhala WordPiece tokenizer and a Sinhala dataset suitable for the Sinhala BERT model are sub-contributions. The results of this research show that the system is an effective tool. For example, when a banana is shown to the system and it is asked කෙසෙල් කියන්නේ මොනවද? (What are bananas?), the voice output is පළතුරක් (a fruit). In conclusion, this research contributes to the field of educational technology: by developing this system, college students gain access to a tool that can help them improve their knowledge and proficiency in the Sinhala language.
In recent years, advances in Artificial Intelligence and Natural Language Processing (NLP) technologies have led to significant developments in the field of question-and-answer systems. Question-and-answer systems are computer programs designed to interact with users in a conversational manner. They have gained popularity in various domains, including education, because they offer a convenient and interactive way of providing information and support. However, there is a lack of intelligent question-and-answer systems specifically tailored to the needs of Sinhala-speaking college students.
Sinhala is an official language of Sri Lanka and is spoken by a significant portion of the population. Considering the history of the Sinhala language and the cultural context of Sri Lanka, it is crucial to develop an intelligent Sinhala question-and-answer system that incorporates visual cues and can effectively address the needs of college students in their educational journey.
The absence of an intelligent Sinhala question-and-answer system that incorporates visual cues for college students is a significant limitation in the education landscape. Existing chatbot systems primarily focus on English or other widely spoken languages, neglecting the specific linguistic and cultural needs of Sinhala-speaking students. This limitation hampers effective communication, information retrieval, and academic support for these students, hindering their learning experience and performance.
This research aims to present an approach that improves students' knowledge, assists students in general matters, and interacts with them in the Sinhala language based on the objects they are currently working with. The following objectives were set for this research:
Firstly, the research endeavors to construct an intelligent Sinhala question and answer system tailored for college students, equipped with the ability to adapt dynamically to varying contexts. This involves several sub-goals: developing a robust Natural Language Processing (NLP) model specifically designed for Sinhala Q&A, creating an efficient NLP tokenizer tailored for Q&A applications, and assembling a comprehensive Sinhala dataset specifically curated for Q&A tasks.
Secondly, the study aims to design an intelligent system capable of discerning context based on objects within its visual field. This system will subsequently be integrated with the aforementioned question and answer system, enabling a holistic approach to contextual understanding and response generation.
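The context-selection step in the second objective can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Detection` type, the `select_context_object` function, and the 0.5 confidence threshold are all assumptions, and the detections are hard-coded where a YOLO model would supply them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical shape; a YOLO model would supply these)."""
    label: str
    confidence: float


def select_context_object(detections: List[Detection],
                          min_confidence: float = 0.5) -> Optional[str]:
    """Return the label of the most confident detection, or None if
    nothing clears the (assumed) confidence threshold."""
    candidates = [d for d in detections if d.confidence >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.confidence).label


# Hard-coded detections standing in for one camera frame.
frame = [Detection("banana", 0.91), Detection("cup", 0.42)]
print(select_context_object(frame))  # banana
```

The selected label would then serve as the context passed to the question-and-answer component; picking only the single most confident object is a simplification chosen for brevity.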
Overall, this study aims to contribute to the field of educational technology by leveraging advancements in Natural Language Processing, computer vision, voice recognition, and artificial intelligence to develop an effective and efficient Sinhala question-and-answering system that can cater to the specific needs of college students.
These videos demonstrate the system's interaction with a user:
In conclusion, the Sinhala question and answer system effectively delivered accurate and relevant responses in Sinhala, marking a significant advancement in Sinhala language processing. The system successfully assisted college students with their queries and information needs.
Considering the objectives of this study, the following conclusions can be drawn:
The development of an interactive Sinhala question-and-answer system for college students, capable of adapting to the context of the object in view, was successfully achieved. The system was able to understand the intent of users' queries and provide appropriate responses, demonstrating its effectiveness in assisting college students and providing relevant information. The BERT architecture proved more suitable for this task than contemporary architectures such as GPT and T5, because BERT uses only the encoder component of the Transformer architecture. BERT was introduced in 2018, and a thorough literature review found no prior research applying these modern BERT architectures to Sinhala language processing. With this method, highly accurate outputs can be obtained, along with a deeper understanding of the Sinhala language than traditional methods provide. Furthermore, according to the available literature, no Sinhala WordPiece tokenizer had previously been developed. Solutions built on Sinhala BERT are therefore recommended for their ability to handle larger vocabulary sizes. The resulting tokenizer has an extensive vocabulary of 1,000,000 words; although it predominantly covers Sinhala words, it also includes some English and Tamil words.
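To make the tokenizer's role concrete, the sketch below shows the greedy longest-match-first lookup that WordPiece tokenizers apply when splitting a word. It is illustrative only: the three-entry vocabulary is a toy built from the example word rather than the project's vocabulary, and the function name `wordpiece_tokenize` is an assumption.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces by greedy longest-match-first
    lookup. Continuation pieces carry the '##' prefix, as in BERT vocabularies."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


word = "කෙසෙල්"  # "bananas", the example word from the abstract
# Toy vocabulary (assumption): the word split after its second code point.
toy_vocab = {word[:2], "##" + word[2:], "[UNK]"}
print(wordpiece_tokenize(word, toy_vocab))   # two pieces: a prefix and a '##' continuation
print(wordpiece_tokenize("xyz", toy_vocab))  # ['[UNK]']
```

A larger vocabulary means more words match as a single piece or a few long pieces, which is one reason vocabulary size matters for a morphologically rich language.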
The development of an intelligent system that identifies context from the objects available in its field of vision, and its integration with the question-and-answer system, was also successfully achieved. In specific scenarios, maintaining a localized database of Sinhala knowledge can be highly beneficial, because it lets the question-and-answer system offer users information that is more precise and more pertinent to their individual requirements and circumstances. This approach also has advantages in production: its lower hardware demands make it easy to deploy even in resource-limited settings. The system relies on a NoSQL database, which in this setting facilitates faster processing than SQL databases. By combining visual and voice data processing, the system provides accurate answers tailored to each query, owing to its thorough understanding of related questions.
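The object-keyed retrieval from the local database can be sketched as follows. This is an assumption-laden illustration: the document fields (`object`, `question`, `answer`), the `lookup_answer` function, and the `InMemoryCollection` stand-in are hypothetical. The stand-in mimics only the `find_one(filter)` call that a pymongo collection also provides, so the example runs without a database server.

```python
class InMemoryCollection:
    """Stand-in for a MongoDB collection; supports exact-match find_one only."""

    def __init__(self, documents):
        self._documents = list(documents)

    def find_one(self, query):
        # Return the first document whose fields equal every query field.
        for doc in self._documents:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


def lookup_answer(collection, object_label, question):
    """Fetch the stored answer for a question about the detected object."""
    doc = collection.find_one({"object": object_label, "question": question})
    return doc["answer"] if doc else None


# One hypothetical knowledge entry, using the example from the abstract.
knowledge = InMemoryCollection([
    {"object": "banana", "question": "කෙසෙල් කියන්නේ මොනවද?", "answer": "පළතුරක්"},
])
print(lookup_answer(knowledge, "banana", "කෙසෙල් කියන්නේ මොනවද?"))  # පළතුරක්
```

Exact-match lookup is a simplification chosen for brevity; how the actual system combines the database with the BERT model is not specified here.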
@INPROCEEDINGS{10841868,
author={Wansekara, Indramal and Jayasekara, A.G.B.P.},
booktitle={2024 4th International Conference on Electrical Engineering (EECon)},
title={Intelligent Sinhala Question and Answer System by Incorporating Visual Clues},
year={2024},
volume={},
number={},
pages={95-100},
keywords={Question Answering;Sinhala Language;Sinhala Dataset;Natural Language Processing (NLP)},
doi={10.1109/EECon64470.2024.10841868}}