ADAPTATION OF BIG DATA TO LOCAL INFORMATION LANGUAGE MODELS: DEVELOPMENT OF THE BIGTOR CHATBOT SYSTEM
Keywords:
Large Language Models, Fine-tuning, Synthetic Data, Specialized Chatbots, Cultural Preservation
Abstract
This paper presents the development of BigTor, a domain-specific chatbot designed to address cultural, administrative, and social information gaps in Azerbaijan. To overcome the limitations of general-purpose models in low-resource languages, the DeepSeek-R1-Distill-Llama-8B model was selected as the base architecture. The system was fine-tuned on a high-quality synthetic dataset using Parameter-Efficient Fine-Tuning (PEFT). Training combined LoRA adaptation, 4-bit quantization, and bfloat16 precision to keep computational cost low. Experimental results show that BigTorV1 achieved 92 percent accuracy in the national music domain, significantly outperforming the baseline model.
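To illustrate why LoRA combined with 4-bit quantization is computationally efficient, the following minimal sketch counts the trainable parameters and the weight storage for a single projection matrix. The layer size (4096 x 4096) and the LoRA rank (16) are assumptions chosen for illustration; the abstract does not state the values used for BigTor. The storage estimate also ignores the small overhead of quantization constants.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in projection."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_bytes(n_params: int, bits: int) -> int:
    """Raw storage for n_params weights at the given bit width (no metadata)."""
    return n_params * bits // 8


# Hypothetical values, not taken from the paper.
d = 4096    # assumed hidden size of one attention projection
rank = 16   # assumed LoRA rank

full = full_params(d, d)
lora = lora_trainable_params(d, d, rank)
ratio = lora / full

print(f"frozen base weights : {full:,}")
print(f"trainable LoRA      : {lora:,} ({ratio:.2%} of the layer)")
print(f"base layer at 4-bit : {weight_bytes(full, 4):,} bytes")
print(f"base layer at bf16  : {weight_bytes(full, 16):,} bytes")
```

Under these assumed values, the adapter trains fewer than 1 percent of the layer's weights, while the frozen base weights occupy a quarter of the memory they would need in bfloat16.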
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.