NATURAL LANGUAGE PROCESSING APPROACH TO DETECTING SPONSORSHIP OF VIDEO CONTENT
Keywords:
Artificial Intelligence, AI, NLP, Natural Language Processing, ML, Machine Learning, data science, data mining, content detection
Abstract
This study examines whether natural language processing (NLP) software can automatically process video content data in order to analyze the intention behind video messages, and how those messages are perceived and reacted to by society, their end consumer. The main goal is to extend the NLP approach to identifying sponsorship details in video content from the YouTube platform. The theoretical and practical significance of the proposed methods lies in the use of several metrics and models; the worked example describes training a machine to recognize and read the content of a sponsored block within a video. The example addresses many relevant problems connected to sponsorship information. The novelty of this research lies in a way of improving existing basic NLP models, which are designed to solve basic problems but require separate, task-specific approaches for problems of this kind. In particular, two models were trained: a baseline model using the XGBoost algorithm and a customized pre-trained BERT model. The boosting-based model was trained on a vector representation of words produced by the Term Frequency-Inverse Document Frequency (TF-IDF) metric, with TF-IDF fitted on labeled sponsored sentences; this approach proved intuitive for understanding the common words present in the target sentences of the study. The BERT model was trained on under-sampled unsponsored data, using a pre-trained BERT tokenizer, and was trained for several epochs. The results were consistent, with a meaningful resulting probability of over 70%.
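As a rough illustration of the TF-IDF step mentioned in the abstract, the following minimal sketch computes TF-IDF weights over a handful of sentences and lists the highest-weighted words. It uses only the Python standard library and is not the study's implementation: the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `tfidf`, `top_terms`) and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in sentence)
    return cleaned.split()


def tfidf(sentences):
    """Return one {term: weight} dict per sentence, plus the IDF table."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per sentence
    # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


def top_terms(vectors, k=5):
    """Average each term's weight over all sentences and return the top k."""
    totals = {}
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] = totals.get(term, 0.0) + weight
    n = len(vectors) or 1
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, weight / n) for term, weight in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical labeled sponsored sentences, invented for this sketch.
    sponsored = [
        "This video is sponsored by ExampleVPN",
        "Thanks to ExampleVPN for sponsoring this video",
        "Use code DEMO for a discount from our sponsor",
    ]
    vectors, _ = tfidf(sponsored)
    for term, weight in top_terms(vectors):
        print(f"{term}: {weight:.3f}")
```

In a pipeline like the one the abstract describes, vectors of this kind would then be passed to a gradient-boosting classifier such as XGBoost; that step is omitted here to keep the sketch dependency-free.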
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.