ANALYSIS OF DEEP LEARNING METHODS AND WORD EMBEDDINGS FOR ABUSIVE LANGUAGE DETECTION ON TWITTER

Rachma Fatmawati; Rifiana Arief

doi:10.54783/jser.v7i2.1067

Rachma Fatmawati, Universitas Gunadarma
Rifiana Arief, Universitas Gunadarma

Keywords: Abusive, Deep Learning, Word2Vec, FastText

Abstract

Abusive language is an expression containing coarse phrases or words, communicated orally or in writing to an interlocutor (an individual or a group), and it can accelerate social conflict when accompanied by hate speech. Twitter, a widely used social media platform, is often a channel for spreading abusive language in various forms, such as sarcasm and insults. An automatic detection system is therefore needed to filter negative content and maintain the quality of online interaction. This study develops a model for detecting abusive language on Twitter using BiLSTM with FastText word embeddings. The research stages comprise data collection, labeling, preprocessing, classification, evaluation, and prediction and visualization of language detection. The BiLSTM model is compared with CNN and LSTM, and the performance of Word2Vec and FastText as word embedding methods is also compared. The results show that BiLSTM with FastText achieves the highest accuracy, 87%, with an F1-score of 85%, outperforming CNN (79%) and LSTM (82%). In addition, FastText proves more effective than Word2Vec, with higher accuracy and a better ability to handle new words or slang. BiLSTM captures the context of abusive language more accurately, particularly in understanding complex sentence structures. However, challenges remain in classifying ambiguous or context-dependent words.
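The abstract's point that FastText copes better with new words or slang can be illustrated with a minimal sketch of character n-gram decomposition, the subword idea FastText is built on. This is not the authors' code: the function names, the example words, and the n-gram range of 3 to 6 (taken from the original FastText subword formulation) are assumptions made for illustration only.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of a word, FastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from those inside the word.
    The 3..6 range is an assumption, not a value reported by the study.
    """
    wrapped = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def shared_fraction(known, unseen):
    """Fraction of the unseen word's n-grams that also occur in the known word."""
    known_grams = char_ngrams(known)
    unseen_grams = char_ngrams(unseen)
    if not unseen_grams:
        return 0.0
    return len(known_grams & unseen_grams) / len(unseen_grams)


if __name__ == "__main__":
    # Hypothetical example: an elongated slang spelling of a known word.
    print(shared_fraction("kasar", "kasarrr"))
    # An unrelated string shares no n-grams with the known word.
    print(shared_fraction("kasar", "xyz"))
```

Because an out-of-vocabulary spelling still shares many n-grams with a word seen in training, a subword model can compose a vector for it from those shared pieces, whereas a purely word-level model such as Word2Vec has no entry for it at all.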

Ikatan Dosen Menulis
