SEMEVAL 2017 TASK 4: SENTIMENT ANALYSIS ON TWITTER

  • Brian Arnesto Sitorus Universitas Bakrie
  • Zakiul Fahmi Jailani Universitas Bakrie
  • Dita Nurmadewi Universitas Bakrie
Keywords: Sentiment, Twitter

Abstract

The SemEval data used in this study comprise 11 sets of tweets from the Twitter platform, collected between 2013 and 2016. The datasets required several preprocessing steps to resolve formatting errors, such as tweets delimited by both tabs and commas, which caused several tweets within a dataset to run together. Because 2 of the datasets contained too many formatting errors, only 9 were used in this study. During preprocessing, errors in the datasets were analyzed using the spaCy library; @mention tokens, which refer to the usernames mentioned in a tweet, were then removed, lemmatization was performed with spaCy, and characters that do not conform to standard spelling were deleted. The data contain three classes (neutral, positive, and negative), but the classes are imbalanced in size. When class proportions are imbalanced, training produces a machine learning model that is biased toward the majority class. To address this bias, oversampling and undersampling techniques were applied. Of the methods tried, SMOTE, an oversampling method, performed best. Various classifiers were then tested together with SMOTE, and Logistic Regression showed the best performance.
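Two of the steps described in the abstract, removing @mentions and oversampling a minority class, can be illustrated with a minimal sketch. This is not the authors' code: the study used spaCy and SMOTE, whereas the sketch uses only the Python standard library, and the function names (`strip_mentions`, `smote_like_oversample`), the regular expression, and the toy feature vectors are all assumptions made for illustration. It also assumes tweets have already been converted to numeric feature vectors before oversampling.

```python
import random
import re

# Assumed pattern: a mention is "@" followed by word characters.
MENTION_RE = re.compile(r"@\w+")


def strip_mentions(tweet: str) -> str:
    """Remove @username tokens and collapse the leftover whitespace."""
    return " ".join(MENTION_RE.sub(" ", tweet).split())


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core idea
    behind SMOTE). Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class,
        # ranked by squared Euclidean distance and excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    print(strip_mentions("@user1 I love this! thanks @user2"))
    toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    for point in smote_like_oversample(toy_minority, n_new=3, seed=42):
        print(point)
```

Because each synthetic point lies on a line segment between two existing minority samples, the minority class grows without exact duplicates, which is the property that distinguishes SMOTE from plain random oversampling. In practice the synthetic samples would be added to the training split only, before fitting a classifier such as Logistic Regression.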

Published
2024-02-16
How to Cite
Brian Arnesto Sitorus, Zakiul Fahmi Jailani, & Dita Nurmadewi. (2024). SEMEVAL 2017 TUGAS 4: ANALISIS SENTIMEN DI TWITTER. Journal of Scientech Research and Development, 5(2), 1081-1096. https://doi.org/10.56670/jsrd.v5i2.299