Identification of Dominant Topics in Public Discussions on IKN using Latent Dirichlet Allocation (LDA) and BERTopic

Authors

  • Ariska Fitriyana Ningrum Universitas Muhammadiyah Semarang, Indonesia
  • Florence Jean B. Talirongan College of Computer Studies, Misamis University, Philippines
  • Diana May Glaiza G. Tangaro TK Elevator WLL, Fereej Bin Mahmoud, Qatar

DOI:

https://doi.org/10.64539/sjcs.v1i1.2025.19

Keywords:

Topic Modeling, LDA, BERTopic, Capital Relocation, Sentiment Analysis

Abstract

This study analyzes public opinion on the relocation of Indonesia's national capital (IKN) through topic modeling of Twitter data. The two main approaches used are Latent Dirichlet Allocation (LDA), based on a Bag-of-Words representation, and BERTopic, based on Transformer language models. LDA was chosen for its ability to identify topic distributions in large text collections, while BERTopic was used to overcome LDA's limitations in capturing semantic meaning in short, informal texts such as tweets. The analysis was conducted on a collection of tweets discussing the relocation of IKN, with the aim of uncovering the main themes and public perceptions. LDA revealed three main topics in the public discussion: (1) political debate and nationalism related to the relocation, (2) policy implementation and project execution, and (3) economic justification and the challenges facing Jakarta. Meanwhile, BERTopic identified topics with more contextual representations, including aspects of investment, economic impact, construction progress, and public perception. Dominant topics include urban relocation, investment in IKN, and socio-economic impacts. The novelty of this study lies in its comparison of two topic modeling approaches in the context of social media sentiment analysis on a major public policy issue. These findings not only enrich understanding of the narratives developing in society, but also provide important insights for policymakers in responding to public opinion more appropriately and contextually.

References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[2] T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proc Natl Acad Sci U S A, vol. 101, no. SUPPL. 1, pp. 5228–5235, Apr. 2004, doi: 10.1073/pnas.0307752101.

[3] H. Jelodar et al., “Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey,” 2017, doi: 10.48550/arXiv.1711.04305.

[4] R. Alghamdi and K. Alfalqi, “A Survey of Topic Modeling in Text Mining,” International Journal of Advanced Computer Science and Applications (IJACSA), 2015. [Online]. Available: www.ijacsa.thesai.org

[5] M. Grootendorst, “BERTopic: Neural topic modeling with a class-based TF-IDF procedure,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.05794

[6] L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering,” The Journal of Open Source Software, vol. 2, no. 11, p. 205, Mar. 2017, doi: 10.21105/joss.00205.

[7] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP), 2019. [Online]. Available: https://github.com/UKPLab/

[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” [Online]. Available: https://github.com/tensorflow/tensor2tensor

[9] M. A. Mersha, M. Gemeda Yigezu, and J. Kalita, “Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms,” in Procedia Computer Science, Elsevier B.V., 2024, pp. 121–132. doi: 10.1016/j.procs.2024.10.185.

[10] C. Y. Sy, L. L. Maceda, N. M. Flores, and M. B. Abisado, “Unsupervised Machine Learning Approaches in NLP: A Comparative Study of Topic Modeling with BERTopic and LDA,” International Journal of Intelligent Systems and Applications in Engineering. [Online]. Available: www.ijisae.org

[11] F. Bianchi, S. Terragni, and D. Hovy, “Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence.” [Online]. Available: https://github.com/UKPLab/

[12] A. Gillioz, J. Casas, E. Mugellini, and O. A. Khaled, “Overview of the Transformer-based Models for NLP Tasks,” in Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, FedCSIS 2020, Institute of Electrical and Electronics Engineers Inc., Sep. 2020, pp. 179–183. doi: 10.15439/2020F20.

[13] V. C. Storey and D. E. O’Leary, “Text Analysis of Evolving Emotions and Sentiments in COVID-19 Twitter Communication,” Cognit Comput, vol. 16, no. 4, pp. 1834–1857, Jul. 2024, doi: 10.1007/s12559-022-10025-3.

[14] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–1113, Dec. 2014, doi: 10.1016/j.asej.2014.04.011.

[15] Y. Ardian Pradana, I. Cholissodin, and D. Kurnianingtyas, “Analisis Sentimen Pemindahan Ibu Kota Indonesia pada Media Sosial Twitter menggunakan Metode LSTM dan Word2Vec” [Sentiment Analysis of the Relocation of Indonesia's Capital on Twitter Social Media using LSTM and Word2Vec Methods], 2023. [Online]. Available: http://j-ptiik.ub.ac.id

[16] Y. Liu, “Comparison of LDA and BERTopic in News Topic Modeling: A Case Study of The New York Times’ Reports on China,” Pacific International Journal, 2024, doi: 10.55014/pij.v7i3.616.

[17] E. Cambria, Y. Li, F. Z. Xing, S. Poria, and K. Kwok, “SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis,” in International Conference on Information and Knowledge Management, Proceedings, Association for Computing Machinery, Oct. 2020, pp. 105–114. doi: 10.1145/3340531.3412003.

[18] M. Grootendorst, “BERTopic: Neural topic modeling with a class-based TF-IDF procedure,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.05794

[19] K. Stevens, P. Kegelmeyer, D. Andrzejewski, and D. Buttler, “Exploring Topic Coherence over many models and many topics,” Association for Computational Linguistics, 2012. [Online]. Available: http://mallet.cs.umass.edu/

[20] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, “Hierarchical Dirichlet processes,” J Am Stat Assoc, vol. 101, no. 476, pp. 1566–1581, Dec. 2006, doi: 10.1198/016214506000000302.

[21] Y. Miao, E. Grefenstette, and P. Blunsom, “Discovering Discrete Latent Topics with Neural Variational Inference,” 2017.

[22] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans Pattern Anal Mach Intell, vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.

[23] B. Ogunleye, T. Maswera, L. Hirsch, J. Gaudoin, and T. Brunsdon, “Comparison of Topic Modelling Approaches in the Banking Context,” Applied Sciences (Switzerland), vol. 13, no. 2, Jan. 2023, doi: 10.3390/app13020797.

[24] R. Egger and J. Yu, “A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts,” Frontiers in Sociology, vol. 7, May 2022, doi: 10.3389/fsoc.2022.886498.

[25] Maryanto, Philips, and A. S. Girsang, “Hybrid model for extractive single document summarization: utilizing BERTopic and BERT model,” IAES International Journal of Artificial Intelligence, vol. 13, no. 2, pp. 1723–1731, Jun. 2024, doi: 10.11591/ijai.v13.i2.pp1723-1731.

Published

2025-05-09

How to Cite

Ningrum, A. F., Talirongan, F. J. B., & Tangaro, D. M. G. G. (2025). Identification of Dominant Topics in Public Discussions on IKN using Latent Dirichlet Allocation (LDA) and BERTopic. Scientific Journal of Computer Science, 1(1), 16–22. https://doi.org/10.64539/sjcs.v1i1.2025.19

Issue

Section

Articles
