Analysis of Suspected Factors in Tuberculosis Cases in Semarang City Using a Logistic Regression Model

Authors

  • Ihsan Fathoni Amri, Department of Data Science, Universitas Muhammadiyah Semarang, Indonesia
  • Febrian Hikmah Nur Rohim, Department of Data Science, Universitas Muhammadiyah Semarang, Indonesia
  • Muhammad Ivan Ardiansyah, Department of Data Science, Universitas Muhammadiyah Semarang, Indonesia
  • Farid Sam Saputra, Department of Data Science, Universitas Muhammadiyah Semarang, Indonesia
  • Supriyanto, Department of Chemistry Education, Universitas Muhammadiyah Semarang, Indonesia
  • Ariska Fitriyana Ningrum, Department of Data Science, Universitas Muhammadiyah Semarang, Indonesia
  • Arman Mohammad Nakib, Artificial Intelligence, Nanjing University of Information Science & Technology, China

DOI:

https://doi.org/10.64539/sjcs.v1i1.2025.32

Keywords:

Tuberculosis, Logistic Regression, Risk Factors, Classification

Abstract

Tuberculosis (TB) is one of the world's deadliest infectious diseases, and Indonesia is among the countries with the highest TB burden. Semarang City, a densely populated urban area, faces significant challenges in controlling TB, particularly among vulnerable populations. This study identifies significant risk factors for TB incidence in Semarang City using a binary logistic regression model. Descriptive analysis reveals a class imbalance in the data: the majority of patients are categorized as "not indicated for TB." Chi-square tests show that variables such as shortness of breath, persistent fever for more than one month, diabetes mellitus, and household contact are significantly associated with TB incidence. The logistic regression model is significant overall (G statistic = 275.13; p-value = 1.23 × 10⁻⁵⁵), with shortness of breath and diabetes mellitus emerging as major risk factors based on odds ratio interpretation. However, the model detects the "indicated for TB" category (coded "1") very poorly (precision 36.36%; recall 2.05%; F1-score 3.88%), despite an overall accuracy of 87.25%. The poor performance on category "1" and the pseudo-R² value of 7% are likely related to the data imbalance: category "1" contains far fewer cases than category "0", which biases the model toward the majority class. In addition, the predictor variables do not provide enough information to distinguish category "1" from category "0", which further limits the model's ability to explain the variability in the data.
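The gap between the high accuracy and the very low recall reported above can be illustrated with a short calculation. The sketch below is not taken from the paper: the confusion-matrix counts are hypothetical values chosen only because they reproduce the reported percentages, and the variable names are illustrative.

```python
# Hypothetical confusion-matrix counts (NOT from the paper); they were picked
# only because they reproduce the percentages quoted in the abstract.
tp = 4      # category "1" cases correctly predicted as "1"
fp = 7      # category "0" cases wrongly predicted as "1"
fn = 191    # category "1" cases missed (predicted as "0")
tn = 1351   # category "0" cases correctly predicted as "0"

total = tp + fp + fn + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)          # share of predicted "1" that are truly "1"
recall = tp / (tp + fn)             # share of true "1" that the model finds
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a trivial rule that always predicts the majority class "0".
majority_accuracy = (tn + fp) / total

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("F1-score", f1),
    ("always-predict-0 accuracy", majority_accuracy),
]:
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, a rule that always predicts "0" reaches an accuracy close to the model's, which is why accuracy alone says little about how well the minority class is detected.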

Published

2025-05-09

How to Cite

Amri, I. F., Rohim, F. H. N., Ardiansyah, M. I., Saputra, F. S., Supriyanto, Ningrum, A. F., & Nakib, A. M. (2025). Analysis of Suspected Factors in Tuberculosis Cases in Semarang City Using a Logistic Regression Model. Scientific Journal of Computer Science, 1(1), 23–34. https://doi.org/10.64539/sjcs.v1i1.2025.32

Section

Articles
