DOI: https://doi.org/10.64539/sjcs.v2i1.2026.378
Keywords: IDS, MiniBatchKMeans, Data Imbalance, Cyber-attack, Hyperopt, SMOTE
Abstract
Intrusion Detection Systems (IDS) must contend with the ever-increasing sophistication of cyber threats. However, IDS performance is degraded by class imbalance and high-dimensional feature spaces, which bias classifier training toward the majority traffic patterns. This research therefore introduces a hybrid clustering-based IDS approach that combines MiniBatchKMeans clustering with ensemble machine learning to mitigate these challenges. The proposed approach uses the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, the Fast Correlation-Based Filter (FCBF) to reduce feature dimensionality, and the Hyperopt Tree-structured Parzen Estimator (TPE) to optimize the parameters of the clustering model and the classifiers. Four supervised classifiers (Decision Tree, Random Forest, Extra Trees, and XGBoost) were trained and validated on the NSL-KDD dataset. Experimental analysis showed high detection accuracy for all classifiers, with the best-optimized XGBoost and Random Forest classifiers achieving 99.57% and 99.51% accuracy, respectively. The proposed clustering-optimized IDS approach substantially improved the detection of minority-class attacks while maintaining stable performance and high generalization capability. These results support the research premise that cluster-aware sampling and ensemble optimization are effective for designing more balanced, accurate, and adaptive IDS that can protect against evolving real-world cyber threats.
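The abstract names the preprocessing ingredients but not how clustering and oversampling are wired together. The block below is a minimal, dependency-free sketch that assumes a simplified KMeans-SMOTE-style design: cluster all samples with mini-batch k-means, then run SMOTE separately inside each cluster until the minority class matches that cluster's majority count. The function names (`nearest_center`, `mini_batch_kmeans`, `smote`, `cluster_aware_smote`), the farthest-point initialisation (a deterministic stand-in for k-means++), and the per-cluster balancing rule are hypothetical illustrations, not the authors' implementation. In practice one would reach for scikit-learn's `MiniBatchKMeans` and imbalanced-learn's `SMOTE` or `KMeansSMOTE`. FCBF feature selection and TPE tuning are omitted.

```python
import math
import random


def nearest_center(x, centers):
    """Index of the center closest to x in Euclidean distance."""
    return min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))


def mini_batch_kmeans(points, k, batch_size=32, n_iter=100, seed=0):
    """Sculley-style mini-batch k-means: each center moves toward its assigned
    batch points with a per-center learning rate of 1 / (points absorbed)."""
    rng = random.Random(seed)
    # Assumption: farthest-point initialisation instead of k-means++.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    counts = [0] * k
    for _ in range(n_iter):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch against the centers as they were before this step.
        nearest = [nearest_center(x, centers) for x in batch]
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]  # step size shrinks as a center absorbs more points
            centers[c] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[c], x)]
    return centers


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: interpolate between a random minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def cluster_aware_smote(X, y, minority_label, n_clusters=2, k=3, seed=0):
    """Hypothetical cluster-aware variant: cluster all samples, then oversample
    the minority class inside each cluster up to that cluster's majority count."""
    centers = mini_batch_kmeans(X, n_clusters, seed=seed)
    labels = [nearest_center(x, centers) for x in X]
    X_aug, y_aug = [list(x) for x in X], list(y)
    for c in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == c]
        minority = [X[i] for i in members if y[i] == minority_label]
        deficit = (len(members) - len(minority)) - len(minority)
        if len(minority) >= 2 and deficit > 0:
            new = smote(minority, deficit, k=k, seed=seed + c)
            X_aug += new
            y_aug += [minority_label] * len(new)
    return X_aug, y_aug


if __name__ == "__main__":
    # Invented toy data: two well-separated blobs, three "attack" rows in each.
    blob_a = [[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)]
    blob_b = [[10 + 0.1 * i, 10 + 0.1 * j] for i in range(5) for j in range(5)]
    X = blob_a + blob_b
    y = ["attack" if i % 25 in (0, 6, 12) else "normal" for i in range(len(X))]
    X_bal, y_bal = cluster_aware_smote(X, y, "attack")
    print(y_bal.count("attack"), y_bal.count("normal"))
```

Interpolating only between neighbours that share a cluster keeps synthetic samples inside regions the minority class already occupies, rather than bridging distant groups of attacks with implausible traffic records. That is the usual argument for cluster-aware sampling over plain SMOTE.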