DOI: https://doi.org/10.64539/sjer.v2i2.2026.436

Keywords: Crowd counting, Teacher–student distillation, Self-knowledge distillation, Density map regression, Lightweight network, Multi-level feature alignment

Abstract
Crowd counting plays an important role in public safety monitoring, traffic management, and intelligent surveillance systems. However, accurate density estimation remains difficult in highly congested scenes because of severe occlusion, large scale variation, and cluttered backgrounds. Although recent deep-learning methods achieve strong performance, many rely on computationally heavy backbone networks or on external teacher–student distillation architectures, which limits their use in resource-constrained applications. To address this problem, we introduce LSKD, a lightweight self-knowledge distillation network designed specifically for density-map regression. Unlike conventional teacher-dependent pipelines, LSKD performs internal multi-level feature alignment within a single compact network and requires no external teacher model. The architecture integrates a Feature Matching Block (FMB) and a Context Fusion (CoFuse) block to strengthen hierarchical feature alignment and global context awareness. Extensive experiments show that LSKD achieves competitive performance with only 2.65 million parameters and 10.23 GFLOPs. In particular, it attains an MAE of 63.17 on ShanghaiTech Part A, 8.94 on ShanghaiTech Part B, 143.7 on UCF-QNRF, and 223.88 on UCF-CC-50, striking a favorable balance between accuracy and computational cost. These results indicate that LSKD is a practical and efficient solution for real-time crowd counting on edge devices.
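The abstract does not state LSKD's training objective, but the general idea of self-knowledge distillation with internal feature alignment can be sketched: features from a deeper stage act as an in-network "teacher" target for shallower-stage features, and an alignment penalty is added to the usual density-map regression loss. The function below is a minimal NumPy illustration of that combined objective; the names, the simple mean-squared terms, and the weighting factor `alpha` are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def self_distillation_loss(shallow_feats, deep_feats, density_pred, density_gt, alpha=0.1):
    """Sketch of a self-distillation objective for density-map regression.

    shallow_feats, deep_feats : feature maps from two stages of the same
        network, assumed already projected to a common shape; the deeper
        stage serves as the internal "teacher" target.
    density_pred, density_gt : predicted and ground-truth density maps.
    alpha : weight of the internal alignment term (illustrative value).
    """
    # Internal distillation term: pull shallow features toward deep ones.
    align = float(np.mean((shallow_feats - deep_feats) ** 2))
    # Primary task term: pixel-wise density-map regression error.
    regress = float(np.mean((density_pred - density_gt) ** 2))
    return regress + alpha * align
```

In a real implementation the deep-stage target would typically be detached from the gradient graph so that only the shallow branch is pulled toward it; that detail is omitted here since the sketch is framework-free.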
License
Copyright (c) 2026 Muhammad Raza, Miaogen Ling, Atta Ur Rahman, Pandula Pallewatta, Aboubakar Abdinur Hersi, Shehan Maxwell Beruwalage, Deshan Sachintha Kannangara

This work is licensed under a Creative Commons Attribution 4.0 International License.

