A Study of Loss Weight Balance in Lightweight Self-Distilled Crowd Counting

Muhammad Raza; Atta Ur Rahman; Pandula Pallewatta; Inayat Ur Rahman; Sahib Bahadar

doi:10.64539/sjer.v2i3.2026.493

Authors

Muhammad Raza, Nanjing University of Information Science and Technology, China
Atta Ur Rahman, Nanjing University of Information Science and Technology, China
Pandula Pallewatta, Nanjing University of Information Science and Technology, China
Inayat Ur Rahman, Nanjing University of Information Science and Technology, China
Sahib Bahadar, Nanjing University of Information Science and Technology, China

DOI:

https://doi.org/10.64539/sjer.v2i3.2026.493

Keywords:

Crowd counting, Lightweight crowd counting, Self-knowledge distillation, Composite loss weighting, Density map regression

Abstract

Lightweight crowd counting matters for real-time surveillance and resource-constrained deployment, where a model must be computationally efficient yet still receive effective supervision. Teacher-free self-distillation can improve lightweight density-regression models by guiding intermediate representations without an external teacher, but the influence of the composite loss weights in such frameworks has not been sufficiently analyzed. This paper presents a focused, coefficient-wise loss-weight analysis within the Lightweight Self-Knowledge Distillation (LSKD) framework for single-image crowd counting. Rather than proposing a new architecture, the study investigates how the coefficients α, β, γ, and λ₂ affect optimization behavior and counting accuracy under a fixed experimental setup on ShanghaiTech Part B. Specifically, α weights intermediate feature alignment, β weights consistency supervision, γ weights direct density-regression supervision, and λ₂ weights the structural similarity term in the regression loss. The results show that moderate values of α and β improve performance by providing useful internal regularization, whereas excessive auxiliary weighting can slightly degrade accuracy. The analysis also indicates that γ should remain dominant because direct density-map regression is the primary learning signal. The best observed configuration is α = 6.0, β = 2.0, γ = 13.0, and λ₂ = 0.2, which achieves 8.94 MAE and 11.51 RMSE on ShanghaiTech Part B. These findings highlight the importance of balanced supervision design within the evaluated LSKD framework.
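
The roles of the four coefficients can be illustrated with a minimal Python sketch. The abstract does not give the exact loss formula, so the additive form used here (α · feature alignment + β · consistency + γ · regression, with regression = MSE + λ₂ · (1 − SSIM)) is an assumption made for illustration only, and the names LossWeights, regression_loss, composite_loss, and count_errors are hypothetical. The default weights are the best configuration reported in the abstract. MAE and RMSE follow their standard definitions over per-image counts.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Defaults are the best observed configuration reported in the abstract.
    alpha: float = 6.0     # weight of intermediate feature alignment
    beta: float = 2.0      # weight of consistency supervision
    gamma: float = 13.0    # weight of direct density-regression supervision
    lambda2: float = 0.2   # weight of the structural similarity term


def regression_loss(mse: float, ssim: float, w: LossWeights) -> float:
    # ASSUMPTION: pixel-wise MSE plus a weighted (1 - SSIM) penalty.
    # The source only says lambda2 scales "the structural similarity term".
    return mse + w.lambda2 * (1.0 - ssim)


def composite_loss(feat_align: float, consistency: float,
                   mse: float, ssim: float, w: LossWeights) -> float:
    # ASSUMPTION: the three supervision terms are combined as a weighted sum.
    return (w.alpha * feat_align
            + w.beta * consistency
            + w.gamma * regression_loss(mse, ssim, w))


def count_errors(pred_counts, true_counts):
    # Standard counting metrics: mean absolute error and root mean squared
    # error between predicted and ground-truth per-image counts.
    diffs = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    w = LossWeights()
    # Made-up per-term values, used only to show how the weights scale them.
    print(composite_loss(feat_align=1.0, consistency=1.0, mse=1.0, ssim=0.5, w=w))
    # Made-up counts for three images.
    print(count_errors([10.0, 20.0, 33.0], [12.0, 20.0, 30.0]))
```

Under this assumed form, sweeping one field of LossWeights while holding the others fixed mirrors the coefficient-wise analysis the abstract describes, and keeping γ larger than α and β keeps the regression term dominant whenever the unweighted terms are of similar magnitude.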
