A Study of Loss Weight Balance in Lightweight Self-Distilled Crowd Counting

Authors

  • Muhammad Raza, Nanjing University of Information Science and Technology, China
  • Atta Ur Rahman, Nanjing University of Information Science and Technology, China
  • Pandula Pallewatta, Nanjing University of Information Science and Technology, China
  • Inayat Ur Rahman, Nanjing University of Information Science and Technology, China
  • Sahib Bahadar, Nanjing University of Information Science and Technology, China

DOI:

https://doi.org/10.64539/sjer.v2i3.2026.493

Keywords:

Crowd counting, Lightweight crowd counting, Self-knowledge distillation, Composite loss weighting, Density map regression

Abstract

Lightweight crowd counting is important for real-time surveillance and resource-constrained deployment, where both computational efficiency and effective supervision are required. Teacher-free self-distillation can improve lightweight density-regression models by guiding intermediate representations without an external teacher, but the influence of the composite loss weights in such frameworks has not been sufficiently analyzed. This paper presents a focused, coefficient-wise loss-weight analysis within the Lightweight Self-Knowledge Distillation (LSKD) framework for single-image crowd counting. Instead of proposing a new architecture, the study investigates how the coefficients α, β, γ, and λ₂ affect optimization behavior and counting accuracy under a fixed experimental setup on ShanghaiTech Part B. Specifically, α weights intermediate feature alignment, β weights consistency supervision, γ weights direct density-regression supervision, and λ₂ weights the structural similarity term in the regression loss. The results show that moderate values of α and β improve performance by providing useful internal regularization, while excessive auxiliary weighting can slightly degrade accuracy. The analysis also indicates that γ should remain dominant, because direct density-map regression is the primary learning signal. The best observed configuration is α = 6.0, β = 2.0, γ = 13.0, and λ₂ = 0.2, which achieves 8.94 MAE and 11.51 RMSE on ShanghaiTech Part B. These findings highlight the importance of balanced supervision design, within the scope of the evaluated LSKD framework and dataset.
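
To make the role of the four coefficients concrete, the following is a minimal sketch of how such a weighted objective and the two reported metrics could be computed. The combination rule (α·L_feat + β·L_cons + γ·(L_pix + λ₂·L_ssim)) is an assumption inferred from the abstract's description, not the paper's stated formula, and the names LossWeights, composite_loss, and mae_rmse are hypothetical. Only the default coefficient values come from the abstract.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class LossWeights:
    # Defaults are the best observed configuration reported in the abstract.
    alpha: float = 6.0    # intermediate feature alignment
    beta: float = 2.0     # consistency supervision
    gamma: float = 13.0   # direct density-map regression
    lambda2: float = 0.2  # structural similarity term inside the regression loss


def composite_loss(feat_align, consistency, pixel_reg, ssim_term, w=LossWeights()):
    """Combine already-computed scalar loss terms.

    ASSUMED form (not confirmed by the source):
        alpha * L_feat + beta * L_cons + gamma * (L_pix + lambda2 * L_ssim)
    """
    regression = pixel_reg + w.lambda2 * ssim_term
    return w.alpha * feat_align + w.beta * consistency + w.gamma * regression


def mae_rmse(pred_counts, true_counts):
    """Mean absolute error and root mean squared error over per-image counts."""
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Under this assumed form, a coefficient-wise study would vary one field of LossWeights at a time (for example, LossWeights(alpha=4.0)) while holding the others fixed, and compare the resulting MAE and RMSE on the test split.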

Published

2026-05-29

How to Cite

Raza, M., Ur Rahman, A., Pallewatta, P., Ur Rahman, I., & Bahadar, S. (2026). A Study of Loss Weight Balance in Lightweight Self-Distilled Crowd Counting. Scientific Journal of Engineering Research, 2(3), 409–420. https://doi.org/10.64539/sjer.v2i3.2026.493
