NRCC-LC: Noise-Robust Crowd Counting with Dynamic Label Correction under Noisy Supervision

Abubakar Abdinur Hersi; Miaogen Ling; Muhammad Raza; Abdirahman Mohamed Hassan; Idris Aweis Hussien

doi:10.64539/sjer.v2i3.2026.494

Authors

Abubakar Abdinur Hersi, Nanjing University of Information Science and Technology, China
Miaogen Ling, Nanjing University of Information Science and Technology, China
Muhammad Raza, Nanjing University of Information Science and Technology, China
Abdirahman Mohamed Hassan, Nanjing University of Information Science and Technology, China
Idris Aweis Hussien, Nanjing University of Information Science and Technology, China

DOI:

https://doi.org/10.64539/sjer.v2i3.2026.494

Keywords:

Crowd counting, Density estimation, Learning with noise, Transformer/CNN, Correcting labels, Teacher-student learning

Abstract

Crowd counting remains a challenging problem in computer vision because occlusion, scale variation, and perspective distortion degrade the performance of existing methods. In addition, the labels used to train crowd counting systems are often highly noisy because of real-world conditions. Although accuracy has improved in recent years, most crowd counting models still rely on clean supervision and lack mechanisms for dynamically correcting corrupted labels, which limits their robustness when deployed in real-world applications. In this work, we present Noise-Robust Crowd Counting with Label Correction (NRCC-LC), a framework for obtaining reliable density estimates from noisy supervision. Our approach combines a hybrid CNN-Transformer architecture, which captures both local and global visual information (i.e., image content and context), with a Noise-Robust Module (NRM) and a Dynamic Label Correction (DLC) mechanism. Our principal experimental results, evaluated on four benchmark datasets (ShanghaiTech Part A, ShanghaiTech Part B, NWPU-Crowd, and JHU-Crowd++), indicate that NRCC-LC is competitive with existing state-of-the-art crowd counting methods, most notably producing per-image MAEs of 97.8 and 392.3 on NWPU-Crowd. These results also have real-world implications for public safety and urban planning: by combining noise-aware feature learning with iterative label correction, the method shows the potential for automated monitoring systems in complex, real-world environments to become significantly more reliable.
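The abstract does not spell out how the label correction works, so the following is only a minimal sketch of a generic teacher-student correction step, not the authors' DLC mechanism. It assumes a teacher whose weights track the student by an exponential moving average and a corrected label formed by blending the noisy density map with the teacher's prediction. The names `ema_update`, `correct_labels`, `alpha`, and `momentum` are hypothetical and chosen for illustration; the count-level MAE at the end is the standard definition of the metric.

```python
from typing import List

DensityMap = List[List[float]]


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.99) -> List[float]:
    """Move teacher weights toward student weights (exponential moving average).

    Assumption: a common teacher-student pattern, not taken from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def correct_labels(noisy: DensityMap, teacher_pred: DensityMap,
                   alpha: float) -> DensityMap:
    """Blend a noisy density map with the teacher's prediction.

    alpha = 0 keeps the original labels; alpha = 1 trusts the teacher fully.
    The blending rule itself is an illustrative assumption.
    """
    return [
        [(1.0 - alpha) * n + alpha * p for n, p in zip(noisy_row, pred_row)]
        for noisy_row, pred_row in zip(noisy, teacher_pred)
    ]


def count(density: DensityMap) -> float:
    """The predicted count is the sum over the density map."""
    return sum(sum(row) for row in density)


def mae(pred_counts: List[float], true_counts: List[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


if __name__ == "__main__":
    noisy = [[1.0, 0.0]]         # annotation placed in the wrong cell
    teacher_pred = [[0.0, 1.0]]  # teacher places the head in the other cell
    corrected = correct_labels(noisy, teacher_pred, alpha=0.5)
    print(corrected, count(corrected))
    print(mae([10.0, 20.0], [12.0, 17.0]))
```

In a scheme like this, `alpha` would typically start near zero and grow as the teacher becomes more trustworthy over training; whether NRCC-LC uses such a schedule is not stated in the abstract.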

