Parameter-Efficient Fine-Tuning for Sonar Shipwreck Segmentation: A Seed Averaged Study with SegFormer and LoRA

Authors

  • Shehan Maxwell Beruwalage, Nanjing University of Information Science and Technology, China
  • Chunyong Yin, Nanjing University of Information Science and Technology, China
  • Muhammad Raza, Nanjing University of Information Science and Technology, China
  • Deshan Sachintha Kannangara, Nanjing University of Information Science and Technology, China
  • Sachini Amani Hendavitharana, Nanjing University of Information Science and Technology, China

DOI:

https://doi.org/10.64539/sjer.v2i2.2026.454

Keywords:

Parameter-efficient fine-tuning, Sonar shipwreck segmentation, SegFormer-B0, LoRA, Model efficiency, Dice, IoU, Training efficiency, Segmentation accuracy

Abstract

Accurate segmentation of shipwreck targets in sonar imagery is important for underwater archaeology, marine monitoring, and search operations, but the task remains difficult because labeled sonar masks are scarce and full adaptation of transformer models can be computationally expensive. This study evaluates whether parameter-efficient fine-tuning offers a practical alternative for binary sonar shipwreck segmentation. Using SegFormer-B0 initialized from a pretrained checkpoint, three adaptation strategies were compared under a consistent protocol: full fine-tuning of all model parameters (FullFT), training only the segmentation head (Head-only), and LoRA-based adaptation of selected linear layers together with head training (LoRA-A+Head). Models were selected by the best validation epoch and evaluated on a held-out test set. Averaged across three random seeds, FullFT achieved the best performance, with a Dice score of 0.614 ± 0.008 and an IoU of 0.487 ± 0.007. LoRA-A+Head achieved a Dice score of 0.546 ± 0.010 and an IoU of 0.401 ± 0.008 while updating only 1.57% of the parameters, whereas Head-only reached 0.494 ± 0.010 Dice and 0.354 ± 0.008 IoU. These results show a clear accuracy-efficiency trade-off: full fine-tuning gives the highest accuracy, while LoRA-A+Head offers a practical option when reducing the number of updated parameters is important. The findings support the use of parameter-efficient adaptation for sonar segmentation in compute-limited settings.
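The two ingredients the abstract relies on, a LoRA update added to a frozen linear layer and the Dice/IoU evaluation metrics, can be sketched minimally. This is an illustration only: the layer sizes, rank r, and scaling alpha below are hypothetical and do not reproduce the paper's SegFormer-B0 configuration or its reported 1.57% trainable fraction.

```python
import numpy as np

# Illustrative sizes (NOT the paper's SegFormer-B0 dimensions).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts at zero
x = rng.standard_normal(d_in)

# LoRA forward pass: y = (W + (alpha / r) * B @ A) @ x
y = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(y, W @ x)  # with B = 0 the adapted layer matches the frozen one

# Fraction of this layer's parameters that LoRA makes trainable
trainable_fraction = (A.size + B.size) / (W.size + A.size + B.size)

def dice_iou(pred, gt):
    """Binary Dice and IoU from boolean masks of equal shape."""
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / np.logical_or(pred, gt).sum()
    return dice, iou

# Toy 2x2 masks with one overlapping pixel: Dice = 2/3, IoU = 1/2.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt = np.array([[1, 0], [0, 0]], dtype=bool)
d, i = dice_iou(pred, gt)
```

Because B starts at zero, a LoRA-adapted model is identical to the frozen pretrained model before training, which is why only A and B (plus, in the paper's LoRA-A+Head setting, the segmentation head) need gradient updates.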

References

[1] D. R. Blidberg, “The development of autonomous underwater vehicles,” Ocean Engineering, 2001. https://www.researchgate.net/publication/247835516.

[2] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015. https://doi.org/10.1007/978-3-319-24574-4_28.

[3] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015. https://doi.org/10.1038/nature14539.

[4] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical vision transformer using shifted windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2021. https://doi.org/10.1109/ICCV48922.2021.00986.

[5] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” arXiv preprint arXiv:2105.15203, 2021. https://doi.org/10.48550/arXiv.2105.15203.

[6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. https://synapse.koreamed.org/pdf/10.4258/hir.2016.22.4.351.

[7] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Fourth International Conference on 3D Vision (3DV), 2016. https://doi.org/10.1109/3DV.2016.79.

[8] J. Kaplan, S. McCandlish, T. Henighan, T. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020. https://doi.org/10.48550/arXiv.2001.08361.

[9] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2022. https://doi.org/10.48550/arXiv.2106.09685.

[10] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient finetuning of quantized LLMs,” arXiv preprint arXiv:2305.14314, 2023. https://doi.org/10.48550/arXiv.2305.14314.

[11] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for NLP,” arXiv preprint arXiv:1902.00751, 2019. https://doi.org/10.48550/arXiv.1902.00751.

[12] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2021. https://doi.org/10.48550/arXiv.2010.11929.

[13] J. Lei, H. Wang, L. Fan, Q. Gu, S. Rong, and H. Zhang, “SonarNet: Global feature based hybrid attention network for side scan sonar image segmentation,” Remote Sensing, vol. 17, no. 14, 2450, 2025. https://doi.org/10.3390/rs17142450.

[14] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. https://doi.org/10.1109/CVPR.2015.7298965.

[15] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017. https://doi.org/10.1145/3065386.

[16] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. Le, and A. Ng, “Large scale distributed deep networks,” in Advances in Neural Information Processing Systems, 2012. https://proceedings.neurips.cc/paper_files/paper/2012/hash/6aca97005c68f1206823815f66102863-Abstract.html.

[17] J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych, “AdapterFusion: Non-destructive task composition for transfer learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 487–503, 2021. https://doi.org/10.18653/v1/2021.eacl-main.39.

[18] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, 2011. https://doi.org/10.1145/1970392.1970395.

[19] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, 2019. https://doi.org/10.1186/s40537-019-0197-0.

[20] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. https://doi.org/10.1109/CVPR.2017.106.

[21] A. V. Sethuraman, A. Sheppard, O. Bagoren, C. Pinnow, J. Anderson, T. C. Havens, and K. A. Skinner, “Machine learning for shipwreck segmentation from side scan sonar imagery: Dataset and benchmark,” The International Journal of Robotics Research, vol. 44, no. 3, pp. 341–354, 2025. https://umfieldrobotics.github.io/ai4shipwrecks/.

[22] P. Zeng, Y. Chen, W. Zhang, X. Zhang, and Y. Chen, “Multi-beam sonar target segmentation based on BS-UNet,” Electronics, vol. 13, no. 14, 2841, 2024. https://doi.org/10.3390/electronics13142841.

Published

2026-04-18

How to Cite

Beruwalage, S. M., Yin, C., Raza, M., Kannangara, D. S., & Hendavitharana, S. A. (2026). Parameter-Efficient Fine-Tuning for Sonar Shipwreck Segmentation: A Seed Averaged Study with SegFormer and LoRA. Scientific Journal of Engineering Research, 2(2), 258–270. https://doi.org/10.64539/sjer.v2i2.2026.454

Section

Articles
