Lightweight Thermal Udder Segmentation via Structured Pruning for On-Device Farm Deployment

Fonisto, M.; Verde, M. T.; Bonavolonta, F.; Liccardo, A.; Matera, R.; Santinello, M.; Amato, F.

doi:10.1109/ACCESS.2026.3665389

Accurate and automatic extraction of the udder skin surface temperature (USST) from thermal imagery enables continuous, non-invasive monitoring of udder health. The literature indicates that USST correlates with somatic cell count (SCC), the primary marker of mastitis activity, provided that udder regions are precisely segmented. Building on our previously deployed on-farm system, which integrated thermal imaging with robotic milking for real-time monitoring, this paper focuses on the segmentation component, aiming for a precise yet lightweight, farm-deployable model. We explore compact binary segmentation models that pair the newly released DINOv3-initialised encoders with lightweight decoders: ConvNeXt + Feature Pyramid Network (FPN) and ViT + Dense Prediction Transformer (DPT). Our training pipeline has four stages: (i) train the decoder while keeping the encoder frozen; (ii) apply isomorphic pruning to remove redundant, shape-consistent channels and feature dimensions while preserving tensor interfaces; (iii) unfreeze the pruned encoder and fine-tune the whole network to recover performance; and (iv) apply post-training quantisation to produce FP16 variants. We evaluate encoders across scales and report test intersection over union (IoU), parameter count, and multiply-accumulate operations (MACs), selecting the final model from the IoU-cost frontier at full precision. While a non-pruned ConvNeXt-tiny backbone attains the highest IoU, a pruned ConvNeXt-small achieves comparable IoU at lower cost and is therefore our Pareto-optimal choice for edge deployment. This model (ConvNeXt-small + FPN) achieves an IoU of 81.68% with 25.44M parameters and 32.60B MACs. It is exported to ONNX, quantised, and deployed on an NVIDIA Jetson Orin Nano as a practical, offline, and privacy-preserving edge solution for farms. The deployed FP16-quantised ONNX model, running on a CUDA-accelerated runtime, achieves an average latency of 81.6 ms per frame at the 15 W power profile on the Jetson Orin Nano.
The dataset, source code, training logs, model weights, and Jetson-ready Docker image used in this study are released as open source to ensure full reproducibility and fair comparison.
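
The evaluation metric and the frontier-based model selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the function names (`binary_iou`, `pareto_front`), the `Candidate` type, and the candidate figures in the usage note are hypothetical and chosen only for illustration. The sketch assumes that cost is measured in MACs alone and that two empty masks count as a perfect match.

```python
from typing import List, NamedTuple, Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks given as nested 0/1 sequences."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


class Candidate(NamedTuple):
    name: str       # hypothetical model label
    iou: float      # test IoU in percent
    macs_g: float   # cost in billions of MACs


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates.

    A candidate is dominated when another one has IoU at least as high and
    cost at most as high, with a strict improvement in one of the two.
    """
    front = []
    for c in cands:
        dominated = any(
            o.iou >= c.iou
            and o.macs_g <= c.macs_g
            and (o.iou > c.iou or o.macs_g < c.macs_g)
            for o in cands
        )
        if not dominated:
            front.append(c)
    # Sort by increasing cost so the frontier reads left to right.
    return sorted(front, key=lambda c: c.macs_g)
```

As a usage example with made-up numbers, `pareto_front([Candidate("a", 80.0, 10.0), Candidate("b", 81.0, 20.0), Candidate("c", 79.0, 15.0)])` keeps `a` and `b` and drops `c`, because `a` is both more accurate and cheaper than `c`. Picking one point on the resulting frontier remains a judgment about how much accuracy to trade for cost.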

Lightweight Thermal Udder Segmentation via Structured Pruning for On-Device Farm Deployment / Fonisto, M.; Verde, M. T.; Bonavolonta, F.; Liccardo, A.; Matera, R.; Santinello, M.; Amato, F. - In: IEEE ACCESS. - ISSN 2169-3536. - 14:(2026), pp. 25650-25662. [10.1109/ACCESS.2026.3665389]