HUANG Anxiang, SHEN Lei, LAN Leibin, FANG Yihao
Cattle detection is a prerequisite for deep-learning-based individual registration and recognition of cattle. Differences in lighting, color, and breed across real-world scenes make the low-level features of cattle images diverse, while the semantic information in high-level features cannot fully match these diverse low-level features, resulting in poor detection accuracy. To address the insufficient high-level feature semantics of detection models, this paper designs a new cattle feature extraction backbone network, ResMO Backbone, and a feature fusion network, Dense Neck, and proposes a cattle detection algorithm based on ResMO Sense YOLO. In the backbone network, the ResMO module (ResBlock, MHSA, ODConv) attends to high-level cattle features at multiple semantic levels to enrich semantic information, and the SPPF structure is combined with a multi-layer convolution structure to expand the receptive field, so that the model can better extract high-level cattle features. For feature fusion, the paper proposes a network that cascades a DenseBlock-based feature pyramid with a DenseBlock-based path aggregation network. It exploits the feature reuse of DenseBlock and the multi-scale fusion of the feature pyramid and path aggregation network to further integrate the low-level positional information and high-level semantic information of cattle, improving detection accuracy. Compared with FLYOLOv3, SSD, and YOLOv5s on the cow channel, cow shed, and beef shed datasets collected in the laboratory, the proposed model improves average accuracy by 40.1%, 30.3%, and 4.0%, recall by 34.9%, 23.1%, and 6.8%, and mAP by 49.2%, 35.3%, and 5.0%, respectively.
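The abstract reports accuracy and recall but does not say how they are computed. As a rough illustration only, the sketch below shows one common way to score a detector on a single image: predictions are matched greedily to ground-truth boxes by intersection over union (IoU), and precision and recall follow from the match counts. The 0.5 IoU threshold, the greedy matching rule, the reading of "accuracy" as precision, and the names `iou` and `precision_recall` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],  # (box, confidence score)
    truths: List[Box],
    iou_thr: float = 0.5,  # assumed threshold, not from the paper
) -> Tuple[float, float]:
    """Greedy matching: highest-confidence predictions claim boxes first."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

Under these assumptions, mAP would be obtained by sweeping the confidence threshold, integrating the resulting precision-recall curve per class, and averaging over classes; that step is omitted here for brevity.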