FANG Yihao, SHEN Lei, LAN Leibin, HUANG Anxiang
Detecting cattle targets in complex environments is a key problem in machine-vision-based precise counting of cattle. Because of occlusion caused by overcrowding and of incomplete cattle individuals at the edge of the camera's field of view, existing cattle detection methods are not well suited to the complex environments of breeding farms. This paper proposes a cattle detection algorithm for complex environments based on a Swin-Transformer (SWT)-YOLOv5s network. First, we propose SWT Backbone, a backbone network for cattle feature extraction that cascades a double-layer shortcut (SC)-SWT module with multi-layer convolution. Exploiting the SC-SWT module's focus on global features, combined with the residual multi-layer convolution module's focus on local features, we increase the depth and receptive field of the network so that the model can fully extract both the global and the local features of cattle. Second, we propose SWT Head, a feature-fusion detection head that cascades a non-shortcut (NSC)-SWT module with a pyramid network. The pyramid network builds a feature pyramid that fuses, at multiple scales, the global and local features extracted by SWT Backbone. Combining the global receptive field of the NSC-SWT module's Transformer with the local CNN receptive field of the C3 module enables the model to detect and filter, within the feature pyramid, both the high-level semantic global features of cattle and the local features that capture cattle details more accurately, while efficiently suppressing feature interference from the background environment and improving the detection accuracy of cattle targets in complex environments. Simulation experiments were conducted on the COWYCTC-1480 dataset collected in our laboratory.
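The core architectural idea above, shortcut (residual) attention blocks for global features cascaded with local convolution, can be illustrated with a toy 1D sketch. This is a hypothetical NumPy illustration of the SC/NSC distinction and the global-vs-local receptive fields, not the paper's actual implementation; all function names and the stage layout are assumptions for illustration only.

```python
import numpy as np

def self_attention(x):
    # Toy single-head self-attention: every token attends to every
    # other token, so the receptive field is global.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def conv1d_mean(x, k=3):
    # Toy mean filter: each output token only sees k neighbours,
    # so the receptive field is local.
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + k].mean(axis=0) for i in range(x.shape[0])])

def sc_swt_block(x):
    # SC-SWT: attention wrapped in a shortcut (residual) connection,
    # as in the backbone described above.
    return x + self_attention(x)

def nsc_swt_block(x):
    # NSC-SWT: the same attention without the shortcut, as used in
    # the detection head.
    return self_attention(x)

def swt_backbone_stage(x):
    # One hypothetical backbone stage: double-layer SC-SWT cascaded
    # with a residual convolution (global features, then local ones).
    x = sc_swt_block(sc_swt_block(x))
    return x + conv1d_mean(x)

tokens = np.random.default_rng(0).normal(size=(8, 4))  # 8 tokens, dim 4
out = swt_backbone_stage(tokens)
print(out.shape)  # (8, 4): the cascade preserves token shape
```

Because the SC blocks add their input back, the stage can be stacked deeply without losing the identity path, which is why the shortcut variant suits the deep backbone while the non-shortcut variant is reserved for the head.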
Compared with the widely used YOLOv5s and SSD algorithms, the accuracy, recall, and mAP of our method on the test set were 7.0, 2.0, and 11.1 percentage points higher than those of YOLOv5s, and 20.0, 8.0, 32.0, and 29.3 percentage points higher than those of SSD, respectively.