Fig. 3 Network structure of the recognition module used in this study. Focus is the slicing operation module applied to the input image; C3 is the cross stage partial (CSP) block with three convolutions; CBS is the convolution unit (convolution, batch normalization, and SiLU activation); Concat denotes feature concatenation; Upsample denotes feature upsampling; Detect is the detection module; Dark denotes a DarkNet module, and Dark+CA denotes a Dark module followed by a CA module.
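The Focus slicing step can be illustrated with a minimal pure-Python sketch. It assumes the YOLOv5/YOLOX-style Focus layer, which samples every second pixel at four phase offsets and stacks the four sub-images along the channel axis, so a C×H×W input becomes 4C×H/2×W/2. The convolution that normally follows the slicing is omitted, and the function name `focus_slice` is hypothetical.

```python
def focus_slice(x):
    """Focus slicing on a C x H x W tensor stored as nested lists (H, W even).

    Returns a 4C x H/2 x W/2 tensor. The phase order (row offset, column offset)
    is (0,0), (1,0), (0,1), (1,1), following the YOLOv5/YOLOX Focus layer
    (an assumption about the exact variant used in Fig. 3).
    """
    out = []
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in x:
            # Keep rows dy, dy+2, ... and within them columns dx, dx+2, ...
            out.append([row[dx::2] for row in channel[dy::2]])
    return out


# One 4x4 channel becomes four 2x2 channels; no pixel is discarded.
image = [[[r * 4 + c for c in range(4)] for r in range(4)]]
print(focus_slice(image)[0])  # [[0, 2], [8, 10]]
```

Because the slicing only rearranges pixels, spatial resolution is halved without losing information, which lets the following convolution see a wider context at lower cost.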
Fig. 4 Structure of the CA (coordinate attention) module. C denotes the number of feature channels; H and W denote the height and width of the feature map; r is the reduction ratio; X Avg Pool denotes global average pooling along the horizontal direction; Y Avg Pool denotes global average pooling along the vertical direction; Conv2d denotes 2D convolution; BN denotes batch normalization; Sigmoid is the sigmoid activation function.
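The data flow of the CA module can be traced with a minimal pure-Python sketch, assuming the standard coordinate attention design: a shared 1×1 convolution reduces the channels from C to C/r, batch normalization is treated as already folded into those weights (as at inference time), and hard-swish is assumed as the non-linearity after the reduction, since the figure does not name one. The weight arguments and all function names are hypothetical; the sketch shows the computation, not the authors' trained module.

```python
import math


def h_swish(v):
    # Hard-swish v * ReLU6(v + 3) / 6 (assumed non-linearity after the shared conv).
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(weights, feat):
    # 1x1 convolution on a 1-D strip: weights is out_ch x in_ch, feat is in_ch x L.
    return [[sum(w * feat[c][i] for c, w in enumerate(row)) for i in range(len(feat[0]))]
            for row in weights]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """x: C x H x W nested lists; w_reduce: (C/r) x C; w_h, w_w: C x (C/r)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # X Avg Pool: average each row over the width -> C x H
    z_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    # Y Avg Pool: average each column over the height -> C x W
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)] for c in range(C)]
    # Concatenate along the spatial axis -> C x (H + W)
    z = [z_h[c] + z_w[c] for c in range(C)]
    # Shared 1x1 conv (C -> C/r, BN assumed folded in), then the non-linearity
    f = [[h_swish(v) for v in row] for row in conv1x1(w_reduce, z)]
    # Split back into the height and width branches
    f_h = [row[:H] for row in f]
    f_w = [row[H:] for row in f]
    # Per-branch 1x1 conv back to C channels, then sigmoid attention weights
    g_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, f_h)]
    g_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, f_w)]
    # Reweight every position by its row weight and its column weight
    return [[[x[c][i][j] * g_h[c][i] * g_w[c][j] for j in range(W)] for i in range(H)]
            for c in range(C)]


# C = 2, r = 2; with all-zero weights every gate is sigmoid(0) = 0.5.
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
y = coordinate_attention(x, [[0, 0]], [[0], [0]], [[0], [0]])
```

Each output value is scaled by one weight from the H branch and one from the W branch, so the attention keeps positional information along both axes instead of collapsing the whole feature map into a single per-channel weight.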
Fig. 5 Prediction box and ground-truth box of the model. $b_{c_x}^{gt}$ and $b_{c_y}^{gt}$ are the coordinates of the center point $B^{gt}$ of the ground-truth box, and $b_{c_x}$ and $b_{c_y}$ are the coordinates of the center point $B$ of the prediction box; $h$ and $w$ are the height and width of the prediction box, and $h^{gt}$ and $w^{gt}$ are the height and width of the ground-truth box; $\sigma$ is the straight-line distance between $B^{gt}$ and $B$; $c_h$ and $c_w$ are the vertical and horizontal distances between the two center points; $C_h$ and $C_w$ are the height and width of the smallest rectangle enclosing both the ground-truth box and the prediction box; $\alpha$ is the angle between the line connecting the two center points and the horizontal direction.
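The quantities in Fig. 5 match those of the SIoU loss, so the sketch below assumes the standard SIoU formulation: angle cost $\Lambda = 1 - 2\sin^2(\arcsin(c_h/\sigma) - \pi/4)$; distance cost $\Delta = \sum_{t \in \{x, y\}} (1 - e^{-\gamma \rho_t})$ with $\rho_x = (c_w/C_w)^2$, $\rho_y = (c_h/C_h)^2$ and $\gamma = 2 - \Lambda$; shape cost $\Omega = \sum_{t \in \{w, h\}} (1 - e^{-\omega_t})^{\theta}$ with an assumed default $\theta = 4$; and total loss $1 - \mathrm{IoU} + (\Delta + \Omega)/2$. Boxes are given as (cx, cy, w, h), and the function names are hypothetical.

```python
import math


def angle_cost(c_w: float, c_h: float) -> float:
    """SIoU angle cost: 1 - 2 * sin^2(arcsin(c_h / sigma) - pi / 4)."""
    sigma = math.hypot(c_w, c_h)  # straight-line distance between the centers
    sin_alpha = c_h / sigma if sigma > 0 else 0.0
    return 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2


def siou_loss(pred, gt, theta: float = 4.0) -> float:
    """pred, gt: (cx, cy, w, h). Assumed SIoU form: 1 - IoU + (distance + shape) / 2."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g1x, g1y, g2x, g2y = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # Intersection over union
    inter = (max(0.0, min(p2x, g2x) - max(p1x, g1x))
             * max(0.0, min(p2y, g2y) - max(p1y, g1y)))
    iou = inter / (pw * ph + gw * gh - inter)
    # C_w, C_h: width and height of the smallest enclosing rectangle
    enc_w = max(p2x, g2x) - min(p1x, g1x)
    enc_h = max(p2y, g2y) - min(p1y, g1y)
    # c_w, c_h: horizontal and vertical distances between the two centers
    c_w, c_h = abs(gx - px), abs(gy - py)
    # Distance cost, weighted by the angle cost through gamma
    gamma = 2.0 - angle_cost(c_w, c_h)
    rho_x, rho_y = (c_w / enc_w) ** 2, (c_h / enc_h) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # Shape cost from the relative width and height mismatch
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta
    return 1.0 - iou + (distance_cost + shape_cost) / 2


# A prediction shifted one unit to the right of a 2x2 ground-truth box.
loss = siou_loss((6, 5, 2, 2), (5, 5, 2, 2))
```

Since $1 - 2\sin^2(\alpha - \pi/4) = \sin 2\alpha$, the angle cost is 0 when the two centers are aligned horizontally or vertically and reaches 1 at $\alpha = \pi/4$, which is what pushes the prediction toward the nearer axis first.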
Fig. 10 Field cotton category recognition results based on YOLOX-Cotton. Cotton_Side denotes cotton identified from the side by the YOLOX-Cotton model; Cotton_Top denotes normal cotton viewed from the top (front); CottonLow_Top denotes low-grade cotton viewed from the top (front). The number following each class label is the confidence score.