Enhancing Image Recognition with Adaptive Interaction and Cross Attention in Convolutional Neural Network

Liwen Kong¹, Jianqiang Mei¹, Fan Jia², Weixiang Du³

doi:10.25236/AJCIS.2025.080901

Academic Journal of Computing & Information Science, 2025, 8(9); doi: 10.25236/AJCIS.2025.080901.

Corresponding Author:

Jianqiang Mei

Affiliation(s)

¹School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin, China

²Raysov Instrument Co. Ltd., Dandong, China

³Gansu Province Special Equipment Inspection & Testing Research Institute, Lanzhou, Gansu, China

Abstract

Convolutional Neural Network (CNN)-based classifiers have been widely used in image recognition tasks. However, as CNNs grow deeper, existing architectures accumulate large numbers of parameters and large model sizes. Although deep features carry rich semantic information, each downsampling stage lowers the spatial resolution, so continued deepening discards fine-grained target details and ultimately reduces recognition accuracy. To address this issue, we propose a convolutional classifier that incorporates adaptive interaction and cross attention mechanisms. Specifically, we design a VGG-like network in which an adaptive interaction module enhances the feature transformation of conventional convolution: it enlarges the receptive field of the convolutional kernels, adaptively constructs spatial and channel relationship indicators, and outputs more discriminative feature representations. In addition, a cross attention module captures global contextual information, enabling the network to learn spatial dependencies among features. We compare the proposed method with both classical and state-of-the-art classification models; on the CIFAR-10 dataset it achieves the highest accuracy among the compared models, 88.97%. This advancement is expected to improve the capture of high-level semantic features in image recognition and target measurement.
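The abstract does not specify the internals of either module, so the NumPy sketch below is only one plausible reading under stated assumptions, not the authors' implementation. The `adaptive_interaction` block is hypothetical: it chains a 3×3 depthwise convolution with a dilated one to enlarge the receptive field (loosely in the style of large-kernel attention), builds a channel indicator with squeeze-and-excitation and a spatial indicator by pooling across channels, and rescales the input with both. For the cross attention module we assume criss-cross attention (as in CCNet), where each position attends only to positions in its own row and column. All function names, weight shapes and the dilation rate of 3 are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def depthwise_conv2d(x, kernels, dilation=1):
    """Zero-padded 'same' depthwise convolution of x (C, H, W) with kernels (C, k, k), k odd."""
    _, h, w = x.shape
    k = kernels.shape[-1]
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(x.shape, dtype=float)
    for u in range(k):
        for v in range(k):
            # Shifted view of the padded input that lines up with kernel tap (u, v).
            patch = xp[:, u * dilation:u * dilation + h, v * dilation:v * dilation + w]
            out += kernels[:, u, v][:, None, None] * patch
    return out


def adaptive_interaction(x, dw_local, dw_dilated, w_reduce, w_expand):
    """HYPOTHETICAL adaptive interaction block for x (C, H, W); not the paper's exact design."""
    # Enlarge the receptive field: 3x3 depthwise conv, then a dilated (rate 3, assumed) one.
    y = depthwise_conv2d(x, dw_local, dilation=1)
    y = depthwise_conv2d(y, dw_dilated, dilation=3)
    # Channel indicator: global average pool + two-layer MLP (squeeze-and-excitation style).
    s = y.mean(axis=(1, 2))
    channel_gate = sigmoid(w_expand @ np.maximum(w_reduce @ s, 0.0))  # (C,)
    # Spatial indicator: mean- and max-pool across channels, squashed to (0, 1).
    spatial_gate = sigmoid(y.mean(axis=0) + y.max(axis=0))  # (H, W)
    # Rescale the input by both indicators.
    return x * channel_gate[:, None, None] * spatial_gate[None, :, :]


def criss_cross_attention(x, wq, wk, wv, gamma=1.0):
    """Criss-cross attention (ASSUMED reading of 'cross attention') on x (C, H, W)."""
    _, h, _ = x.shape
    # 1x1 convolutions are plain channel projections.
    q = np.einsum('dc,chw->dhw', wq, x)
    k = np.einsum('dc,chw->dhw', wk, x)
    v = np.einsum('dc,chw->dhw', wv, x)
    # e_col[i, j, r] = <q[:, i, j], k[:, r, j]> (same column); e_row[i, j, s] (same row).
    e_col = np.einsum('dij,drj->ijr', q, k)
    e_row = np.einsum('dij,dis->ijs', q, k)
    # Mask r == i in the column branch so position (i, j) is not counted twice.
    e_col = e_col + np.where(np.eye(h, dtype=bool)[:, None, :], -np.inf, 0.0)
    energy = np.concatenate([e_col, e_row], axis=-1)  # (H, W, H + W)
    # Numerically stable softmax over the H + W - 1 candidate positions.
    energy -= energy.max(axis=-1, keepdims=True)
    attn = np.exp(energy)
    attn /= attn.sum(axis=-1, keepdims=True)
    a_col, a_row = attn[..., :h], attn[..., h:]
    # Aggregate values along the column and the row, then add the residual.
    out = np.einsum('ijr,crj->cij', a_col, v) + np.einsum('ijs,cis->cij', a_row, v)
    return gamma * out + x


# Toy forward pass with random weights (shapes are illustrative only).
rng = np.random.default_rng(0)
C, H, W = 8, 6, 6
x = rng.standard_normal((C, H, W))
y = adaptive_interaction(
    x,
    dw_local=0.1 * rng.standard_normal((C, 3, 3)),
    dw_dilated=0.1 * rng.standard_normal((C, 3, 3)),
    w_reduce=rng.standard_normal((C // 4, C)),
    w_expand=rng.standard_normal((C, C // 4)),
)
z = criss_cross_attention(
    y,
    wq=rng.standard_normal((C // 4, C)),
    wk=rng.standard_normal((C // 4, C)),
    wv=rng.standard_normal((C, C)),
)
print(y.shape, z.shape)  # (8, 6, 6) (8, 6, 6)
```

One criss-cross pass only links a position to its own row and column. CCNet therefore applies the module twice with shared weights, so that every position can gather context from the whole feature map. In CCNet the residual weight `gamma` is a learnable scalar initialized to zero; here it is fixed at 1.0 so the demo output differs from its input.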

Keywords

Image Recognition, Classifier, Convolutional Neural Network, Adaptive Interaction, Cross Attention

Cite This Paper

Liwen Kong, Jianqiang Mei, Fan Jia, Weixiang Du. Enhancing Image Recognition with Adaptive Interaction and Cross Attention in Convolutional Neural Network. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 9: 1-8. https://doi.org/10.25236/AJCIS.2025.080901.
