Academic Journal of Computing & Information Science, 2025, 8(7); doi: 10.25236/AJCIS.2025.080705.

MRW-ViT: Spatial-Frequency Domain Fusion and Optimal Metric for Few-Shot Medical Image Classification

Author(s)

Ying Wu¹, Jin Lu¹

Corresponding Author

Jin Lu

Affiliation(s)

¹College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, China

Abstract

To address data scarcity and inadequate lesion representation in medical image classification, this paper proposes a few-shot learning approach that integrates the spatial and frequency domains, termed Multi-Resolution Wavelet Enhanced Vision Transformer (MRW-ViT). The method applies a two-dimensional discrete wavelet transform (2D-DWT) to decompose medical images and extracts high-frequency features to better capture lesion detail. A self-attention mechanism dynamically integrates global context with local pathological information, yielding a more complete feature representation. A cross-domain feature fusion module combines multi-scale features from the spatial and frequency domains to strengthen pathological representation. Furthermore, Earth Mover’s Distance (EMD) is introduced as the metric for measuring subtle inter-class differences, optimizing classification decisions. Experiments were conducted on six classification tasks from the MedMNIST dataset, including PathMNIST, DermaMNIST, and OCTMNIST. MRW-ViT achieves an area under the curve (AUC) of 0.990 in colon pathology classification and an AUC of 0.995 in pneumonia detection, outperforming state-of-the-art methods. In breast ultrasound diagnosis, where only 780 images are available, the AUC reaches 0.948. Ablation studies confirm the effectiveness of each module.
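As a rough illustration of two building blocks named in the abstract, the sketch below computes a single-level 2D Haar DWT and a one-dimensional Earth Mover's Distance with NumPy. It is not the authors' implementation: the abstract does not state which wavelet is used or how EMD is computed over features, so the Haar basis, the closed-form 1D EMD on histograms, and the function names `haar_dwt2` and `emd_1d` are all assumptions made for the example.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar DWT (assumed wavelet; the abstract names none).

    Returns four half-resolution sub-bands: the low-frequency approximation
    and three high-frequency detail bands.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    s = np.sqrt(2.0)
    # Step 1: combine horizontally adjacent pixels (sum = low-pass, diff = high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Step 2: combine vertically adjacent rows of each intermediate result.
    approx = (lo[0::2, :] + lo[1::2, :]) / s       # low-pass in both directions
    detail_rows = (lo[0::2, :] - lo[1::2, :]) / s  # changes between rows
    detail_cols = (hi[0::2, :] + hi[1::2, :]) / s  # changes between columns
    detail_diag = (hi[0::2, :] - hi[1::2, :]) / s  # diagonal changes
    return approx, detail_rows, detail_cols, detail_diag


def emd_1d(p, q):
    """EMD between two histograms on the same unit-spaced bins.

    In one dimension the optimal transport cost has a closed form: the sum of
    absolute differences between the two cumulative distributions. Comparing
    sets of feature vectors would instead need a general transport solver.
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape or p.ndim != 1:
        raise ValueError("expected two 1D histograms of equal length")
    p = p / p.sum()  # normalise so both histograms carry unit mass
    q = q / q.sum()
    return float(np.abs(np.cumsum(p - q)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((28, 28))  # MedMNIST images are distributed at 28x28
    approx, d_rows, d_cols, d_diag = haar_dwt2(img)
    print("sub-band shape:", approx.shape)
    print("EMD example:", emd_1d([1, 0, 0], [0, 0, 1]))
```

Because the Haar filters here are orthonormal, the four sub-bands together preserve the energy of the input, which makes the decomposition easy to sanity-check. The 1D EMD penalises mass by how far it must move, which is the property that distinguishes it from bin-wise distances such as Euclidean or cosine.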

Keywords

Few-Shot Learning; Medical Image Classification; Earth Mover’s Distance; MedMNIST

Cite This Paper

Ying Wu, Jin Lu. MRW-ViT: Spatial-Frequency Domain Fusion and Optimal Metric for Few-Shot Medical Image Classification. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 7: 33-46. https://doi.org/10.25236/AJCIS.2025.080705.
