MRW-ViT: Spatial-Frequency Domain Fusion and Optimal Metric for Few-Shot Medical Image Classification

<p>Ying Wu<sup>1</sup>, Jin Lu<sup>1</sup></p>

Academic Journal of Computing & Information Science, 2025, 8(7); doi: 10.25236/AJCIS.2025.080705.

Corresponding Author:

Jin Lu

Affiliation(s)

¹College of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, China

Abstract

To address data scarcity and inadequate lesion representation in medical image classification, this paper proposes a few-shot learning approach that integrates the spatial and frequency domains, termed Multi-Resolution Wavelet Enhanced Vision Transformer (MRW-ViT). The method decomposes medical images with a two-dimensional discrete wavelet transform (2D-DWT) and extracts high-frequency features to better capture lesion detail. A self-attention mechanism dynamically integrates global context with local pathological information, yielding a more complete feature representation. A cross-domain feature fusion module combines multi-scale features from the spatial and frequency domains to strengthen pathological representation. Furthermore, Earth Mover’s Distance (EMD) is introduced as the metric for measuring subtle inter-class differences, which refines classification decisions. Experiments were conducted on six MedMNIST classification tasks, including PathMNIST, DermaMNIST, and OCTMNIST. MRW-ViT achieves an area under the curve (AUC) of 0.990 in colon pathology classification and 0.995 in pneumonia detection, outperforming state-of-the-art methods. In breast ultrasound diagnosis, where only 780 images are available, the AUC reaches 0.948. Ablation studies confirm the effectiveness of each module.
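The two building blocks named in the abstract, a 2D-DWT decomposition and an EMD comparison, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `haar_dwt2` and `emd_1d` are hypothetical, the Haar wavelet is assumed only because it is the simplest choice, and the EMD shown is the one-dimensional closed form for histograms over evenly spaced bins, whereas the paper's metric may be defined over sets of learned features.

```python
import numpy as np


def haar_dwt2(image):
    """One level of an orthonormal 2D Haar DWT (illustrative, assumed wavelet).

    Returns four sub-bands at half resolution: the low-frequency
    approximation and three high-frequency detail bands.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # local average (low frequency)
    row_detail = (a + b - c - d) / 2.0   # change between the two rows
    col_detail = (a - b + c - d) / 2.0   # change between the two columns
    diag_detail = (a - b - c + d) / 2.0  # diagonal change
    return approx, row_detail, col_detail, diag_detail


def emd_1d(p, q):
    """EMD between two 1D histograms on the same evenly spaced bins.

    Both inputs are normalized to unit mass; the ground distance between
    adjacent bins is taken to be 1, so the result is the summed absolute
    difference of the cumulative distributions.
    """
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    if p.shape != q.shape or p.ndim != 1:
        raise ValueError("expected two 1D histograms of equal length")
    p = p / p.sum()
    q = q / q.sum()
    return float(np.abs(np.cumsum(p - q)).sum())
```

On a constant image the three detail bands are all zero, which is why high-frequency sub-bands isolate edges and fine texture such as lesion boundaries. For EMD, moving all mass from the first of three bins to the last costs 2, while two histograms with the same shape cost 0 regardless of their scale.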

Keywords

Few-Shot Learning; Medical Image Classification; Earth Mover’s Distance; MedMNIST

Cite This Paper

Ying Wu, Jin Lu. MRW-ViT: Spatial-Frequency Domain Fusion and Optimal Metric for Few-Shot Medical Image Classification. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 7: 33-46. https://doi.org/10.25236/AJCIS.2025.080705.
