Academic Journal of Computing & Information Science, 2026, 9(3); doi: 10.25236/AJCIS.2026.090304.

HS-VidNet: An Efficient Video Denoising Network for Autonomous Driving Based on Frequency-Spatial Reconstruction Mechanism

Author(s)

Pan Wang, Lei Ding

Corresponding Author:
Pan Wang
Affiliation(s)

School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi, China

Abstract

Visual perception is critical for Autonomous Driving Systems (ADS) under extreme weather conditions such as rain, fog, and low illumination. This paper proposes HS-VidNet, an efficient, lightweight video denoising network. The method integrates the Spatial and Channel Reconstruction Convolution (SCConv) module into a U-Net architecture for feature reconstruction. Its Spatial Reconstruction Unit (SRU) and Channel Reconstruction Unit (CRU) reshape the feature flow, suppressing non-discriminative redundancy in regions such as the sky and road surface while concentrating limited computational resources on critical semantic structures such as road edges and lane lines, which significantly reduces computational overhead. Furthermore, the HiLo attention mechanism is introduced to compensate for the loss of high-frequency detail during denoising: the high-frequency branch extracts fine geometric textures within local windows, while the low-frequency branch models global long-range dependencies through a down-sampling strategy, enhancing the preservation of critical structural information and maintaining feature consistency. Experiments on the CARLA-AWC dataset, built with the CARLA simulator, show that HS-VidNet achieves a stable inference speed of 72 FPS at a computational cost of only 87.2 GFLOPs, making it more efficient than the existing SCUNet and SwinIR-Light methods. In terms of accuracy, the model achieves an SSIM of 0.912, effectively balancing environmental noise removal with the preservation of critical structures.
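The high/low-frequency split described in the abstract can be sketched as follows. This is a simplified, single-head NumPy illustration of the HiLo idea (local-window attention for high-frequency channels, attention over an average-pooled key/value map for low-frequency channels); it omits the learned Q/K/V projections and multi-head structure, and the `window` and `alpha` parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (no learned projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def hilo_attention(x, h, w, window=2, alpha=0.5):
    """x: (h*w, d) flattened token map; alpha: fraction of channels
    routed to the low-frequency (global) branch."""
    d = x.shape[-1]
    d_lo = int(alpha * d)
    x_lo, x_hi = x[:, :d_lo], x[:, d_lo:]

    # High-frequency branch: attention inside non-overlapping local windows,
    # capturing fine local textures.
    xh = x_hi.reshape(h, w, -1)
    out_h = np.zeros_like(xh)
    for i in range(0, h, window):
        for j in range(0, w, window):
            win = xh[i:i+window, j:j+window].reshape(-1, xh.shape[-1])
            out_h[i:i+window, j:j+window] = \
                attention(win, win, win).reshape(window, window, -1)
    hi = out_h.reshape(h * w, -1)

    # Low-frequency branch: keys/values come from an average-pooled
    # (down-sampled) map, so every query attends globally at low cost.
    xl = x_lo.reshape(h, w, -1)
    pooled = xl.reshape(h // window, window,
                        w // window, window, -1).mean(axis=(1, 3))
    kv = pooled.reshape(-1, x_lo.shape[-1])
    lo = attention(x_lo, kv, kv)

    # Concatenate the two branches back along the channel axis.
    return np.concatenate([lo, hi], axis=-1)
```

The key efficiency point is visible in the low-frequency branch: pooling shrinks the key/value set by a factor of `window**2`, so global attention costs O(N·N/window²) instead of O(N²), while the high-frequency branch stays linear in the number of windows.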

Keywords

Video Denoising; Autonomous Driving; Lightweight Network; Frequency Decoupling; SCConv; HiLo Attention

Cite This Paper

Pan Wang, Lei Ding. HS-VidNet: An Efficient Video Denoising Network for Autonomous Driving Based on Frequency-Spatial Reconstruction Mechanism. Academic Journal of Computing & Information Science (2026), Vol. 9, Issue 3: 30-39. https://doi.org/10.25236/AJCIS.2026.090304.

References

[1] Xu, C., & Sankar, R. (2024). A Comprehensive Review of Autonomous Driving Algorithms: Tackling Adverse Weather Conditions, Unpredictable Traffic Violations, Blind Spot Monitoring, and Emergency Maneuvers. Algorithms, 17(11), 526.

[2] Sheth, D. Y., Mohan, S., Vincent, J. L., et al. (2021). Unsupervised deep video denoising. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1759-1768.

[3] Tassano, M., Delon, J., & Veit, T. (2020). FastDVDnet: Towards Real-Time Deep Video Denoising via Recurrence and Multi-Depth-Separable Convolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1765-1774.

[4] Chen, L., Chu, X., Zhang, X., & Sun, J. (2022). Simple Baselines for Image Restoration. Proceedings of the European Conference on Computer Vision (ECCV), 17–33.

[5] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234-241.

[6] Zhao, H., Gallo, O., Frosio, I., & Kautz, J. (2017). Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging, 3(1), 47–57.

[7] Liang, J., Cao, J., Sun, G., et al. (2021). SwinIR: Image Restoration Using Swin Transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 1833-1844.

[8] Han, K., Wang, Y., Xu, Q., et al. (2020). GhostNet: More features from cheap operations. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Li, J., Wen, Y., He, L., et al. (2023). SCConv: Spatial and channel reconstruction convolution for feature redundancy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6153-6162.

[10] Pan, Z., Zhuang, B., Liu, J., et al. (2022). Fast vision transformers with HiLo attention. Advances in Neural Information Processing Systems (NeurIPS), 35, 14541-14554.

[11] Chan, K. C., Wang, X., Yu, K., & Loy, C. C. (2022). BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5934-5943.

[12] Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning (CoRL), 1–16.

[13] Bianco, S., Cadene, R., Celona, L., & Napoletano, P. (2018). Benchmark Analysis of Representative Deep Neural Network Architectures. IEEE Access, 6, 64270-64277. doi: 10.1109/ACCESS.2018.2877890.

[14] Zhang, K., Li, Y., Liang, J., et al. (2023). Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis. Machine Intelligence Research, 20(6), 822–836.

[15] Yue, Z., Wang, J., & Loy, C. C. (2024). Efficient diffusion model for image restoration by residual shifting. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).