Transforming Healthcare Reporting: Using Learning-Based Encoding-Decoding and Attention Mechanism For AI-Powered Radiology Report Generation
Keywords:
Medical Report Generation, LSTM, VGG16, Natural Language Processing, Pretrained Model

Abstract
Manual generation of medical imaging reports is tedious and error-prone, particularly for junior physicians, and remains a bottleneck in the clinical workflow. Although previous approaches based on LSTM or BERT models have advanced the field, they often fail to integrate visual and textual modalities effectively, which limits report accuracy. To address these gaps, this paper introduces a new multi-modal deep learning architecture for end-to-end automated radiology report generation from X-ray images. Our method integrates an encoder-decoder model with a co-attention mechanism to learn visual features and semantic context simultaneously. We investigate three settings: (1) ChexNet121-based visual encoding with LSTM decoding (BLEU: 0.268), (2) ChexNet121 with attention-augmented decoding (BLEU: 0.185), and (3) our proposed VGG16-based encoder with contextual word embedding (BLEU: 0.844). The substantial BLEU improvement underscores the key role of stable visual feature extraction in report quality. The novelty of this work lies in combining VGG16 as a feature extractor, which has been underutilized in recent research, with a co-attention mechanism for efficient long-range semantic generation. In contrast to previous research that mainly targets language model enhancement, our approach improves image-text alignment, a critical requirement for clinical applicability. The system shows strong promise for real-world deployment, offering improved accuracy, clinical utility, and scalability, and provides a practical tool to help radiologists produce consistent and interpretable reports from medical images.
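
To make the architecture described above concrete, the following is a minimal sketch, assuming a TensorFlow/Keras implementation, of a VGG16 visual encoder feeding an LSTM decoder with an attention layer over image regions. The layer sizes, vocabulary size, and maximum report length are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch (assumptions: TensorFlow/Keras; hyperparameters are placeholders).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

VOCAB_SIZE = 1000   # hypothetical report vocabulary size
MAX_LEN = 60        # hypothetical maximum report length (tokens)
EMBED_DIM = 300     # word embedding dimension
UNITS = 256         # LSTM hidden size

# Visual encoder: frozen VGG16 produces a grid of spatial features per X-ray.
cnn = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
cnn.trainable = False
image_in = layers.Input(shape=(224, 224, 3), name="xray_image")
feat = cnn(image_in)                                   # (7, 7, 512) feature map
feat = layers.Reshape((49, 512))(feat)                 # 49 spatial regions
feat = layers.Dense(UNITS)(feat)                       # project to decoder width

# Text branch: word embeddings fed to an LSTM decoder (teacher forcing).
text_in = layers.Input(shape=(MAX_LEN,), name="report_tokens")
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(text_in)
dec_seq, _, _ = layers.LSTM(UNITS, return_sequences=True,
                            return_state=True)(emb)

# Attention: each decoding step attends over the 49 image regions.
context = layers.Attention()([dec_seq, feat])          # (MAX_LEN, UNITS)
merged = layers.Concatenate()([dec_seq, context])
out = layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

At inference time, a model of this shape would be run autoregressively, feeding each predicted token back into the decoder until an end-of-report token is produced; the co-attention reported in the paper would additionally attend over semantic tags, which this sketch omits for brevity.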



