Explainable Machine Learning Framework for Early Heart Disease Detection Using SMOTE and SHAP

Authors

  • Mahfuz Islam Khan Jabed, Muhammad Imran, Abdul Ali Khan, Mohiuddin Mehedi, Ashraful Islam, Rahat Pervez

DOI:

https://doi.org/10.64149/

Keywords:

Heart Disease Prediction, Machine Learning, Explainable AI, SHAP, SMOTE, Classification Models, Healthcare Analytics, Cross-Validation, Hyperparameter Tuning, Clinical Decision Support.

Abstract

Cardiovascular diseases (CVDs) remain the leading cause of death globally, creating a persistent need for early screening tools that are both accurate and clinically interpretable [1]. This study presents an end-to-end, explainable machine learning framework for binary heart disease prediction using a structured clinical dataset derived from the UCI Heart Disease benchmark, where published experiments commonly use a 14-variable subset and focus on distinguishing presence versus absence of disease [2]. To strengthen reliability and reduce bias, the pipeline integrates stratified train–test splitting, feature scaling for scale-sensitive learners, and the Synthetic Minority Over-sampling Technique (SMOTE), applied to the training split only, to address potential class imbalance [3]. Eleven models are compared: Logistic Regression, Naïve Bayes, KNN, SVM, Decision Tree, Random Forest, Gradient Boosting, AdaBoost, Extra Trees, XGBoost, and MLP. Performance is evaluated using accuracy, precision, recall, F1-score, and ROC-AUC, with 5-fold cross-validation to estimate how stable each model's performance is across data splits. On the held-out test set, Extra Trees achieved the highest ROC-AUC (90.80%), while SVM obtained the highest accuracy (83.61%). Cross-validation ranked Random Forest (mean ROC-AUC ≈ 90.06%) and AdaBoost (≈ 89.95%) as top performers, and tuning with GridSearchCV raised the best cross-validated ROC-AUC of Extra Trees to 91.2%. Finally, explainability is provided through SHAP, which attributes predictions to clinically meaningful features, enabling transparent decision support rather than black-box output [4].
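The ordering of the resampling step described in the abstract (stratified split first, then SMOTE on the training split only) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the functions `stratified_split` and `smote`, the toy two-feature data, the 80/20 class ratio, the 20% test fraction, and `k=5` are all assumptions made for illustration. A real pipeline would more likely rely on an established library such as imbalanced-learn.

```python
import math
import random


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its share in both splits."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def smote(X, y, minority_label, k=5, seed=0):
    """Hand-rolled SMOTE for binary labels: add synthetic minority rows until classes balance.

    Each synthetic row lies on the segment between a minority sample and one of
    its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    X_out, y_out = list(X), list(y)
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: 80 negatives and 20 positives with two features each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

# 1) Split first, so the test set contains only real, untouched samples.
train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# 2) Oversample the minority class in the training split only.
X_res, y_res = smote(X_train, y_train, minority_label=1)

print("train before:", y_train.count(0), y_train.count(1))
print("train after: ", y_res.count(0), y_res.count(1))
print("test (unchanged):", y_test.count(0), y_test.count(1))
```

Resampling after the split keeps synthetic points out of the test set, so held-out metrics such as ROC-AUC are computed on real patients only. The same reasoning applies inside cross-validation, where oversampling would be fitted on each training fold rather than on the full dataset.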

Published

2026-05-06

How to Cite

Explainable Machine Learning Framework for Early Heart Disease Detection Using SMOTE and SHAP. (2026). Vascular and Endovascular Review, 9(1), 316-324. https://doi.org/10.64149/