A Transparent and Reproducible Machine Learning Workflow for Chronic Kidney Disease Prediction Using Clinical Features and SHAP-Based Interpretation
Keywords:
Chronic Kidney Disease, Machine Learning, Feature Selection, Random Forest, Predictive Modelling, Explanations.
Abstract
This study explores the use of machine learning for accurate and early diagnosis of Chronic Kidney Disease (CKD) from clinical and laboratory features. Early detection is essential for effective treatment and improved patient outcomes. The methodology addresses missing data through iterative imputation and selects the top ten predictive features using Recursive Feature Elimination (RFE) with a Random Forest classifier. Several models (Logistic Regression, SVM, Gradient Boosting, Naive Bayes, Neural Networks, and Random Forest) were compared. The Random Forest model, optimized via randomized hyperparameter search, achieved near-perfect accuracy, precision, recall, and F1-score on both the training and test sets. Its performance was further examined with ROC curves and confusion matrices, while Partial Dependence Plots and SHAP (SHapley Additive exPlanations) offered clear, intuitive insights into feature contributions. Finally, the model was deployed as a user-friendly Streamlit web application that predicts CKD risk in real time from patient input, supporting practical clinical use.
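The core of the workflow summarized above (iterative imputation, RFE with a Random Forest to keep ten features, then a randomized hyperparameter search over a Random Forest) can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes scikit-learn, runs on synthetic data with a hypothetical 24 features and 5% of values blanked out, and the helper name `build_ckd_pipeline`, the search grid, and all settings are placeholders chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
# IterativeImputer is experimental in scikit-learn and needs this enabling import.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def build_ckd_pipeline(n_features_to_select=10, random_state=0):
    """Hypothetical helper: imputation -> RFE feature selection -> Random Forest."""
    selector = RFE(
        RandomForestClassifier(n_estimators=30, random_state=random_state),
        n_features_to_select=n_features_to_select,
        step=2,  # drop two features per round to keep the sketch fast
    )
    return Pipeline([
        ("impute", IterativeImputer(max_iter=10, random_state=random_state)),
        ("select", selector),
        ("clf", RandomForestClassifier(random_state=random_state)),
    ])


# Synthetic stand-in for a clinical table (assumption: 24 numeric features).
X, y = make_classification(n_samples=300, n_features=24, n_informative=8,
                           random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing lab values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Randomized search over an illustrative (assumed) hyperparameter grid.
search = RandomizedSearchCV(
    build_ckd_pipeline(),
    param_distributions={
        "clf__n_estimators": [50, 100, 200],
        "clf__max_depth": [None, 4, 8],
        "clf__min_samples_leaf": [1, 2, 4],
    },
    n_iter=4, cv=3, scoring="f1", random_state=0,
)
search.fit(X_train, y_train)

# Boolean mask of the ten features kept by RFE in the refitted best pipeline.
selected = search.best_estimator_.named_steps["select"].support_
test_f1 = search.score(X_test, y_test)  # F1 on the held-out split
print("selected feature indices:", np.flatnonzero(selected))
print("held-out F1:", round(test_f1, 3))
```

Placing the imputer and the selector inside the pipeline means both are refit on each cross-validation training fold, so the held-out fold does not influence imputation or feature selection. That is a design choice of this sketch; the abstract does not say how the original study ordered these steps.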



