Hybrid Ensemble Learning Model for Chronic Kidney Disease Prediction
Keywords:
Chronic Kidney Disease, Machine Learning, XGBoost, Random Forest, Ensemble Model, CKD Prediction, Classification, Early Diagnosis
Abstract
Chronic Kidney Disease (CKD) is a progressive condition that can lead to end-stage renal failure if it is not detected early. Machine learning techniques have emerged as effective tools for early CKD prediction and diagnosis. In this article, we present a hybrid ensemble model that combines XGBoost and Random Forest (RF) to predict CKD, and we compare its performance with a baseline Support Vector Machine (SVM) classifier. The models are evaluated on the widely used UCI CKD dataset, which consists of clinical and laboratory characteristics of patients. The XGBoost+RF hybrid is intended to exploit the complementary strengths of boosting (XGBoost) and bagging (RF). We describe the dataset pre-processing and feature processing, as well as the implementation of the hybrid model, including code and algorithmic details. Experiments show that the hybrid ensemble performs well and can outperform both the baseline SVM and its constituent models in accuracy, precision, recall, and F1-score. In our experiments, the hybrid model achieved an overall accuracy of about 99%, whereas the baseline SVM achieved around 94%. We further compare our results with other methods from the literature, including artificial neural networks and other ensemble models. These results indicate that the XGBoost+RF hybrid model predicts CKD with high precision and stability. We also analyse feature importance and model behaviour to show that the model's predictions can be interpreted for clinical insight. This investigation demonstrates the potential of ensemble machine learning to improve CKD prediction and is a step toward integrating such models into clinical decision support systems for early-stage CKD diagnosis.
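The abstract does not state how the XGBoost and RF outputs are fused into a single prediction. One common option, assumed here purely for illustration, is soft voting: average each model's predicted probability of CKD and apply a threshold. The sketch below uses only the Python standard library and also computes the four metrics named above, treating CKD as the positive class. The function names `soft_vote` and `binary_metrics`, the equal weights, the 0.5 threshold, and the probability values in the demo are hypothetical placeholders, not the authors' implementation or data.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_xgb: Sequence[float], prob_rf: Sequence[float],
              w_xgb: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Weighted average of two models' P(CKD), thresholded to 0/1 labels.

    prob_xgb: per-patient CKD probability from the boosting model (assumed input).
    prob_rf:  per-patient CKD probability from the bagging model (assumed input).
    w_xgb:    weight on the XGBoost probability; RF receives 1 - w_xgb.
    """
    if len(prob_xgb) != len(prob_rf):
        raise ValueError("probability lists must have equal length")
    w_rf = 1.0 - w_xgb
    # Soft voting: blend probabilities first, then make a single decision.
    return [int(w_xgb * p + w_rf * q >= threshold)
            for p, q in zip(prob_xgb, prob_rf)]


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1) with CKD = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Made-up toy values for illustration only, not results from the UCI CKD dataset.
    y_true = [1, 1, 0, 0, 1, 0]
    prob_xgb = [0.92, 0.40, 0.10, 0.55, 0.30, 0.20]
    prob_rf = [0.88, 0.70, 0.05, 0.35, 0.45, 0.30]
    y_pred = soft_vote(prob_xgb, prob_rf)
    print(y_pred)  # [1, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

In a full pipeline, the two probability vectors would come from trained models, for example the `predict_proba` output of an `xgboost.XGBClassifier` and a scikit-learn `RandomForestClassifier`; scikit-learn's `VotingClassifier` with `voting="soft"` performs the same averaging step. Stacking, where a meta-learner is trained on the base models' outputs, is an alternative fusion strategy that this sketch does not cover.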



