SHAP-Weighted Hybrid Ensemble for Interpretable Stroke Risk Prediction Using Tabular Clinical Data
Abstract
Stroke remains one of the leading causes of death and long-term disability worldwide, so early and reliable prediction is needed to enable timely intervention. This paper proposes a SHAP-weighted hybrid ensemble framework that combines three machine learning models (XGBoost, Random Forest, and TabNet) to produce effective, human-understandable stroke risk predictions from tabular clinical data. The central idea is to use SHapley Additive exPlanations (SHAP) as a source of feature-based weighting that is adaptive rather than static, so that the model's predictive power and its interpretability reinforce each other. The clinical data were preprocessed with KNN imputation, categorical encoding, and SMOTETomek resampling for class balance, among other steps, to ensure the data quality required for subsequent modeling. Benchmark experiments on the Brain Stroke dataset (a public dataset with n=4981) show that the hybrid ensemble outperforms each of the base learners, with the highest reported ROC-AUC, F1, and recall being 0.8099, 0.2165, and 0.42, respectively. These results are taken to indicate that sensitivity and precision were reasonably balanced. Moreover, using SHAP to explain predictions considerably aids understanding of the model's internal behavior. For example, the analysis found that the model's output was most influenced by age, average glucose level, and BMI, all three of which are supported by clinical findings. The proposed framework advances interpretable AI by reconciling the performance of black-box models with the transparency requirements of clinical decision-support systems.
The SHAP-weighted ensemble is therefore a stroke risk estimation tool suited to a future in which AI is used in medical settings for the benefit of the patient, with a reduced chance of unintended consequences.
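The abstract does not spell out how the SHAP-based weights are computed, so the following is only a minimal sketch of one possible reading of adaptive weighting. It assumes that each base model's predicted probabilities and SHAP values have already been computed, and it weights each model per sample by its total absolute SHAP attribution. The function name `shap_weighted_average`, the weighting rule, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import numpy as np


def shap_weighted_average(probs, shap_values, eps=1e-12):
    """Blend per-model probabilities with per-sample SHAP-derived weights.

    ASSUMPTION: the weighting rule below (share of total |SHAP| per sample)
    is illustrative only; the paper's actual rule is not given in the abstract.

    probs:       dict mapping model name -> array of shape (n_samples,)
    shap_values: dict mapping model name -> array of shape (n_samples, n_features)
    """
    names = list(probs)
    n_models = len(names)

    # Attribution strength of each model for each sample: sum of |SHAP| over features.
    strength = np.stack(
        [np.abs(shap_values[name]).sum(axis=1) for name in names], axis=1
    )  # shape (n_samples, n_models)

    # Normalise so the weights for each sample sum to 1.
    # Samples with no attribution at all fall back to uniform weights.
    total = strength.sum(axis=1, keepdims=True)
    weights = np.where(total > 0, strength / np.maximum(total, eps), 1.0 / n_models)

    stacked = np.stack([probs[name] for name in names], axis=1)
    blended = (weights * stacked).sum(axis=1)
    return blended, weights


# Toy inputs standing in for real model outputs (two patients, two features).
probs = {
    "xgboost": np.array([0.8, 0.2]),
    "random_forest": np.array([0.6, 0.4]),
    "tabnet": np.array([0.4, 0.6]),
}
shap_values = {
    "xgboost": np.array([[0.3, -0.1], [0.05, 0.05]]),
    "random_forest": np.array([[0.2, 0.2], [0.1, -0.1]]),
    "tabnet": np.array([[0.1, -0.1], [0.4, 0.3]]),
}

blended, weights = shap_weighted_average(probs, shap_values)
print(blended)
print(weights)
```

Because the weights are recomputed for every sample, a model contributes more to the blended risk score where its attributions are stronger, which is one way the weighting could be adaptive rather than static.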