Machine learning–based premium estimation with real-world risk segmentation
Healthcare Analytics & Predictive Modeling
Developed a machine learning system to accurately predict health insurance premiums based on customer demographics, lifestyle factors, and medical history. The solution addresses non-linear relationships that traditional pricing methods fail to capture.
The project follows an end-to-end ML pipeline including data cleaning, feature engineering, model training, error analysis, and deployment via a Streamlit web application for real-time predictions.
Multiple models, including Linear Regression, Ridge Regression, and XGBoost, were trained and compared. Initial error analysis revealed large prediction errors in younger age groups, which led to retraining separate models per age segment.
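The per-age-group error analysis can be illustrated with a minimal sketch. The 10% threshold for an "extreme" error, the age cutoff of 25, and the sample rows are all assumptions made up for illustration; they are not values taken from the project.

```python
from collections import defaultdict

# Assumed values for illustration only; the project's actual cutoffs are not stated.
EXTREME_THRESHOLD_PCT = 10.0   # an error counts as "extreme" above this magnitude
YOUNG_AGE_CUTOFF = 25          # ages at or below this fall in the "young" band


def pct_error(actual, predicted):
    """Signed prediction error as a percentage of the actual premium."""
    return (predicted - actual) / actual * 100.0


def age_band(age):
    """Assign a customer to a coarse age segment."""
    return "young" if age <= YOUNG_AGE_CUTOFF else "rest"


def extreme_error_share(records):
    """Return {band: percentage of rows whose |error| exceeds the threshold}.

    records is an iterable of (age, actual_premium, predicted_premium).
    """
    totals = defaultdict(int)
    extremes = defaultdict(int)
    for age, actual, predicted in records:
        band = age_band(age)
        totals[band] += 1
        if abs(pct_error(actual, predicted)) > EXTREME_THRESHOLD_PCT:
            extremes[band] += 1
    return {band: 100.0 * extremes[band] / totals[band] for band in totals}


# Hypothetical (age, actual, predicted) rows, not project data.
sample = [
    (22, 5000, 8000),
    (24, 6000, 6100),
    (19, 4000, 2500),
    (45, 20000, 20500),
    (52, 25000, 24000),
    (38, 15000, 17000),
]

shares = extreme_error_share(sample)
for band, share in sorted(shares.items()):
    print(f"{band}: {share:.1f}% of predictions are extreme")
```

A gap between bands in this kind of breakdown is the signal that would motivate training a separate model for the weaker segment.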
After introducing a genetic risk feature, extreme prediction errors fell from 73% to 2%, yielding a reliable, explainable, production-ready solution.
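One way segment-based prediction with a genetic risk input could be wired together is sketched below. The field names (`age`, `bmi`, `genetic_risk`), the age cutoff, and the two stand-in models are hypothetical; the real system would call its trained models here instead.

```python
# Assumed cutoff for illustration; the project's actual segment boundary is not stated.
YOUNG_AGE_CUTOFF = 25


def build_features(customer):
    """Turn a customer record into an ordered feature list (hypothetical fields)."""
    return [customer["age"], customer["bmi"], customer["genetic_risk"]]


def predict_premium(customer, young_model, rest_model):
    """Route the customer to the model trained for their age segment."""
    features = build_features(customer)
    if customer["age"] <= YOUNG_AGE_CUTOFF:
        return young_model(features)
    return rest_model(features)


# Stand-in models: simple linear formulas used only to make the sketch runnable.
def toy_young_model(features):
    age, _bmi, genetic_risk = features
    return 3000 + 100 * age + 500 * genetic_risk


def toy_rest_model(features):
    age, _bmi, genetic_risk = features
    return 5000 + 200 * age + 300 * genetic_risk


young_customer = {"age": 22, "bmi": 21, "genetic_risk": 2}
older_customer = {"age": 40, "bmi": 27, "genetic_risk": 1}

print(predict_premium(young_customer, toy_young_model, toy_rest_model))
print(predict_premium(older_customer, toy_young_model, toy_rest_model))
```

Keeping the routing in one function means the web front end only needs a single call per prediction, regardless of how many segment models exist.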