Health Insurance Premium Predictor

Machine learning–based premium estimation with real-world risk segmentation

Domain / Function

Healthcare Analytics & Predictive Modeling

Project Overview

Developed a machine learning system to accurately predict health insurance premiums based on customer demographics, lifestyle factors, and medical history. The solution addresses non-linear relationships that traditional pricing methods fail to capture.

The project follows an end-to-end ML pipeline including data cleaning, feature engineering, model training, error analysis, and deployment via a Streamlit web application for real-time predictions.

[Image: Premium prediction output 1]

[Image: Premium prediction output 2]

[Image: Premium prediction workflow]

Key Features

High-accuracy premium prediction (R² ≈ 0.98)
Advanced feature engineering using normalized risk scores
Age-based model segmentation for improved accuracy
Error analysis with residual and percentage deviation tracking
Model retraining using additional genetic risk features
Interactive Streamlit application for real-time use
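
A normalized risk score can be sketched as follows. This is a minimal illustration, not the project's actual code: the condition names, the point values in RISK_POINTS, the "&" separator, and the choice of min-max scaling are all assumptions made for the example.

```python
# Hypothetical per-condition risk points; the project's real values are not given.
RISK_POINTS = {
    "none": 0,
    "thyroid": 5,
    "high blood pressure": 6,
    "diabetes": 6,
    "heart disease": 8,
}


def total_risk(medical_history: str) -> int:
    """Sum risk points for a history such as 'diabetes & thyroid'."""
    conditions = [c.strip().lower() for c in medical_history.split("&")]
    # Unknown conditions contribute 0 points in this sketch.
    return sum(RISK_POINTS.get(c, 0) for c in conditions)


def normalize(scores: list[float]) -> list[float]:
    """Min-max scale scores into [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Toy inputs for illustration only.
histories = ["none", "diabetes", "diabetes & heart disease", "thyroid"]
raw = [total_risk(h) for h in histories]
normalized = normalize(raw)
```

Scaling to a fixed range keeps the engineered feature comparable with other numeric inputs, which matters for the linear models more than for tree-based ones.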

Project Details

Multiple models including Linear Regression, Ridge Regression, and XGBoost were trained and compared. Initial error analysis revealed high deviation in younger age groups, leading to segmentation-based retraining.
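
The kind of error analysis described above might look like the sketch below. It assumes a 10% threshold for an "extreme" error and an age cutoff of 25 for the younger group; neither value is stated in this write-up, and the rows are made-up toy data, not project results.

```python
from collections import defaultdict

EXTREME_PCT = 10.0  # assumed threshold for an "extreme" error, in percent


def error_report(rows):
    """rows: iterable of (age, actual, predicted). Returns per-row diagnostics."""
    report = []
    for age, actual, predicted in rows:
        residual = predicted - actual
        pct = residual / actual * 100.0  # percentage deviation from the actual premium
        report.append(
            {
                "age": age,
                "residual": residual,
                "pct_deviation": pct,
                "extreme": abs(pct) > EXTREME_PCT,
            }
        )
    return report


def extreme_share_by_group(report, age_cutoff=25):
    """Fraction of extreme errors per age group (cutoff is an assumption)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [extreme count, total count]
    for r in report:
        group = "young" if r["age"] <= age_cutoff else "rest"
        counts[group][0] += int(r["extreme"])
        counts[group][1] += 1
    return {g: extreme / total for g, (extreme, total) in counts.items()}


# Toy rows: (age, actual premium, predicted premium).
rows = [
    (22, 5000, 6500),
    (24, 4000, 4100),
    (23, 6000, 4500),
    (40, 15000, 15300),
    (55, 20000, 19000),
]
report = error_report(rows)
shares = extreme_share_by_group(report)
```

Comparing the per-group shares is what would surface a pattern like the one reported here, where one age band carries most of the large errors.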

After a genetic risk feature was added and the models were retrained, the rate of extreme prediction errors fell from 73% to 2%, yielding a reliable, explainable solution ready for production.
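
At prediction time, age-based segmentation amounts to routing each request to the model trained for that age band. The sketch below shows one way to do that. The cutoff of 25, the feature name genetic_risk, and both stub models with their coefficients are hypothetical stand-ins, not the trained models from this project.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Model = Callable[[Features], float]

AGE_CUTOFF = 25  # assumed boundary between the two segments


def make_router(young_model: Model, rest_model: Model,
                age_cutoff: int = AGE_CUTOFF) -> Model:
    """Return a predict function that dispatches on the 'age' feature."""
    def predict(features: Features) -> float:
        model = young_model if features["age"] <= age_cutoff else rest_model
        return model(features)
    return predict


# Stand-in models with invented coefficients, for illustration only.
def young_stub(f: Features) -> float:
    return 3000.0 + 400.0 * f.get("genetic_risk", 0.0)


def rest_stub(f: Features) -> float:
    return 8000.0 + 150.0 * f["age"]


predict = make_router(young_stub, rest_stub)
young_quote = predict({"age": 22, "genetic_risk": 3})
older_quote = predict({"age": 45, "genetic_risk": 1})
```

In a real deployment the stubs would be replaced by the fitted estimators, and a web front end such as the Streamlit app would call predict with the user's inputs.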

Technologies Used

Python, Pandas, scikit-learn, XGBoost, Streamlit, Matplotlib
