Use Case

Predicting Insurance Charges for Health Policyholders

Predicting insurance charges for policyholders to optimize premium pricing and improve cost management strategies.

1. Overview & Strategic Importance

Regression Solution | Healthcare Data

Problem Statement

Predicting insurance charges is essential for health insurance providers to assess risk, set premiums, and manage costs effectively. The task is complex because charges depend on interacting factors such as age, BMI, smoking status, and number of dependents. Solving it lets insurers design data-driven policies and assess risk more accurately.

Required Solutions

  • Developing a regression model to predict charges.
  • Understanding cost drivers such as demographics and health metrics like BMI.
  • Optimizing pricing strategies for insurance providers.

Solution Objectives

  • Uncover patterns and correlations in policyholder data.
  • Develop accurate regression models for charge prediction.
  • Identify key factors influencing insurance costs.
  • Optimize the model for real-world risk management.

Understanding the Problem

Charges are influenced by age, BMI, and lifestyle choices.

Smokers and older individuals typically incur higher charges. Machine learning models learn these relationships from historical data to predict future charges, and insurers can use those predictions to design targeted interventions.

2. About the Data

Data Collection

The dataset contains 1,338 records of insured individuals. Its columns are age, sex, BMI, number of children, smoker status, region, and billed charges.

Major Parameters Description

  • age: Age of the policyholder (in years).
  • sex: Gender of the policyholder (male or female).
  • bmi: Body Mass Index, calculated as weight in kilograms divided by the square of height in meters. It is a common proxy for body fat.
  • children: Number of dependent children covered.
  • smoker: Whether the policyholder is a smoker (yes/no).
  • region: Geographic region of the policyholder.
  • charges: Insurance charges billed (target variable).
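You may want to inspect the data outside iDareAI before uploading it. The minimal Python sketch below reads the file into typed records. It assumes the CSV header uses exactly the column names listed above and that `smoker` holds yes/no values. The `Policyholder` class and both functions are hypothetical helpers and are not part of iDareAI.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Policyholder:
    """One row of the insurance dataset, typed according to the column descriptions."""
    age: int
    sex: str
    bmi: float
    children: int
    smoker: bool
    region: str
    charges: float


def parse_row(row: dict) -> Policyholder:
    """Convert a csv.DictReader row (all strings) into a typed Policyholder."""
    smoker = row["smoker"].strip().lower()
    if smoker not in ("yes", "no"):
        raise ValueError(f"unexpected smoker value: {row['smoker']!r}")
    return Policyholder(
        age=int(row["age"]),
        sex=row["sex"].strip().lower(),
        bmi=float(row["bmi"]),
        children=int(row["children"]),
        smoker=smoker == "yes",
        region=row["region"].strip(),
        charges=float(row["charges"]),
    )


def load_policyholders(text: str) -> list[Policyholder]:
    """Parse CSV text whose header row uses the column names listed above."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


# Example (assumes the training file is in the working directory):
# with open("health_insurance_train.csv", newline="") as f:
#     records = load_policyholders(f.read())
```

Rejecting unexpected `smoker` values early surfaces data-entry problems before they silently skew a model.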

3. Using iDareAI

Guided Mode Initialization

A. Uploading Dataset

Click the **'Upload CSV or Excel Data'** button → select a source for the dataset → upload `health_insurance_train.csv`. The system automatically analyzes the file, extracts column descriptions, and suggests the most valuable target columns for prediction.

[Screenshot: Upload UI]

B. Choosing Analysis Mode

Choose between autonomous machine learning and manual model building. In autonomous mode, you can ask questions such as:
  • Which factors have the greatest impact on predicting insurance charges in the dataset?
  • Do individuals with more children tend to have higher or lower insurance charges?

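For questions like the second one, a quick group average is a useful sanity check on the autonomous answer. The sketch below is plain Python and is not an iDareAI feature. `mean_charges_by` is a hypothetical helper that assumes each row is a dict with a numeric `charges` value.

```python
from collections import defaultdict


def mean_charges_by(rows, key):
    """Average 'charges' for each distinct value of the column `key`.

    `rows` is any iterable of dicts with a numeric 'charges' entry.
    The result is ordered by group value.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += row["charges"]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in sorted(totals)}
```

For example, `mean_charges_by(rows, "children")` gives the average charge for each number of children. A raw group average shows association only. Families with more children may also differ in age or smoking rate; a regression model can account for that, but this comparison does not.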
Operation Using Autonomous Guided Mode

A. Query Response

The analysis found that smoking status, age, and BMI have the greatest impact on insurance charges. Being a smoker, in particular, raises medical insurance costs substantially. Among the models evaluated, Random Forest produced the most reliable predictions, with a low error on the test set.

B. Autonomous Guided Mode Performance

[Screenshot: Auto Analysis]

C. AI Application

[Screenshot: Auto Application]

Model Fine-Tuning/Manual Model Building

A. Selecting Prediction Target

The 'charges' column is selected as the prediction target.

[Screenshot: Target Selection]

B. Selecting Analysis Type

Because the target is a numerical column, the 'Regression' analysis type is selected.

[Screenshot: Analysis Type]
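The rule for choosing the analysis type (a numeric target means regression) can be written as a simple check. This is a simplified sketch for illustration and is not iDareAI's actual logic. `infer_analysis_type` is a hypothetical name. A real tool may also treat numeric columns with only a few distinct values, such as 0/1 flags, as categorical.

```python
def infer_analysis_type(values):
    """Suggest 'Regression' if every non-empty value parses as a number, else 'Classification'."""
    non_empty = [v for v in values if str(v).strip() != ""]
    if not non_empty:
        raise ValueError("column has no values")
    try:
        for v in non_empty:
            float(v)  # raises ValueError for text such as "yes"
    except ValueError:
        return "Classification"
    return "Regression"
```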

C. Selecting Model Group/Item

[Screenshot: Model Group]

D. Selecting Features

Select features: age, bmi, smoker, sex, and children.

[Screenshot: Feature Selection]
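Most regression libraries need numeric inputs, so the text columns `sex` and `smoker` must be encoded before training. iDareAI does this internally, and the page does not say how. The sketch below shows one common approach: 0/1 encoding of the two binary columns. It assumes the values are male/female and yes/no, as in the data dictionary. `encode_features` is a hypothetical helper, and the column order it produces is an arbitrary choice.

```python
def encode_features(row):
    """Turn one record into a numeric vector [age, bmi, children, sex_male, smoker_yes]."""
    return [
        float(row["age"]),
        float(row["bmi"]),
        float(row["children"]),
        # Binary columns become 0/1 indicators (case-insensitive match).
        1.0 if str(row["sex"]).strip().lower() == "male" else 0.0,
        1.0 if str(row["smoker"]).strip().lower() == "yes" else 0.0,
    ]
```

`region` is left out here because it is not among the selected features.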

E. Selecting Training Level

The "Custom" training level allows for algorithm selection and 5-fold cross-validation.

[Screenshot: Training Level]
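5-fold cross-validation splits the data into five parts. Each part serves once as the test set while the other four train the model, and the five scores are then averaged. The sketch below shows how such splits can be generated in plain Python. `k_fold_indices` is a hypothetical helper, and iDareAI's own splitting (shuffling, seeding, stratification) is not documented here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then split into k folds whose sizes differ by
    at most one; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra element.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

With the 1,338-row dataset and k=5, each test fold holds 267 or 268 records. Each model is therefore trained on about 1,070 rows.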

AI Modeling Details

Random Forest achieved the best baseline accuracy of 80.9%.

[Screenshot: Modeling Details]
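The page does not state which metric the reported "accuracy" refers to. For regression, one common choice is the coefficient of determination, R² = 1 − SS_res / SS_tot, but treat that reading as an assumption. The sketch below computes R² alongside mean absolute error (MAE), which is expressed in the same units as `charges`. Both function names are illustrative and are not iDareAI APIs.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted charges."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under the R² reading, 80.9% would mean the model explains roughly 81% of the variance in charges on held-out data.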

Training Analysis Details

A. Predicted Target

[Screenshot: Predicted Target]

B. Predicted Trend

[Screenshot: Predicted Trend]

C. Error Trend

[Screenshot: Error Trend]

D. Feature Importance

[Screenshot: Feature Importance]
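The page does not say how iDareAI computes its feature-importance ranking. One model-agnostic way to reproduce such a ranking outside the tool is permutation importance: shuffle one feature column and measure how much the model's score drops. This is a swapped-in technique, not necessarily what iDareAI uses. Random Forest libraries also commonly report impurity-based importance. The function below is a hypothetical sketch; `predict` stands for any trained model's prediction function.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Average score drop when each feature column is shuffled (higher = more important).

    predict: callable mapping a list of feature rows (lists) to a list of predictions.
    metric:  callable(y_true, y_pred) -> score where higher is better (e.g. R^2).
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Evaluating on held-out data rather than training data gives a more honest picture, because a Random Forest can fit noise in the training rows.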

Improving Model Accuracy

A. Custom Variable Creation

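Derived features (binned categories) and interaction terms help the model capture patterns that the raw columns express poorly. One example is an effect of age or BMI on charges that differs between smokers and non-smokers.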

| Variable Name | Formula |
|---|---|
| age group | `=IF(A2<25, 'Young', IF(A2<50, 'Middle-aged', 'Senior'))` |
| bmi category | `=IF(C2<18.5, 'Underweight', IF(C2<25, 'Normal', IF(C2<30, 'Overweight', 'Obese')))` |
| smoker-age interaction | `=A2*IF(E2='yes', 1, 0)` |
| smoker-bmi interaction | `=C2*IF(E2='yes', 1, 0)` |
| children category | `=IF(D2=0, 'No children', IF(D2<=2, 'Few children', 'Many children'))` |
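The same custom variables can be computed outside the platform, for example to check them against the values iDareAI generates. The formulas appear to use spreadsheet columns A=age, C=bmi, D=children, and E=smoker, which matches the order of the data dictionary; that mapping is an inference. The Python translation below is a sketch, and the function names are hypothetical. Spreadsheet text comparison is typically case-insensitive, so the smoker check lowercases the value.

```python
def age_group(age):
    # =IF(A2<25, 'Young', IF(A2<50, 'Middle-aged', 'Senior'))
    if age < 25:
        return "Young"
    return "Middle-aged" if age < 50 else "Senior"


def bmi_category(bmi):
    # =IF(C2<18.5, 'Underweight', IF(C2<25, 'Normal', IF(C2<30, 'Overweight', 'Obese')))
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    return "Overweight" if bmi < 30 else "Obese"


def children_category(children):
    # =IF(D2=0, 'No children', IF(D2<=2, 'Few children', 'Many children'))
    if children == 0:
        return "No children"
    return "Few children" if children <= 2 else "Many children"


def add_derived_features(row):
    """Return a copy of `row` with the five custom variables added.

    Expects numeric age, bmi, and children, and smoker as 'yes'/'no'.
    """
    is_smoker = 1 if str(row["smoker"]).strip().lower() == "yes" else 0
    return {
        **row,
        "age group": age_group(row["age"]),
        "bmi category": bmi_category(row["bmi"]),
        "smoker-age interaction": row["age"] * is_smoker,  # age for smokers, 0 otherwise
        "smoker-bmi interaction": row["bmi"] * is_smoker,  # bmi for smokers, 0 otherwise
        "children category": children_category(row["children"]),
    }
```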

B. Analysis Details (Improved)

After feature engineering, accuracy increased from 80.9% to 84.3%.

[Screenshot: Improved Analysis]

4. AI Application

Manual Model Building

Adjust the Age, BMI, and Smoker status sliders to see their impact on predicted insurance charges in real time. The engineered features allow for more granular predictions.

[Screenshot: Manual App]
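Moving a single slider is a one-feature sensitivity sweep: one input changes while the others stay fixed. The sketch below reproduces that outside the application. `what_if` is a hypothetical helper, and `predict` is a stand-in for any trained model that takes a dict of inputs and returns a predicted charge. It does not call iDareAI.

```python
def what_if(predict, base_row, feature, values):
    """Predicted charges as one input varies while the others stay fixed.

    predict: callable taking a dict of inputs and returning a predicted charge.
    Returns a list of (value, prediction) pairs, one per slider position.
    """
    results = []
    for value in values:
        scenario = dict(base_row)  # copy so the base profile is never modified
        scenario[feature] = value
        results.append((value, predict(scenario)))
    return results
```

For example, `what_if(model_predict, profile, "smoker", ["no", "yes"])` compares the same profile as a non-smoker and as a smoker; `model_predict` and `profile` are placeholders for your own model and inputs.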

AI Application Demo

Adjusting the 'Smoker' status and 'BMI' sliders shows that changes in these risk factors are associated with substantial increases in predicted charges.

Saving the Project

Save your insurance charge analysis by clicking the icon at the bottom left corner of the interface.

[Screenshot: Saving]

Sharing the Project

Once the analysis is saved, share the application so that clients can obtain predictions on demand.

[Screenshot: Sharing]
