Application of Machine Learning Models for Patients Health Insurance Cost Prediction
Annwesha Banerjee Majumder1, Sumit Das2, Aniruddha Biswas3, Trishita Ghosh4, Raj Poddar5, Suchetana Chakraborty6
1Dr. Annwesha Banerjee Majumder, Assistant Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
2Dr. Sumit Das, Associate Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
3Aniruddha Biswas, Assistant Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
4Trisita Ghosh, Assistant Professor, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
5Raj Poddar, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
6Suchetana Chakraborty, Department of Information Technology, JIS College of Engineering, Kalyani (West Bengal), India.
Manuscript Received on 05 August 2025 | Revised Manuscript Received on 06 September 2025 | Manuscript Accepted on 15 September 2025 | Manuscript published on 30 September 2025 | PP: 11-17 | Volume-15 Issue-4, September 2025 | Retrieval Number: 100.1/ijsce.D368515040925 | DOI: 10.35940/ijsce.D3685.15040925
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This study examines the use of machine learning models to predict health insurance costs from personal characteristics. The dataset included age, sex, BMI, number of children, smoking status, and region. Several regression methods, including Linear Regression, Random Forest, and Gradient Boosting, were compared on how accurately they estimated insurance costs. The dataset was preprocessed by scaling numerical features and encoding categorical variables, and the regression models were then trained and evaluated using k-fold cross-validation. Performance was measured with the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). In the experiments, Gradient Boosting outperformed both Random Forest and Linear Regression.
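As an illustration of the evaluation protocol named in the abstract, the following minimal sketch computes MAE, RMSE, and R² and generates k-fold train/test index splits using only the Python standard library. It is not the authors' code: the function names (`mae`, `rmse`, `r2`, `k_fold_indices`) and the toy cost values are hypothetical, and the folds are taken contiguously without shuffling, which is an assumption the abstract does not confirm.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    The first (n_samples % k) folds receive one extra sample so that
    every sample appears in exactly one test fold.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy values (made up for illustration, not taken from the paper's dataset).
actual_costs = [100.0, 200.0, 300.0, 400.0]
predicted_costs = [110.0, 190.0, 310.0, 390.0]

scores = {
    "MAE": mae(actual_costs, predicted_costs),
    "RMSE": rmse(actual_costs, predicted_costs),
    "R2": r2(actual_costs, predicted_costs),
}
folds = list(k_fold_indices(n_samples=5, k=2))
```

In a full experiment, each model would be fitted on the training indices of every fold and scored on the corresponding test indices, with the three metrics averaged across folds. Lower MAE and RMSE and higher R² indicate a better fit.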
Keywords: Gradient Boosting, Linear Regression, Mean Squared Error, Random Forest, Root Mean Squared Error.
Scope of the Article: Computer Networks and Its Applications