Skip to content

A comparative study of regression models (Decision Tree, Random Forest, Ridge, Lasso, SVM) for predicting real estate prices in King County, NYC, and California using PCA & Pipeline techniques.

Notifications You must be signed in to change notification settings

Akimuddinshaikh/Machine-Learning-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Machine Learning Models for House Price Prediction A Comparative Study of Regression Models

Author: Akimuddin Aslam Shaikh Institution: National College of Ireland, School of Computing

Project Overview This project explores various machine learning regression models to predict house prices based on real estate data from King County, NYC, and California. The study aims to compare the performance of different regression models and determine the most suitable approach for real estate price forecasting.

Key Highlights ✅ Implemented Decision Tree, Lasso, Linear, Random Forest, Ridge, and Support Vector Machine (SVM) Regression ✅ Applied Pipeline module & PCA (Principal Component Analysis) for dimensionality reduction ✅ Evaluated models using R-squared score, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) ✅ Used StatsModel for regression analysis ✅ Found that Random Forest Regression with PCA performed best for King County & California datasets, while NYC dataset yielded poor results across models

Technologies & Libraries Used Python Scikit-Learn (Regression models & PCA) StatsModels (Regression analysis) Matplotlib & Seaborn (Data visualization) Pandas & NumPy (Data preprocessing)

Regression Models & Performance Model R² Score MSE RMSE Dataset Random Forest + PCA Best Low Low King County, California Decision Tree Regression Moderate Medium Medium King County, California Linear Regression Moderate High High NYC SVM Regression Poor High High NYC Key Finding: The Random Forest Regression model with PCA performed best for King County & California datasets. NYC Dataset: None of the models achieved a positive R-squared score, highlighting dataset limitations.

About

A comparative study of regression models (Decision Tree, Random Forest, Ridge, Lasso, SVM) for predicting real estate prices in King County, NYC, and California using PCA & Pipeline techniques.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published