Implemented Linear Regression using scikit-learn to predict housing prices. Evaluated model performance using R² score (~0.64) and residual distribution analysis to assess prediction errors and regression assumptions.

btboilerplate/Linear_Regression

📈 Linear Regression on Housing Dataset

🔹 Project Overview

This project implements Linear Regression using scikit-learn to predict house prices from a housing dataset. The notebook demonstrates the complete machine learning workflow, including data loading, preprocessing, model training, prediction, evaluation, and residual analysis.


📂 Repository Contents

Linear_Regression

├── Linear_Regression.ipynb
├── housing.csv
├── residual_distribution.png
└── README.md


📊 Dataset

  • File: housing.csv
  • Type: Tabular housing data
  • Purpose: Used to train and evaluate a linear regression model for house price prediction

🛠️ Libraries & Tools Used

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • scikit-learn

⚙️ Project Workflow

  1. Load the housing dataset
  2. Perform train-test split
  3. Train a Linear Regression model
  4. Predict house prices on test data
  5. Evaluate model performance using R² Score
  6. Analyze residual distribution
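The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The helper `run_workflow`, the column names `area`, `rooms`, and `price`, and the `test_size=0.2` / `random_state=42` settings are all assumptions, and a synthetic DataFrame stands in for housing.csv because its columns are not listed here. The sketch also assumes every feature column is numeric.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def run_workflow(df, target, test_size=0.2, random_state=42):
    """Hypothetical helper covering steps 2-5: split, fit, predict, score."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    reg = LinearRegression().fit(X_train, y_train)
    reg_pred = reg.predict(X_test)
    return reg, y_test, reg_pred, r2_score(y_test, reg_pred)


# Synthetic stand-in for step 1. With the real data this would be
# df = pd.read_csv("housing.csv"), with `target` set to the price column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "area": rng.uniform(50, 250, 500),   # made-up feature
    "rooms": rng.integers(1, 7, 500),    # made-up feature
})
demo["price"] = (
    1500 * demo["area"] + 10000 * demo["rooms"] + rng.normal(0, 30000, 500)
)

reg, y_test, reg_pred, score = run_workflow(demo, target="price")
print(f"R² on held-out data: {score:.3f}")
```

The returned `y_test` and `reg_pred` are the two arrays needed for step 6, the residual analysis.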

📈 Model Evaluation

R² Score: 0.6395768324695243

Interpretation:
The model explains approximately 64% of the variance in housing prices on the test data, a reasonable result for a baseline linear regression model on real-world data.
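To make the "share of variance explained" reading concrete, here is a small sketch that computes R² from its definition, 1 − SS_res / SS_tot, and checks it against scikit-learn's `r2_score`. The function name `r2_from_definition` and the sample numbers are illustrative only and do not come from the housing data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_from_definition(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after the fit
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of predicting the mean
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only
y_true = np.array([200.0, 310.0, 150.0, 420.0, 275.0])
y_pred = np.array([220.0, 290.0, 180.0, 390.0, 260.0])

manual = r2_from_definition(y_true, y_pred)
print(manual, r2_score(y_true, y_pred))
```

A model that always predicts the mean of `y_true` scores 0 under this definition, and a perfect model scores 1, so a value near 0.64 sits between those two reference points.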


📉 Residual Analysis

Residual Distribution (y_test − reg_pred):

[Figure: Residual Distribution (residual_distribution.png)]

Key Insights:

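  • Residuals are approximately normally distributed
  • This is consistent with the normal-errors assumption of linear regression; the distribution plot alone does not check linearity or constant variance
  • Slight skewness suggests some structure the linear model does not capture, which more flexible models may pick up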
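A residual check along these lines could look like the sketch below. It assumes the plot is a histogram of `y_test − reg_pred`; the helpers `sample_skewness` and `plot_residuals`, the bin count, and the synthetic arrays are all assumptions rather than the notebook's code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def sample_skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


def plot_residuals(y_test, reg_pred, path=None, bins=30):
    """Histogram of residuals; saves to `path` only if one is given."""
    residuals = np.asarray(y_test, dtype=float) - np.asarray(reg_pred, dtype=float)
    fig, ax = plt.subplots()
    ax.hist(residuals, bins=bins)
    ax.set_xlabel("Residual (y_test - reg_pred)")
    ax.set_ylabel("Count")
    if path is not None:
        fig.savefig(path)
    plt.close(fig)
    return residuals


# Synthetic stand-ins for the notebook's y_test and reg_pred
rng = np.random.default_rng(1)
y_test = rng.normal(300, 50, 400)
reg_pred = y_test - rng.normal(0, 20, 400)

residuals = plot_residuals(y_test, reg_pred)
print(f"mean residual: {residuals.mean():.2f}")
print(f"skewness:      {sample_skewness(residuals):.2f}")
```

A skewness value near zero supports the "approximately normal" reading, while a clearly positive or negative value quantifies the slight skew noted above. A scatter plot of residuals against predictions would be the usual complement for checking constant variance.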

📌 Key Observations

  • Linear Regression provides a reasonable baseline model (R² ≈ 0.64)
  • Model performance may be improved using:
    • Feature engineering
    • Polynomial regression
    • Regularization techniques (Ridge, Lasso)
    • Tree-based or ensemble models
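As a rough sketch of how some of these options could be compared, the snippet below fits plain linear regression, a degree-2 polynomial pipeline, Ridge, and Lasso on the same split and reports held-out R² for each. The data is synthetic with a deliberately quadratic term, and the `alpha` values and polynomial degree are arbitrary choices for illustration, so the ranking here says nothing about how these models would perform on housing.csv.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a nonlinear term that a plain linear model cannot fit
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] ** 2 + rng.normal(0, 1.0, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Regularized models are scaled first so the penalty treats features evenly
candidates = {
    "linear": LinearRegression(),
    "poly2 + linear": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "scaled ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "scaled lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R² on held-out data
    print(f"{name:15s} R² = {scores[name]:.3f}")
```

Keeping every candidate behind the same `fit` / `score` interface makes it easy to add a tree-based or ensemble model to the dictionary later.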

▶️ How to Run the Project

  1. Clone the repository
     git clone https://github.com/btboilerplate/Linear_Regression.git
  2. Install required libraries
     pip install numpy pandas matplotlib scikit-learn
  3. Open Linear_Regression.ipynb in Jupyter Notebook, JupyterLab, or another notebook environment and run all cells sequentially

🚀 Future Enhancements

  • Add RMSE and MAE evaluation metrics
  • Experiment with Polynomial Regression
  • Compare results with and without feature scaling
  • Try regularized regression models
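For the first item, RMSE and MAE could be added with a few lines like the sketch below. The helper name `regression_errors` and the sample values are hypothetical. RMSE is computed as the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_errors(y_true, y_pred):
    """Return MAE and RMSE, both in the same units as the target."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return {"MAE": mae, "RMSE": rmse}


# Made-up values, for illustration only
y_true = np.array([200.0, 310.0, 150.0, 420.0, 275.0])
y_pred = np.array([220.0, 290.0, 180.0, 390.0, 260.0])

errors = regression_errors(y_true, y_pred)
print(errors)
```

Unlike R², both metrics are expressed in price units, and RMSE penalizes large errors more heavily than MAE, so reporting the two together gives a sense of how much outlying predictions contribute.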
