This project implements Linear Regression using scikit-learn to predict house prices from a housing dataset. The notebook demonstrates the complete machine learning workflow, including data loading, preprocessing, model training, prediction, evaluation, and residual analysis.
Linear_Regression
│
├── Linear_Regression.ipynb
├── housing.csv
├── residual_distribution.png
└── README.md
Dataset:
- File: housing.csv
- Type: Tabular housing data
- Purpose: Training and evaluating a linear regression model for house price prediction
Technologies used:
- Python
- NumPy
- Pandas
- Matplotlib
- scikit-learn
Workflow:
- Load the housing dataset
- Perform train-test split
- Train a Linear Regression model
- Predict house prices on test data
- Evaluate model performance using R² Score
- Analyze residual distribution
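The workflow steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it builds a synthetic stand-in for housing.csv, and the column names (`area`, `rooms`, `age`, `price`), the 80/20 split, and the `random_state` are all assumptions. Only the names `y_test` and `reg_pred` are taken from this README.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for housing.csv; the column names are hypothetical.
# With the real file this step would be: df = pd.read_csv("housing.csv")
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "rooms": rng.integers(1, 7, n),
    "age": rng.uniform(0, 60, n),
})
df["price"] = (
    1200 * df["area"] + 15000 * df["rooms"] - 800 * df["age"]
    + rng.normal(0, 40000, n)
)

# Separate the features from the target column (assumed to be "price").
X = df.drop(columns="price")
y = df["price"]

# Hold out 20% of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares and predict on the held-out rows.
reg = LinearRegression()
reg.fit(X_train, y_train)
reg_pred = reg.predict(X_test)

# Evaluate with R² and keep the residuals for later analysis.
score = r2_score(y_test, reg_pred)
residuals = y_test - reg_pred
print(f"R² on the synthetic test set: {score:.3f}")
```

The score printed here describes the synthetic data only and is unrelated to the result reported for housing.csv.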
R² Score: 0.6395768324695243
Interpretation:
The model explains approximately 64% of the variance in test-set house prices, a reasonable result for a baseline linear regression model on real-world data.
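For reference, R² is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below works through that arithmetic on made-up numbers (not values from housing.csv); the function name `r2_from_definition` is hypothetical.

```python
def r2_from_definition(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up prices (in thousands), purely for illustration.
y_true = [200.0, 250.0, 300.0, 350.0, 400.0]
y_pred = [230.0, 240.0, 280.0, 390.0, 380.0]

# SS_res = 3400 and SS_tot = 25000, so R² = 1 - 0.136
r2 = r2_from_definition(y_true, y_pred)
print(round(r2, 3))  # 0.864
```

A model that always predicts the mean of `y_true` scores exactly 0 under this definition, which is why R² is read as the share of variance explained beyond that trivial predictor.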
Residual Distribution (y_test − reg_pred):
Key Insights:
- Residuals are approximately normally distributed
- This is consistent with the normal-error assumption of linear regression
- Slight skewness suggests that a more flexible model could capture structure the linear fit misses
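One way to back these observations with numbers is to compute the mean, spread, and skewness of the residuals alongside the histogram. The sketch below is an assumption about how this could be done, not the notebook's code: the data is randomly generated, and `residual_summary` is a hypothetical helper.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt

def residual_summary(y_true, y_pred):
    """Return the residuals and a few summary statistics."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mean = residuals.mean()
    std = residuals.std()
    # Skewness as the third standardized moment (0 for a symmetric distribution).
    skewness = np.mean(((residuals - mean) / std) ** 3)
    return residuals, {"mean": mean, "std": std, "skewness": skewness}

# Illustrative data only: predictions that miss the target by Gaussian noise.
rng = np.random.default_rng(1)
y_test = rng.uniform(100, 500, 300)
reg_pred = y_test - rng.normal(0, 30, 300)

residuals, stats = residual_summary(y_test, reg_pred)
print(stats)

# Histogram of the residuals with a reference line at zero.
fig, ax = plt.subplots()
ax.hist(residuals, bins=30, edgecolor="black")
ax.axvline(0, color="red", linestyle="--")
ax.set_xlabel("Residual (y_test - reg_pred)")
ax.set_ylabel("Count")
ax.set_title("Residual distribution")
```

A skewness close to 0 and a mean close to 0 support the "approximately normal" reading; a clearly positive or negative skewness would point to systematic under- or over-prediction in one tail.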
Conclusion:
- Linear Regression provides a strong baseline model
- Model performance can be improved using:
  - Feature engineering
  - Polynomial regression
  - Regularization techniques (Ridge, Lasso)
  - Tree-based or ensemble models
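As a rough sketch of how some of these alternatives could be compared against the baseline, the example below fits plain, Ridge, Lasso, and degree-2 polynomial regression on synthetic data with deliberate curvature. The data, the `alpha` values, and the polynomial degree are all assumptions chosen for illustration; results on housing.csv may look quite different.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a squared term, so polynomial features have something to capture.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 - X[:, 2] + rng.normal(0, 1, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Regularized models are scaled first because their penalties depend on feature scale.
models = {
    "linear": LinearRegression(),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "poly2": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    ),
}

# Fit each candidate on the same split and record its test-set R².
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name:>6}: R² = {scores[name]:.3f}")
```

Keeping every candidate on the same train-test split makes the scores directly comparable; for a more robust comparison, cross-validation would be a natural next step.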
How to run:
- Clone the repository
    git clone https://github.com/btboilerplate/Linear_Regression.git
- Install required libraries
    pip install numpy pandas matplotlib scikit-learn
- Open Linear_Regression.ipynb and run all cells sequentially
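Before running the notebook, it can help to confirm that the libraries from the pip command are actually installed in the active environment. The snippet below is an optional sketch using only the Python standard library; `installed_versions` is a hypothetical helper and is not part of the repository.

```python
from importlib import metadata

# Distribution names as they appear in the pip install command.
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with the pip command from the steps above.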
Future improvements:
- Add RMSE and MAE evaluation metrics
- Experiment with Polynomial Regression
- Compare results with and without feature scaling
- Try regularized regression models
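As a starting point for the first item, RMSE and MAE could be computed with scikit-learn's metric functions as sketched below. The arrays are made-up values standing in for the notebook's `y_test` and `reg_pred`, so the printed numbers say nothing about the actual model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up prices (in thousands), standing in for y_test and reg_pred.
y_test = np.array([200.0, 250.0, 300.0, 350.0, 400.0])
reg_pred = np.array([230.0, 240.0, 280.0, 390.0, 380.0])

# MAE: average absolute error, in the same units as the target.
mae = mean_absolute_error(y_test, reg_pred)

# RMSE: square root of the mean squared error; penalizes large errors more than MAE.
rmse = np.sqrt(mean_squared_error(y_test, reg_pred))

print(f"MAE:  {mae:.2f}")   # MAE:  24.00
print(f"RMSE: {rmse:.2f}")  # RMSE: 26.08
```

Unlike R², both metrics are expressed in the units of the target, which makes them easier to relate to actual house prices.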
