This machine learning project predicts student performance based on various academic, demographic, and behavioral factors. The goal is to classify whether a student is likely to pass or fail using supervised classification techniques.
- Source: UCI Machine Learning Repository, Student Performance dataset
- Files used: `student-mat.csv` (Mathematics performance data)
The dataset includes:
- Demographic info (e.g., `sex`, `age`, `address`)
- Family and social info (e.g., `famsize`, `Pstatus`, `schoolsup`)
- Academic info (e.g., `studytime`, `failures`, `absences`, and the grades `G1`, `G2`, `G3`)
Target variable:
- `pass`: created as 1 if `G3 >= 10`, otherwise 0
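
The target rule above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual loading code: the in-memory `sample_csv` and its rows are made up so the snippet runs on its own, and the semicolon separator is an assumption based on how the UCI file is distributed. In practice the call would point at `student-mat.csv` instead.

```python
import io

import pandas as pd

# Tiny made-up stand-in for student-mat.csv (assumed to be semicolon-separated).
sample_csv = io.StringIO(
    "sex;age;studytime;G1;G2;G3\n"
    "F;18;2;5;6;6\n"
    "M;17;3;14;15;15\n"
    "F;16;1;10;9;10\n"
)
df = pd.read_csv(sample_csv, sep=";")

# pass = 1 when the final grade G3 is at least 10, otherwise 0.
# "pass" is a Python keyword, so use bracket access rather than df.pass.
df["pass"] = (df["G3"] >= 10).astype(int)

print(df[["G3", "pass"]])
```

A grade of exactly 10 counts as a pass because the comparison is inclusive.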
- Data Loading and Preprocessing
- Feature Engineering
- Encoding categorical variables
- Train/Test Split
- Model training (Random Forest Classifier)
- Model evaluation (Accuracy, Classification Report)
- Visualizations:
  - Grade distribution
  - Correlation heatmap
  - Study time vs Pass
  - Feature importance
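
The workflow steps listed above might look something like the sketch below. It is illustrative only: the data is randomly generated so the snippet runs without the CSV, and the one-hot encoding via `pd.get_dummies`, the 80/20 split, `n_estimators=100`, and the fixed `random_state` are all assumptions rather than settings confirmed by this project. `G3` is dropped from the features here because the target is derived directly from it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for student-mat.csv (random values, illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], size=n),
    "address": rng.choice(["U", "R"], size=n),
    "studytime": rng.integers(1, 5, size=n),
    "failures": rng.integers(0, 4, size=n),
    "absences": rng.integers(0, 30, size=n),
    "G3": rng.integers(0, 21, size=n),
})

# Feature engineering: binary target from the final grade.
df["pass"] = (df["G3"] >= 10).astype(int)

# Encode categorical variables; drop G3 to avoid leaking the target.
X = pd.get_dummies(df.drop(columns=["G3", "pass"]), drop_first=True)
y = df["pass"]

# Train/test split (assumed 80/20, stratified on the target).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model training.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Model evaluation: accuracy and classification report.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
print(f"Accuracy: {accuracy:.2f}")
print(report)
```

Because the features here are random noise, the printed accuracy says nothing about the real dataset; the point is only the order of the steps.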
- `RandomForestClassifier` (from `sklearn.ensemble`)
- Accuracy achieved: XX% (fill with your model score)
- Histogram of grades
- Heatmap of feature correlations
- Boxplots of study time vs pass/fail
- Barplot of feature importances
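
Two of the plots above, the grade histogram and the feature-importance barplot, could be produced along these lines. This is a sketch under assumptions: it uses plain matplotlib (the project may use a different plotting library), random stand-in data, and a small hypothetical feature list, so the resulting figure is not representative of the real results.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (random values, illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "studytime": rng.integers(1, 5, size=n),
    "failures": rng.integers(0, 4, size=n),
    "absences": rng.integers(0, 30, size=n),
    "G3": rng.integers(0, 21, size=n),
})
df["pass"] = (df["G3"] >= 10).astype(int)

# Hypothetical feature subset used only for this plot.
features = ["studytime", "failures", "absences"]
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(df[features], df["pass"])

# Importances from the fitted forest, sorted for a readable bar chart.
importances = pd.Series(model.feature_importances_, index=features).sort_values()

fig, (ax_hist, ax_imp) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of final grades with the pass threshold marked.
ax_hist.hist(df["G3"], bins=range(0, 22))
ax_hist.axvline(10, linestyle="--", color="red")
ax_hist.set_xlabel("G3 (final grade)")
ax_hist.set_ylabel("Number of students")
ax_hist.set_title("Grade distribution")

# Horizontal barplot of feature importances.
ax_imp.barh(importances.index, importances.values)
ax_imp.set_xlabel("Importance")
ax_imp.set_title("Feature importance")

fig.tight_layout()
```

Calling `fig.savefig("plots.png")` afterwards would write the figure to disk; the file name is arbitrary.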
Install required packages:
pip install -r requirements.txt