Regression

Repository: challenge-regression
Type of Challenge: Consolidation
Duration: 5 days
Deadline: 06/05/2025 17:00
Solo Challenge

Learning objectives

Be able to preprocess data for machine learning.
Be able to apply a regression in a real context.
Be able to understand some machine learning concepts.

The Mission

The real estate company "ImmoEliza" asks you to create a machine learning model to predict the prices of real estate sales in Belgium.

You have collected your data, and you have cleaned and analyzed it a first time. Now it's time to do some machine learning with it!

Must-have features

Step 1: Data cleaning

Preprocess the data so that it can be used for machine learning.

You have to handle NaNs.
You have to handle categorical data.
You have to select features.
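Here is a minimal pandas sketch of the three cleaning tasks. It assumes a DataFrame with a `price` target. The column names (`living_area`, `bedrooms`, `province`, `listing_id`), the median and "unknown" imputation choices, and the function name `clean_for_model` are hypothetical illustrations, not the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical listings: column names are illustrative, not the real dataset.
raw = pd.DataFrame(
    {
        "price": [250_000, 320_000, np.nan, 410_000, 195_000],
        "living_area": [90.0, np.nan, 120.0, 150.0, 70.0],
        "bedrooms": [2, 3, 3, np.nan, 1],
        "province": ["Antwerp", "Liège", None, "Brussels", "Liège"],
        "listing_id": [101, 102, 103, 104, 105],
    }
)


def clean_for_model(df: pd.DataFrame, target: str = "price") -> pd.DataFrame:
    """Return a model-ready copy of ``df`` (hypothetical helper).

    Drops rows without a target, imputes numeric NaNs with the column
    median, fills missing categories with "unknown", drops an identifier
    column and one-hot encodes the categorical columns.
    """
    # A row without a price cannot be used for training.
    df = df.dropna(subset=[target]).copy()

    # Feature selection: an identifier carries no predictive signal.
    df = df.drop(columns=["listing_id"])

    numeric_cols = df.select_dtypes(include="number").columns.drop(target)
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    categorical_cols = df.select_dtypes(exclude="number").columns
    df[categorical_cols] = df[categorical_cols].fillna("unknown")

    # One-hot encode categoricals; dtype=int gives 0/1 instead of booleans.
    return pd.get_dummies(df, columns=list(categorical_cols), dtype=int)


clean = clean_for_model(raw)
print(clean)
```

Computing the medians on the full dataset leaks information from the future test set into training. Fitting the imputation on the training split only, for example inside a scikit-learn `Pipeline`, avoids this.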

Step 2: Data split

Now that the dataset is ready, you have to format it for machine learning:

Divide your dataset into a training set and a test set (X_train, y_train, X_test, y_test).
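With scikit-learn, the split is usually a single call to `train_test_split`. The sketch below uses synthetic data in place of the real features. The 80/20 ratio and the `random_state` value are arbitrary choices. Note that the function returns the arrays in the order `X_train, X_test, y_train, y_test`, which differs from the order listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features (e.g. living area, bedrooms).
rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(100, 2))
y = 1_500 * X[:, 0] + rng.normal(0, 10_000, size=100)

# Return order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```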

Step 3: Model selection

The dataset is ready. Now let's select a model.

Look at which models make the most sense for your data.
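One way to compare candidate models is k-fold cross-validation on the training set, scored with the same MAE used in Step 5. This sketch assumes scikit-learn and compares a linear model with a random forest on synthetic data. The candidate list is illustrative, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic training data standing in for the real X_train / y_train.
rng = np.random.default_rng(0)
X_train = rng.uniform(50, 250, size=(200, 2))
y_train = (
    1_500 * X_train[:, 0]
    + 200 * X_train[:, 1]
    + rng.normal(0, 10_000, size=200)
)

candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores: dict[str, float] = {}
for name, model in candidates.items():
    # scikit-learn maximises scores, so the MAE scorer returns negatives.
    cv = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
    )
    scores[name] = -cv.mean()
    print(f"{name}: MAE = {scores[name]:,.0f}")
```

scikit-learn's scorers follow a "higher is better" convention. That is why the scorer is named `neg_mean_absolute_error` and why the code flips the sign.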

Step 4: Apply your model

Apply your model on your data:

Train your model (on the train dataset)
Make predictions (on single rows or on the test dataset)
Once this works, look into sklearn's Pipeline object to make things clean and reusable
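Here is a sketch of a scikit-learn `Pipeline` that bundles imputation, one-hot encoding and a regressor, so the same preprocessing runs at training time and at prediction time. The column names and the tiny training frame are hypothetical. `LinearRegression` can be swapped for whichever model you selected.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical columns: replace with the features you actually selected.
numeric_features = ["living_area", "bedrooms"]
categorical_features = ["province"]

categorical_steps = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), numeric_features),
        ("cat", categorical_steps, categorical_features),
    ]
)
model = Pipeline(
    [("preprocess", preprocess), ("regressor", LinearRegression())]
)

X_train = pd.DataFrame(
    {
        "living_area": [90.0, np.nan, 120.0, 150.0, 70.0, 110.0],
        "bedrooms": [2, 3, 3, np.nan, 1, 2],
        "province": ["Antwerp", "Liège", np.nan, "Brussels", "Liège", "Antwerp"],
    }
)
y_train = np.array([250_000, 320_000, 300_000, 410_000, 195_000, 280_000])

# Imputers and the encoder are fitted on the training data only.
model.fit(X_train, y_train)

# Predict a single new listing; an unseen province is encoded as all zeros.
new_listing = pd.DataFrame(
    {"living_area": [100.0], "bedrooms": [2], "province": ["Namur"]}
)
print(model.predict(new_listing))
```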

Step 5: Model evaluation

Let's evaluate your model. The metric we are interested in is the MAE (Mean Absolute Error). Make sure you understand it well, then try to answer these questions:

How could you improve this result?
Which part of the process has the most impact on the results?
Are there other metrics that would make more sense for evaluating your model?
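The MAE is the average of the absolute differences between the true prices y_i and the predictions ŷ_i: MAE = (1/n) Σ |y_i − ŷ_i|. It is expressed in the same unit as the price. The NumPy sketch below computes it alongside two common alternatives: RMSE, which weighs large errors more heavily, and R², the share of the target's variance explained by the model. The sample values are made up.

```python
import numpy as np


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error: average absolute gap between truth and prediction."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Error: like MAE, but large errors weigh more."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: share of target variance explained."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)


y_true = np.array([200_000.0, 300_000.0, 400_000.0])
y_pred = np.array([210_000.0, 290_000.0, 430_000.0])
print(round(mae(y_true, y_pred), 2))  # 16666.67
```

Because MAE and RMSE share the price's unit, a gap between them hints at a few large errors. R² is unit-free, which makes it easier to compare across datasets.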

You may go back a couple of steps if you want to try other types of approaches.
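Filtering out outliers is one such approach. A common rule of thumb, assumed here rather than prescribed by the challenge, drops the rows that fall outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. The helper name and sample prices are hypothetical. Apply the filter to the training split only, so that the test set still reflects real listings.

```python
import pandas as pd


def drop_outliers_iqr(
    df: pd.DataFrame, column: str, k: float = 1.5
) -> pd.DataFrame:
    """Keep rows whose ``column`` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    # between() is inclusive on both ends by default.
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


prices = pd.DataFrame(
    {"price": [180_000, 210_000, 250_000, 260_000, 300_000, 5_000_000]}
)
filtered = drop_outliers_iqr(prices, "price")
print(filtered["price"].tolist())
```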

Bonus Step 5.5: Reinventing the wheel

I know some of you will get to a viable model really quickly and will get bored going back and forth between filtering out outliers and selecting features. The truth is that, when playing with ML, you only truly understand it when you do it yourself. Here is what you can do:

Look at how the most common ML models make a prediction
Select one that you find elegant
Implement it from scratch using nothing more than NumPy

Note that some are easier to implement than others.
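As an example of a from-scratch model, here is a sketch of ordinary least squares linear regression that uses only NumPy. The class name is hypothetical, and the `fit`/`predict` methods simply mirror scikit-learn's naming. `np.linalg.lstsq` solves the least-squares problem directly. Gradient descent would be another valid route.

```python
from typing import Optional

import numpy as np


class LinearRegressionScratch:
    """Ordinary least squares regression implemented with NumPy only."""

    def __init__(self) -> None:
        self.coef_: Optional[np.ndarray] = None
        self.intercept_: float = 0.0

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearRegressionScratch":
        """Find the weights minimising the sum of squared residuals."""
        # Prepend a column of ones so the first weight acts as the intercept.
        X_design = np.column_stack([np.ones(len(X)), X])
        weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        self.intercept_ = float(weights[0])
        self.coef_ = weights[1:]
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Return X @ coef + intercept."""
        if self.coef_ is None:
            raise RuntimeError("Call fit() before predict().")
        return X @ self.coef_ + self.intercept_


rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(200, 2))
y = 50_000 + 1_500 * X[:, 0] + 800 * X[:, 1]  # noise-free target
model = LinearRegressionScratch().fit(X, y)
print(model.intercept_, model.coef_)  # close to 50000 and [1500, 800]
```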

Step 6: Presentation

Present your results in front of the group.

You have to make a nice presentation with a professional design.
You have 5 minutes to present (without Q&A). You can't use more time, and you can't use less.
You CAN'T show code or Jupyter notebooks during the presentation.

Constraints

Code style

Each function or class has to be typed
Each function or class has to contain a docstring
Your code should be commented when necessary.
Your code should be cleaned of any unused code.
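Here is a short function that meets these constraints: type hints on every parameter and on the return value, a docstring, and a comment only where the intent is not obvious. The function name, the column names and the reST-style docstring fields are illustrative choices, not requirements of the challenge.

```python
import pandas as pd


def price_per_m2(
    df: pd.DataFrame,
    price_col: str = "price",
    area_col: str = "living_area",
) -> pd.Series:
    """Compute the price per square metre for each listing.

    :param df: Listings containing a price and a living-area column.
    :param price_col: Name of the price column.
    :param area_col: Name of the living-area column (in m²).
    :return: A Series aligned with ``df`` holding price / area.
    """
    # A zero area would give inf, so treat it as missing instead.
    area = df[area_col].where(df[area_col] > 0)
    return df[price_col] / area
```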

Deliverables

Pimp up the README file:
- Description
- Installation
- Usage
- (Visuals)
- (Contributors)
- (Timeline)
- (Personal situation)
Present your results in front of the group in 5 minutes max.

Steps

Create the repository
Study the request (What & Why ?)
Identify technical challenges (How ?)

Evaluation criteria

Criteria	Indicator	Yes/No
1. Is complete	Know how to answer all the above questions.	[ ]
	`pandas` and `matplotlib`/`seaborn` are used.	[ ]
	All the above steps were followed.	[ ]
	A nice README is available.	[ ]
	Your model is able to predict something.	[ ]
2. Is good	You used typing and docstring.	[ ]
	Your code is formatted (PEP8 compliant).	[ ]
	No unused file/code is present.	[ ]

Quotes

“The lottery is a tax on people who don't understand the statistics.” - Anonymous
