Our objective is to predict hourly bike rentals to ensure supply can meet demand.

malzeerah/Bike-Rentals

Python | Jupyter Notebook | Regression

Project Overview

Our objective is to predict hourly bike rentals to ensure supply can meet future demand.
In addition, we will investigate the following business problems:

  • Is there a day of the week when bikes are rented more than others?
  • Is there an hour when bikes are rented most?
  • Is there a season when bikes are rented more than others?
  • Is there a temperature when most bikes are rented?
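Questions like these can usually be answered with a grouped average. The sketch below is a minimal illustration on a tiny made-up sample, not the project's notebook code; the column names (`date`, `hour`, `rented`) and the helper `busiest` are assumptions chosen for readability.

```python
import pandas as pd

# Tiny made-up sample; column names are assumptions, not the dataset's headers.
rentals = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-06-04", "2018-06-04",   # Monday
        "2018-06-05", "2018-06-05",   # Tuesday
        "2018-06-09", "2018-06-09",   # Saturday
    ]),
    "hour": [8, 18, 8, 18, 8, 18],
    "rented": [900, 1500, 950, 1600, 400, 1100],
})
rentals["weekday"] = rentals["date"].dt.day_name()


def busiest(df, column):
    """Return the value of `column` with the highest mean hourly rentals."""
    return df.groupby(column)["rented"].mean().idxmax()


peak_hour = busiest(rentals, "hour")
peak_weekday = busiest(rentals, "weekday")
print(peak_hour, peak_weekday)
```

The same helper could be pointed at a season column or a binned temperature column to cover the remaining questions.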

Code & Resources

Python Version: 3.7
Packages: pandas, numpy, scikit-learn (sklearn), matplotlib, seaborn, pandas_profiling, math
Supervised learning approach: Regression
IDE: Jupyter Notebook

Data: The 'Seoul Bike Sharing' dataset was obtained from the UCI Machine Learning Repository.

The dataset covers one year of hourly bike rentals and contains the following attributes:

  • Date (Dec 2017 – Nov 2018)
  • Hour of Day
  • Number of bikes rented hourly (0 – 3,556)
  • Hourly Weather Conditions (temp, humidity, rain, snow, wind, etc.)
  • Seasons (Spring, Summer, Fall, Winter)
  • Holiday (Holiday/Non-Holiday)
  • Functional Day (Closed/Open)

Data Science Project Framework

  • Frame the business problem
  • Obtain the data
  • Preprocessing
  • Exploratory Data Analysis (EDA)
  • Perform modeling
  • Communicate Results

Model Building & Performance

I started by transforming the categorical variables into numeric variables.
Then I created training and test sets with a test size of 25%.
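As a rough sketch of these two steps, the example below one-hot encodes the categorical columns and holds out 25% of the rows. The README does not say which encoding was used, so `pd.get_dummies` is an assumption, as are the column names, the sample values, and `random_state=42`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "hour": [0, 6, 12, 18] * 4,
    "temperature": [-5.0, -2.0, 3.0, 1.0, 8.0, 12.0, 18.0, 15.0,
                    22.0, 25.0, 31.0, 28.0, 10.0, 13.0, 17.0, 14.0],
    "season": ["Winter"] * 4 + ["Spring"] * 4 + ["Summer"] * 4 + ["Fall"] * 4,
    "holiday": ["No Holiday", "Holiday"] * 8,
    "rented": [50, 300, 250, 400, 120, 700, 600, 1100,
               200, 900, 800, 1500, 150, 750, 650, 1200],
})

# One-hot encode the categorical columns (assumed encoding scheme).
encoded = pd.get_dummies(df, columns=["season", "holiday"], drop_first=True)

X = encoded.drop(columns="rented")
y = encoded["rented"]

# Hold out 25% of the rows for testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```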

For the first model, I included all attributes as features. I tried three algorithms (Random Forest, Linear Regression, Support Vector Machine) and found that Random Forest performed best.
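A comparison along these lines might look like the sketch below. It runs on synthetic data rather than the Seoul dataset, so its scores say nothing about the results reported here. `SVR` is assumed as the regression form of the Support Vector Machine, and all hyperparameters are scikit-learn defaults or arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data: demand is nonlinear in hour, linear in temperature.
rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=400)
temp = rng.uniform(-10, 35, size=400)
y = 800 + 500 * np.sin((hour - 6) * np.pi / 12) + 20 * temp + rng.normal(0, 50, 400)
X = np.column_stack([hour, temp])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Linear Regression": LinearRegression(),
    "Support Vector Machine": SVR(),
}

# Fit each model on the training set and score it on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
print(scores, best)
```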

Random Forest Outcomes:
  R Squared: 0.929
  RMSE: 168.621
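For reference, both metrics can be computed by hand. The sketch below uses the standard definitions (R Squared as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared error) on made-up numbers; the function names are illustrative, not taken from the notebook.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Made-up values: every prediction is off by exactly 10 bikes.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
print(r_squared(actual, predicted), rmse(actual, predicted))
```

Because RMSE is expressed in bikes per hour, a value of about 169 is best read against the 0 to 3,556 range of hourly rentals.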

The first model showed that temperature and hour of the day were the most important features. Based on this, I discretized the temperature data and built a second model that included the new binned feature. I tried the same three algorithms (Random Forest, Linear Regression, Support Vector Machine), and Random Forest again performed best, although its scores were slightly below those of the first model.

Random Forest Outcomes:
  R Squared: 0.925
  RMSE: 173.585
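Discretizing temperature, as done for the second model, can be sketched with `pd.cut`. The README does not give the bin edges or labels that were used, so the ones below are assumptions for illustration only.

```python
import pandas as pd

# Made-up temperatures in degrees Celsius.
temps = pd.Series([-12.0, 3.5, 14.0, 22.5, 31.0])

# Assumed bin edges and labels; the project's actual bins are not stated.
bins = [-20, 0, 10, 20, 30, 40]
labels = ["below 0", "0-10", "10-20", "20-30", "above 30"]

# Each value falls into the interval (left, right] that contains it.
temp_band = pd.cut(temps, bins=bins, labels=labels)
print(temp_band.tolist())
```

Binning discards within-band variation, which is one possible reason a tree-based model that can already split on raw temperature would see little benefit from it.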