Stroke Risk Prediction API

This repository contains the deployment code for the Stroke Risk Prediction model.

Project Structure

stroke-risk-prediction-api/
│
├── README.md
├── LICENSE
├── requirements.txt
├── Dockerfile
├── entrypoints.sh
├── .gitignore
├── main.py
├── best_model.joblib
└── test/
    └── test_api.py

This structure outlines the main components of the project.

Introduction

The Stroke Risk Prediction API is a machine learning-based service that predicts the risk of stroke based on various health and lifestyle factors.

By improving the identification of high-risk individuals, especially those in the minority class, this model could have the potential to enhance early intervention strategies for stroke prevention. While the current models show limitations in precision, their improved recall for the minority class could lead to more comprehensive screening processes, potentially reducing the overall incidence and impact of strokes. The project also highlights the challenges in predicting rare events like strokes and provides a foundation for further research in handling imbalanced healthcare datasets.

Dataset

The project utilizes the Stroke Prediction Dataset from Kaggle.

Installation

These instructions will get you a copy of the project up and running on your local machine.

Clone the repository:

git clone https://github.com/AgInsideOut/stroke-risk-prediction-api.git

Navigate to the project directory:
```
cd stroke-risk-prediction-api
```
Install the required packages:
```
pip install -r requirements.txt
```

Usage

The API is live and can be accessed at: https://stroke-risk-prediction-api.onrender.com

To run the API locally:

Ensure you're in the project directory.
Run the following command:
```
uvicorn main:app --reload
```
The API will be available at http://localhost:8000.

API Endpoints

Base URL: https://stroke-risk-prediction-api.onrender.com

Root Endpoint
- URL: https://stroke-risk-prediction-api.onrender.com/
- Method: GET
- Description: Welcome message for the API.
- Response: {"message": "Welcome to the Stroke Prediction API"}

Prediction Endpoint

URL: https://stroke-risk-prediction-api.onrender.com/predict
Method: POST
Description: Predicts stroke risk based on input features.

Request Body: JSON object with the following fields:

{
  "age": float,
  "hypertension": int,
  "heart_disease": int,
  "ever_married": int,
  "avg_glucose_level": float,
  "bmi": float,
  "gender_Male": int,
  "gender_Female": int,
  "gender_Other": int,
  "work_type_Govt_job": int,
  "work_type_Never_worked": int,
  "work_type_Private": int,
  "work_type_Self_employed": int,
  "work_type_children": int,
  "Residence_type_Rural": int,
  "Residence_type_Urban": int,
  "smoking_status_formerly_smoked": int,
  "smoking_status_smokes": int,
  "smoking_status_Unknown": int,
  "smoking_status_never_smoked": int
}

Response: {"prediction": int} (0 for low risk, 1 for high risk)

Testing the API

You can test the live API using curl or any API testing tool. Here's an example using curl:

curl -X POST "https://stroke-risk-prediction-api.onrender.com/predict" \
     -H "Content-Type: application/json" \
     -d '{
       "age": 65.0,
       "hypertension": 1,
       "heart_disease": 0,
       "ever_married": 1,
       "avg_glucose_level": 100.0,
       "bmi": 28.5,
       "gender_Male": 1,
       "gender_Female": 0,
       "gender_Other": 0,
       "work_type_Govt_job": 0,
       "work_type_Never_worked": 0,
       "work_type_Private": 1,
       "work_type_Self_employed": 0,
       "work_type_children": 0,
       "Residence_type_Rural": 0,
       "Residence_type_Urban": 1,
       "smoking_status_formerly_smoked": 1,
       "smoking_status_smokes": 0,
       "smoking_status_Unknown": 0,
       "smoking_status_never_smoked": 0
     }

Model Information

The model used in this API is an Easy Ensemble Classifier trained to predict stroke risk. This model was chosen for its ability to handle imbalanced datasets and its high recall for the minority class. Here are some key performance metrics:

Precision (Class 1): 0.1429
Recall (Class 1): 0.2000
F1 Score (Class 1): 0.1667
Class 1 F-2 Score: 0.1852

Note: These metrics are for the positive class (high risk of stroke). The model's performance should be interpreted in the context of early risk identification, where higher recall is preferred over precision. This model excels at identifying potential stroke cases, which is crucial for early intervention and prevention strategies.

Key points:

High Recall: The model correctly identifies 86.41% of actual stroke cases, which is essential for a health screening tool.
Balanced Accuracy: The model achieves a balanced accuracy of 76.13%, indicating good overall performance considering both classes.
Trade-off: While precision is low, this is acceptable in a medical context where missing a potential stroke case (false negative) is generally considered more costly than a false positive.

This model is particularly suitable for initial screening, where capturing as many potential stroke cases as possible is the primary goal. Follow-up assessments can then be conducted to confirm the risk for individuals flagged by the model.

Authors

Agnieszka Thiel - Initial work - AgInsideOut

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

Dataset License

The Stroke Prediction Dataset is licensed under CC BY-SA 4.0.

Acknowledgments

Data Source: Fedesoriano's Stroke Prediction Dataset
Project Requirements: Turing College

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stroke Risk Prediction API

Table of Contents

Project Structure

Introduction

Dataset

Installation

Usage

API Endpoints

Testing the API

Model Information

Authors

License

Dataset License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
post-req-recording		post-req-recording
test		test
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
best_model.joblib		best_model.joblib
entrypoint.sh		entrypoint.sh
main.py		main.py
requirements.txt		requirements.txt

License

AgInsideOut/stroke-risk-prediction-api

Folders and files

Latest commit

History

Repository files navigation

Stroke Risk Prediction API

Table of Contents

Project Structure

Introduction

Dataset

Installation

Usage

API Endpoints

Testing the API

Model Information

Authors

License

Dataset License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages