AutoML Computer Vision API

This program is a lightweight, scalable image classification API powered by PyTorch and FastAPI. It fully automates the training, hyperparameter tuning, and deployment of a computer vision model. The API is production-ready and deployable via Docker and Google Cloud Run.

Introduction

This project delivers a lightweight, production-ready AutoML application for image classification tasks. It enables users to train, tune, and deploy convolutional neural networks (CNNs) with minimal manual intervention. It is built on PyTorch (model training and evaluation), Optuna (hyperparameter tuning), and FastAPI (microservice). The program automates model tuning and exposes a scalable inference API on Google Cloud Run. Although originally created as a capstone project, the application also serves as a template for building reproducible and cost-effective ML systems.

Features

AutoML training pipeline with Optuna-based hyperparameter tuning.
PyTorch's ResNet18 CNN classifier.
MLflow integration for experiment tracking and model versioning.
Containerized using Docker.
FastAPI-based microservice.
Deployment on Google Cloud Run for autoscaling and cost control.
Inference-ready REST API.

Project File Structure

.github/
    └── workflows/
        └── deploy.yml
images/
    └── architecture.png
model/
    ├── data/
    │   ├── model.pth
    │   └── pickle_module_info.txt
    ├── conda.yaml
    ├── MLmodel
    ├── python_env.yaml
    └── requirements.txt
outputs/
    ├── final_model_confusion_matrix.png
    └── final_model_misclassified_images.png
src/
    ├── config.py
    ├── data_loader.py
    ├── eval_utils.py
    ├── train_final.py
    ├── train.py
    └── tune.py
static/
    └── index.html
tests/
    ├── test_data_loader.py
    └── test_eval_utils.py
.coveragerc
.dockerignore
.gitignore
app.py
Dockerfile
main.py
README.md
requirements-dev.txt
requirements.txt

Architecture

Methods

The project is highly modular and has the following structure:

Data Preprocessing: Images are loaded, resized, normalized and augmented for training using torchvision transforms.
Baseline Training: A pretrained ResNet18 model is fine-tuned on the dataset using supervised learning.
Hyperparameter Tuning: Optuna tests different learning rates, optimizers, and epochs to find the best configuration.
Final Model Training: The best hyperparameters are used to retrain the model on the full dataset.
Evaluation & Logging: MLflow tracks each experiment (one per trained model), logging hyperparameters such as learning rate, optimizer, and epochs, along with evaluation metrics, confusion matrix plots, and model artifacts.
Model Serving: The trained model is containerized using Docker and deployed on Google Cloud Run using FastAPI for inference.
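
The hyperparameter tuning step above can be illustrated with a minimal sketch. This is not the project's tuning code: it uses a plain random search from the standard library as a stand-in for Optuna, and the search ranges, the optimizer choices, and the `toy_validation_accuracy` objective are all hypothetical. A real objective would train the model with the sampled configuration and return its validation accuracy.

```python
import math
import random

# Hypothetical search space mirroring the three knobs named above.
SEARCH_SPACE = {
    "lr": (1e-5, 1e-1),            # sampled log-uniformly (assumed range)
    "optimizer": ["Adam", "SGD"],  # assumed choices
    "epochs": (3, 10),             # inclusive integer range (assumed)
}


def sample_config(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["lr"]
    return {
        "lr": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "optimizer": rng.choice(SEARCH_SPACE["optimizer"]),
        "epochs": rng.randint(*SEARCH_SPACE["epochs"]),
    }


def toy_validation_accuracy(config):
    """Placeholder objective; a real one would train and evaluate the model."""
    # Peaks when lr is near 1e-3. Purely illustrative, not a measured result.
    return 1.0 / (1.0 + abs(math.log10(config["lr"]) + 3.0))


def random_search(objective, n_trials, seed=0):
    """Return the best (config, score) pair over n_trials random samples."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_validation_accuracy, n_trials=20)
print(best_config, round(best_score, 3))
```

Optuna replaces the blind sampling in `sample_config` with a sampler that uses the results of earlier trials to pick later ones, which usually finds a good configuration in fewer trials.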

Quickstart

Clone the repo

git clone https://github.com/hamodikk/automl-cv-api.git
cd automl-cv-api

Build and run locally

docker build -t automl-cv-api .
docker run -p 8080:8080 automl-cv-api

Access the API

Visit http://localhost:8080/docs to view the Swagger UI.

Running Locally

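Make sure Python 3.12 is installed. Note that uvicorn listens on port 8000 by default, so the Swagger UI is at http://localhost:8000/docs unless you pass --port 8080.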

pip install -r requirements.txt
uvicorn app:app --reload

Deploying to Google Cloud Run

Ensure that you've:

Installed and authenticated the Google Cloud CLI (gcloud init).
Enabled the Cloud Run and Artifact Registry services.
Created a GCP project.

Then run:

docker build -t gcr.io/YOUR_PROJECT_ID/automl-cv-api .
docker push gcr.io/YOUR_PROJECT_ID/automl-cv-api
gcloud run deploy automl-cv-api \
    --image gcr.io/YOUR_PROJECT_ID/automl-cv-api \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated \
    --port 8080 \
    --memory=1Gi

Prediction Example

POST /predict

Upload an image file as form data to receive a JSON response with the predicted label, confidence score, and class index.

curl -X POST http://localhost:8080/predict \
    -F file=@example.jpg

Response

{
    "label": "mountain",
    "confidence": 0.9627,
    "class_index": 2
}
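
For callers that prefer Python over curl, the request can be sketched with the standard library alone. The form field name `file`, the URL, and the response keys come from the example above; the helper names `build_multipart` and `predict`, and the `image/jpeg` default content type, are assumptions made for illustration.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url="http://localhost:8080/predict"):
    """POST an image to /predict and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "file", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the API to be running locally):
# result = predict("example.jpg")
# print(result["label"], result["confidence"], result["class_index"])
```

Building the multipart body by hand avoids a third-party dependency; a client that already uses an HTTP library with file-upload support can drop `build_multipart` entirely.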

Results

The tuned ResNet18 model achieved the following:

Validation Accuracy: 93.4%
Final Test Accuracy: Consistently above 92% across held-out examples.

A confusion matrix was generated for the final model, and a sample of misclassified images was selected to visualize potential issues. Both plots are saved in the outputs/ directory.
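
As a reminder of what a confusion matrix encodes, the sketch below builds one from a handful of made-up labels. It is not the project's evaluation code, and the class names and label lists are illustrative only, not the reported results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Illustrative labels only; not the project's data.
classes = ["building", "mountain", "street"]
y_true = ["street", "street", "building", "mountain", "mountain", "street"]
y_pred = ["street", "building", "building", "mountain", "mountain", "street"]

matrix = confusion_matrix(y_true, y_pred, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>8}: {row}")
print(f"accuracy = {accuracy(matrix):.3f}")
```

Off-diagonal cells show which pairs of classes are confused with each other. Here the single error is a "street" image predicted as "building".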

Discussion & Limitations

Discussion

The baseline model with default hyperparameters (5 epochs, learning rate 0.001) had a validation accuracy of ~0.25%. Optuna hyperparameter tuning alone raised the validation accuracy to ~93%. The confusion matrix and the misclassified samples suggest that some of the remaining error stems from images that contain more than one class. For example, an image labeled "street" may also include buildings. The model can correctly recognize the buildings and predict "building", yet the prediction is scored as wrong because the image carries only the "street" label. The dataset's single-label structure therefore likely understates the model's actual accuracy. Adding multi-label annotations to this dataset could improve the measured accuracy further.
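
The effect of single-label scoring can be made concrete with a small sketch. All predictions, labels, and "visible class" sets below are hypothetical, and `relaxed_accuracy` is an illustrative metric rather than something the project computes.

```python
def strict_accuracy(predictions, labels):
    """Single-label accuracy: a prediction counts only if it equals the label."""
    hits = sum(pred == label for pred, label in zip(predictions, labels))
    return hits / len(labels)


def relaxed_accuracy(predictions, acceptable):
    """Multi-label-aware accuracy: a prediction counts if it is in the set."""
    hits = sum(pred in ok for pred, ok in zip(predictions, acceptable))
    return hits / len(acceptable)


# Hypothetical examples: each image has one dataset label, plus the set of
# classes a human might agree are visibly present in it.
predictions = ["building", "street", "mountain", "building"]
single_labels = ["street", "street", "mountain", "building"]
visible_classes = [
    {"street", "building"},  # street scene that also shows buildings
    {"street"},
    {"mountain"},
    {"building"},
]

print(strict_accuracy(predictions, single_labels))     # 3 of 4 correct
print(relaxed_accuracy(predictions, visible_classes))  # 4 of 4 correct
```

The first image is the case described above: the model's "building" prediction is defensible, but it only counts as correct once the scoring accepts every class present in the image.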

Limitations

Some of the limitations of the current application are:

Fixed Model Architecture: ResNet18 is used for the entire pipeline. Automated model selection would align the application closer to true AutoML from start to finish, as well as make it more accessible.
No UI for Training: Dataset upload, model training, and tuning do not include a user interface, which reduces accessibility for users without a data science background.
CPU-Based Deployment: Google Cloud Run deploys to CPU, which could be a limiting factor when it comes to scalability if large batch inference is implemented in the future.

Future Enhancements

End-to-end AutoML pipeline: Allow users to upload labeled datasets to train, tune, and deploy autonomously.
Batch prediction UI: Web form for uploading multiple images at once and downloading predictions as CSV.
Simple HTML frontend: Public-facing landing page with embedded prediction form. ✅
CI/CD with GitHub Actions: Automate Docker build and deploy to Cloud Run on main pushes. ✅
GCS integration: Accept image/data uploads directly to Google Cloud Storage.
