ealbertoav/learning


Banknote Authentication ML Examples

Project Description

This repository contains a small collection of Python scripts and data demonstrating how to build and evaluate machine learning classifiers for banknote authentication. Using the classic UCI banknote authentication dataset, the project shows how to:

  • Load a CSV dataset with numeric features
  • Prepare features ("evidence") and labels
  • Split data into training and testing sets
  • Train different classifiers from scikit-learn
  • Evaluate model accuracy on a hold-out test set

The code is intentionally simple and linear, making it suitable for learning or teaching introductory machine learning concepts.


Purpose

The main objectives of this project are to:

  • Illustrate a basic supervised classification workflow in Python
  • Compare different scikit-learn classifiers (Perceptron, SVM, k-NN, Naive Bayes)
  • Provide a compact, runnable example that students can modify and extend
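A comparison of the four classifiers can be sketched in a single loop. This is a minimal illustration, not the repository's code: it assumes a synthetic dataset from make_classification as a stand-in for banknotes.csv, and the models dictionary and scores variable are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the banknote data (4 numeric features, 2 classes).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# The same four classifier families the scripts let you switch between.
models = {
    "Perceptron": Perceptron(),
    "SVM": SVC(),
    "k-NN (k=1)": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.2%}")
```

Because every model is trained and scored on the same split, the printed accuracies are directly comparable. On real data the ranking may differ from what the synthetic set suggests.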

Typical use cases:

  • Classroom demos for an introduction to machine learning
  • Self-study for people new to scikit-learn
  • A starting point for experimenting with banknote fraud detection models

Project Structure

.
├── README.md              # Project overview (this file)
├── lecture.pdf            # Related lecture/teaching material (not used by code)
├── requirements.txt       # Python dependencies
└── banknotes/
    ├── banknotes.csv      # Banknote authentication dataset (UCI-style)
    ├── banknotes0.py      # ML example using manual train/test split
    └── banknotes1.py      # ML example using sklearn.model_selection.train_test_split

Key scripts

  • banknotes/banknotes0.py

    • Demonstrates manual splitting of the dataset into training and testing sets using random.shuffle and a fixed hold-out fraction.
    • Lets you easily switch between different models by changing the model assignment.
  • banknotes/banknotes1.py

    • Demonstrates the same classification task using sklearn.model_selection.train_test_split for splitting the data.
    • Also allows you to toggle between multiple classifiers.

Dependencies

Python packages are listed in requirements.txt:

  • scikit-learn==1.8.0
  • scipy==1.16.3
  • numpy==2.4.0
  • threadpoolctl==3.6.0 (scikit-learn runtime dependency)

You will also need Python 3.11 or newer; the pinned package versions do not support older interpreters such as 3.9.
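To see which interpreter and package versions are actually active, a small check like the following may help. It is a sketch using only the standard library; the PINNED dictionary simply copies the pins listed above, and check_environment is a hypothetical helper name, not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements.txt listing above.
PINNED = {
    "scikit-learn": "1.8.0",
    "scipy": "1.16.3",
    "numpy": "2.4.0",
    "threadpoolctl": "3.6.0",
}


def check_environment(pinned=PINNED):
    """Return {package: (installed_version_or_None, pinned_version)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, (installed, wanted) in check_environment().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: installed={installed} pinned={wanted} ({status})")
```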


Installation Instructions

  1. Clone or download the repository

    git clone <your-repo-url>
    cd learning
  2. (Recommended) Create and activate a virtual environment

    python -m venv .venv
    source .venv/bin/activate   # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt

Usage

All functionality is provided as standalone scripts. The commands below are written for the repository root; if you run them from inside the banknotes/ directory, drop the banknotes/ prefix from the script path.

1. Run the manual split example (banknotes0.py)

python banknotes/banknotes0.py

What it does:

  • Loads banknotes/banknotes.csv
  • Shuffles all rows
  • Uses 60% for training, 40% for testing
  • Trains the selected classifier
  • Prints the number of correct and incorrect predictions and the overall accuracy
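The shuffle-and-slice split described above can be sketched as follows. This is an illustration rather than the script's exact code: manual_split is a hypothetical helper, and the seed parameter is an assumption added here so that runs can be reproduced.

```python
import random


def manual_split(data, holdout_fraction=0.4, seed=None):
    """Shuffle a copy of `data` and split it into (training, testing) lists."""
    rows = list(data)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    holdout = int(holdout_fraction * len(rows))
    testing = rows[:holdout]   # the first `holdout` rows are held out for testing
    training = rows[holdout:]  # the remaining rows are used for training
    return training, testing


rows = list(range(10))
training, testing = manual_split(rows, holdout_fraction=0.4, seed=0)
print(len(training), len(testing))  # prints "6 4"
```

Shuffling before slicing matters: if the CSV happens to be ordered by class, an unshuffled slice could leave one class out of the training set entirely.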

To try a different model, open banknotes/banknotes0.py and edit the model line near the top, for example:

# model = Perceptron()
# model = svm.SVC()
# model = KNeighborsClassifier(n_neighbors=1)
model = GaussianNB()

2. Run the train_test_split example (banknotes1.py)

python banknotes/banknotes1.py

What it does:

  • Loads banknotes/banknotes.csv
  • Separates features (evidence) and labels
  • Uses train_test_split with test_size=0.4 to create training and testing sets
  • Trains the selected classifier
  • Prints the number of correct and incorrect predictions and accuracy
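A minimal sketch of these steps, assuming synthetic data in place of banknotes.csv, is shown below. The variable names and the fixed random_state are illustrative assumptions, and GaussianNB is just one of the selectable models.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: synthetic stand-in for the four banknote features and the class column.
features, classes = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=1
)
evidence = features.tolist()
labels = ["Authentic" if value == 0 else "Counterfeit" for value in classes]

# Hold out 40% of the rows for testing, as in the README.
X_training, X_testing, y_training, y_testing = train_test_split(
    evidence, labels, test_size=0.4, random_state=1
)

model = GaussianNB()
model.fit(X_training, y_training)
predictions = model.predict(X_testing)

# Count matches between the true labels and the predictions.
correct = sum(
    1 for actual, predicted in zip(y_testing, predictions) if actual == predicted
)
incorrect = len(y_testing) - correct
accuracy = correct / len(y_testing)

print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * accuracy:.2f}%")
```

Passing random_state makes the split repeatable; leaving it out gives a different split, and usually a slightly different accuracy, on each run.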

To switch models, edit the model assignment in banknotes/banknotes1.py:

model = Perceptron()
# model = svm.SVC()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()

Data / Input Requirements

The project expects the banknote authentication CSV dataset to be available at:

  • banknotes/banknotes.csv

The CSV format is:

variance,skewness,curtosis,entropy,class
<variance>,<skewness>,<curtosis>,<entropy>,<class>
...

Where:

  • variance, skewness, curtosis, entropy are numeric features extracted from images of banknotes
  • class is the target label, encoded as:
    • 0 → Authentic
    • 1 → Counterfeit

Both scripts map the numeric class column into human-readable labels "Authentic" and "Counterfeit".
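The loading and label mapping can be sketched like this. The two sample rows are made-up values used only to show the format, and load_data is a hypothetical helper name; the scripts themselves may structure this step differently.

```python
import csv
import io

# Two made-up rows in the documented column order, for illustration only.
SAMPLE_CSV = """variance,skewness,curtosis,entropy,class
1.5,2.5,-3.5,0.5,0
-1.5,-2.5,3.5,-0.5,1
"""


def load_data(csv_file):
    """Read rows into parallel lists of evidence (4 floats) and labels."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    evidence, labels = [], []
    for row in reader:
        evidence.append([float(cell) for cell in row[:4]])
        labels.append("Authentic" if row[4] == "0" else "Counterfeit")
    return evidence, labels


evidence, labels = load_data(io.StringIO(SAMPLE_CSV))
print(evidence)  # [[1.5, 2.5, -3.5, 0.5], [-1.5, -2.5, 3.5, -0.5]]
print(labels)    # ['Authentic', 'Counterfeit']
```

To read the real dataset instead of the inline sample, pass an open file object, for example `open("banknotes/banknotes.csv", newline="")`.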

If you want to use your own data, it should:

  • Be a CSV file with four numeric feature columns followed by a binary class column
  • Use the same order of columns as above, or you’ll need to adjust the indexing logic in the Python scripts.
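If your file uses a different column order, one option is to select columns by header name instead of by position. The sketch below assumes the file has a header row; load_by_name and the sample rows are hypothetical and are not part of the repository.

```python
import csv
import io


def load_by_name(csv_file, feature_columns, label_column):
    """Pick feature and label columns by header name, whatever their order."""
    reader = csv.DictReader(csv_file)
    evidence, labels = [], []
    for row in reader:
        evidence.append([float(row[name]) for name in feature_columns])
        labels.append(int(row[label_column]))
    return evidence, labels


# Made-up file whose class column comes first and whose features are reordered.
CUSTOM_CSV = """class,entropy,variance,skewness,curtosis
1,0.25,-2.0,4.0,1.5
0,-0.75,3.0,-1.0,0.5
"""

evidence, labels = load_by_name(
    io.StringIO(CUSTOM_CSV),
    feature_columns=["variance", "skewness", "curtosis", "entropy"],
    label_column="class",
)
print(evidence)  # [[-2.0, 4.0, 1.5, 0.25], [3.0, -1.0, 0.5, -0.75]]
print(labels)    # [1, 0]
```

Selecting by name keeps the feature order fixed for the model even when the file layout changes, at the cost of requiring a header row.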

Key Features

  • Simple, self-contained examples of:
    • Loading CSV data in Python
    • Preparing feature and label arrays for scikit-learn
    • Manual vs. train_test_split-based train/test separation
    • Trying multiple classifiers with minimal code changes
    • Computing and printing basic accuracy metrics
  • Includes the UCI banknote authentication dataset, which is widely used in ML education.

Technologies Used

  • Language: Python
  • Libraries:
    • scikit-learn for machine learning models and train/test splitting
    • numpy and scipy as numerical computation backends for scikit-learn
    • Python standard library modules: csv, random

License

This repository does not include a license file (e.g. LICENSE). If you intend to reuse or distribute this code, you should:

  • Add an appropriate open-source license file to the project root (e.g., MIT, Apache 2.0, BSD-3-Clause), and
  • Update this section of the README to reflect the chosen license.
