This repository contains a small collection of Python scripts and data demonstrating how to build and evaluate machine learning classifiers for banknote authentication. Using the classic UCI banknote authentication dataset, the project shows how to:
- Load a CSV dataset with numeric features
- Prepare features ("evidence") and labels
- Split data into training and testing sets
- Train different classifiers from scikit-learn
- Evaluate model accuracy on a hold-out test set
The code is intentionally simple and linear, making it suitable for learning or teaching introductory machine learning concepts.
The main objectives of this project are to:
- Illustrate a basic supervised classification workflow in Python
- Compare different scikit-learn classifiers (Perceptron, SVM, k-NN, Naive Bayes)
- Provide a compact, runnable example that students can modify and extend
Typical use cases:
- Classroom demos for an introduction to machine learning
- Self-study for people new to scikit-learn
- A starting point for experimenting with banknote fraud detection models
.
├── README.md # Project overview (this file)
├── lecture.pdf # Related lecture/teaching material (not used by code)
├── requirements.txt # Python dependencies
└── banknotes/
    ├── banknotes.csv # Banknote authentication dataset (UCI-style)
    ├── banknotes0.py # ML example using manual train/test split
    └── banknotes1.py # ML example using sklearn.model_selection.train_test_split
Key scripts
- banknotes/banknotes0.py
  - Demonstrates manual splitting of the dataset into training and testing sets using random.shuffle and a fixed hold-out fraction.
  - Lets you easily switch between different models by changing the model assignment.
- banknotes/banknotes1.py
  - Demonstrates the same classification task using sklearn.model_selection.train_test_split for splitting the data.
  - Also allows you to toggle between multiple classifiers.
Python packages are listed in requirements.txt:
- scikit-learn==1.8.0
- scipy==1.16.3
- numpy==2.4.0
- threadpoolctl==3.6.0 (scikit-learn runtime dependency)
You will also need Python 3.11+ (the pinned SciPy 1.16 and NumPy 2.4 releases do not support older interpreters), or any version compatible with these package versions.
- Clone or download the repository
  git clone <your-repo-url>
  cd learning
- (Recommended) Create and activate a virtual environment
  python -m venv .venv
  source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
All current functionality is provided as standalone scripts. From the repository root (or from inside the banknotes/ directory), run one of the example scripts.
python banknotes/banknotes0.py

What it does:
- Loads banknotes/banknotes.csv
- Shuffles all rows
- Uses 60% for training, 40% for testing
- Trains the selected classifier
- Prints the number of correct and incorrect predictions and the overall accuracy
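The manual split and the final tally described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of banknotes0.py: the helper names manual_split and count_results, the seed parameter, and the stand-in rows are all assumptions made for the example.

```python
import random


def manual_split(rows, holdout_fraction=0.4, seed=None):
    """Shuffle a copy of `rows` and hold out the first `holdout_fraction` for testing.

    Hypothetical helper: the name and the `seed` parameter are assumptions.
    """
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    holdout = int(holdout_fraction * len(shuffled))
    testing = shuffled[:holdout]
    training = shuffled[holdout:]
    return training, testing


def count_results(predictions, labels):
    """Return (correct, incorrect, accuracy) for two equal-length, non-empty sequences."""
    correct = sum(p == a for p, a in zip(predictions, labels))
    incorrect = len(labels) - correct
    return correct, incorrect, correct / len(labels)


# Made-up stand-in rows (NOT taken from banknotes.csv).
rows = [
    {"evidence": [float(i), float(-i)],
     "label": "Authentic" if i % 2 == 0 else "Counterfeit"}
    for i in range(10)
]

training, testing = manual_split(rows, holdout_fraction=0.4, seed=0)
print(f"training rows: {len(training)}, testing rows: {len(testing)}")
```

Passing a fixed seed makes the shuffle repeatable, which helps when comparing classifiers on the same split; leaving it as None gives a different split on each run.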
To try a different model, open banknotes/banknotes0.py and edit the model line near the top, for example:
# model = Perceptron()
# model = svm.SVC()
# model = KNeighborsClassifier(n_neighbors=1)
model = GaussianNB()

python banknotes/banknotes1.py

What it does:
- Loads banknotes/banknotes.csv
- Separates features (evidence) and labels
- Uses train_test_split with test_size=0.4 to create training and testing sets
- Trains the selected classifier
- Prints the number of correct and incorrect predictions and accuracy
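As a rough sketch of the same flow, the snippet below uses train_test_split with test_size=0.4, fits one classifier, and tallies the predictions. It is not the actual banknotes1.py: the stand-in data is made up, and random_state=0 is an assumption added only so the split is repeatable.

```python
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Made-up stand-in data with four numeric features per row (NOT the banknote dataset).
evidence = [[float(i), i * 0.5, float(-i), float(i % 3)] for i in range(20)]
labels = ["Authentic" if i < 10 else "Counterfeit" for i in range(20)]

# Hold out 40% of the rows for testing; random_state is an assumption for repeatability.
X_training, X_testing, y_training, y_testing = train_test_split(
    evidence, labels, test_size=0.4, random_state=0
)

model = GaussianNB()
model.fit(X_training, y_training)
predictions = model.predict(X_testing)

# Tally correct and incorrect predictions on the hold-out set.
correct = sum(p == a for p, a in zip(predictions, y_testing))
incorrect = len(y_testing) - correct

print(f"Results for model {type(model).__name__}")
print(f"Correct: {correct}")
print(f"Incorrect: {incorrect}")
print(f"Accuracy: {100 * correct / len(y_testing):.2f}%")
```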
To switch models, edit the model assignment in banknotes/banknotes1.py:
model = Perceptron()
# model = svm.SVC()
# model = KNeighborsClassifier(n_neighbors=1)
# model = GaussianNB()

The project expects the banknote authentication CSV dataset to be available at:
banknotes/banknotes.csv
The CSV format is:
variance,skewness,curtosis,entropy,class
<variance>,<skewness>,<curtosis>,<entropy>,<class>
...
Where:
- variance, skewness, curtosis, entropy are numeric features extracted from images of banknotes
- class is the target label, encoded as:
  - 0 → Authentic
  - 1 → Counterfeit
Both scripts map the numeric class column into human-readable labels "Authentic" and "Counterfeit".
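One way to read this format and apply the 0/1 mapping is sketched below with the standard csv module. The function name load_rows and the two sample rows are illustrative assumptions (the values are made up, not taken from the dataset), and the real scripts may structure this differently.

```python
import csv
import io

# Two made-up rows in the documented column order (NOT real dataset values).
SAMPLE_CSV = """variance,skewness,curtosis,entropy,class
1.5,-2.0,3.25,0.5,0
-0.75,4.0,-1.5,2.0,1
"""


def load_rows(file_obj):
    """Parse rows into dicts with numeric 'evidence' and a readable 'label'.

    Hypothetical helper: the name and return shape are assumptions.
    """
    reader = csv.reader(file_obj)
    next(reader)  # skip the header line
    data = []
    for row in reader:
        data.append({
            "evidence": [float(cell) for cell in row[:4]],  # first four columns
            "label": "Authentic" if row[4] == "0" else "Counterfeit",
        })
    return data


data = load_rows(io.StringIO(SAMPLE_CSV))
evidence = [row["evidence"] for row in data]
labels = [row["label"] for row in data]
print(evidence)
print(labels)
```

To read the real file instead of the in-memory sample, the same function can be given the result of open("banknotes/banknotes.csv", newline="").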
If you want to use your own data, it should:
- Be a CSV file with four numeric feature columns followed by a binary class column
- Use the same order of columns as above, or you’ll need to adjust the indexing logic in the Python scripts.
- Simple, self-contained examples of:
  - Loading CSV data in Python
  - Preparing feature and label arrays for scikit-learn
  - Manual vs. train_test_split-based train/test separation
  - Trying multiple classifiers with minimal code changes
  - Computing and printing basic accuracy metrics
- Includes a real-world-like dataset commonly used in ML education.
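Because all four classifiers share the same fit/predict interface, trying them side by side takes only a small loop. The sketch below is an extension of the idea rather than something the scripts do: they select one model per run by editing the model assignment. The stand-in data, the candidates dictionary, and random_state=0 are assumptions for illustration.

```python
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-in data (NOT the banknote dataset).
evidence = [[float(i), i * 0.5, float(-i), float(i % 3)] for i in range(40)]
labels = ["Authentic" if i < 20 else "Counterfeit" for i in range(40)]

X_training, X_testing, y_training, y_testing = train_test_split(
    evidence, labels, test_size=0.4, random_state=0
)

# The same four classifiers the scripts let you toggle between.
candidates = {
    "Perceptron": Perceptron(),
    "SVC": svm.SVC(),
    "KNeighborsClassifier(n_neighbors=1)": KNeighborsClassifier(n_neighbors=1),
    "GaussianNB": GaussianNB(),
}

results = {}
for name, model in candidates.items():
    model.fit(X_training, y_training)
    # For classifiers, score() returns mean accuracy on the given test data.
    results[name] = model.score(X_testing, y_testing)
    print(f"{name}: {100 * results[name]:.2f}% accuracy")
```

Scores on a toy dataset like this say little about how the models behave on real banknote data; the point is only the shape of the comparison.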
- Language: Python
- Libraries:
  - scikit-learn for machine learning models and train/test splitting
  - numpy and scipy as numerical computation backends for scikit-learn
  - Python standard library modules: csv, random
No explicit license file (e.g. LICENSE) was found in this repository. If you intend to reuse or distribute this code, you should:
- Add an appropriate open-source license file to the project root (e.g., MIT, Apache 2.0, BSD-3-Clause), and
- Update this section of the README to reflect the chosen license.