This project is a CNN pipeline for classifying lichens in images.
- config - configuration file containing all the hyperparameters and paths to the data directories
- data_prep - main script for downloading and cleaning the iNaturalist observation data
- split - main script for splitting the images into training, validation and test sets
- train - main script for training the model on the training and validation sets
- scraping - fetching images from iNaturalist and creating the training, validation and test sets
- obs_data - data-wrangling functions for observation data; loading and labeling functions for images
- plotting - plotting functions for observation and image data
- cnn_model - convolutional neural network model architecture
- evaluate - evaluation of the trained model on the test set
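The split step can be illustrated with a minimal sketch of a per-genus (stratified) split. The function name `stratified_split`, the 70/15/15 ratios and the fixed seed are assumptions chosen for illustration. They are not values taken from the project's config.

```python
import random
from collections import defaultdict

# Hypothetical sketch: the ratios and seed are illustrative assumptions,
# not values read from the project's config.
def stratified_split(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Split (image_path, label) pairs into train/val/test sets, label by label.

    Splitting each label separately keeps class proportions roughly equal
    across the three sets, which matters when some genera are rare.
    """
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        paths = sorted(by_label[label])  # deterministic order before shuffling
        rng.shuffle(paths)
        n_train = int(len(paths) * train_frac)
        n_val = int(len(paths) * val_frac)
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after rounding down goes to the test set.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits
```

If several photos come from the same observation, splitting by observation ID rather than by image keeps near-identical photos from ending up in both the training and test sets.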
The raw data is in the data folder. It was downloaded with iNaturalist's export tool in two batches (observations before 2017-01-01 and observations between 2017-01-01 and 2025-04-19).
Filters:
- Research grade
- Most identifiers agreed
- Open geoprivacy (location not obscured)
- United States
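Because the export arrived in two date-based batches, the files have to be combined before cleaning. The following is a minimal standard-library sketch. It assumes the exports are CSV files with `id`, `quality_grade` and `coordinates_obscured` columns; these are common iNaturalist export columns, but check them against the actual files. The function names are hypothetical.

```python
import csv

def read_batch(path):
    """Read one iNaturalist export CSV into a list of dicts (one per observation)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def merge_batches(batches):
    """Concatenate batches, keeping the first row seen for each observation id.

    Deduplicating guards against an observation appearing in both batches,
    e.g. if the 2017-01-01 boundary was inclusive in both date ranges.
    """
    merged = {}
    for rows in batches:
        for row in rows:
            merged.setdefault(row["id"], row)
    return list(merged.values())

def passes_filters(row):
    """Re-check two of the export filters: research grade and unobscured coordinates."""
    return (
        row.get("quality_grade") == "research"
        and row.get("coordinates_obscured", "").strip().lower() != "true"
    )
```

To use it, call `read_batch` on each exported file, pass the results to `merge_batches`, and keep the rows for which `passes_filters` is true. Re-checking the filters is optional, but it catches rows that do not match the intended export settings.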
Since lichens do not form a single taxonomic group, lichen observations were defined as those belonging to any of the 1,017 genera listed on Wikipedia's "Lichen genera" category page. Taxon IDs for these genera were looked up using iNaturalist's API and then passed to the export tool.
The exact query parameters are listed here.
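The genus-to-taxon-ID lookup could look roughly like the sketch below. It assumes iNaturalist's public `GET /v1/taxa` endpoint with `q` and `rank` query parameters. The helper names and the one-second default delay are assumptions. Some genus names are shared by unrelated taxa, so unmatched or ambiguous genera should still be checked by hand.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Iterable, Optional

TAXA_URL = "https://api.inaturalist.org/v1/taxa"

def build_taxa_url(genus: str) -> str:
    """URL for a genus-rank name search on the iNaturalist taxa endpoint."""
    query = urllib.parse.urlencode({"q": genus, "rank": "genus"})
    return f"{TAXA_URL}?{query}"

def pick_genus_id(payload: dict, genus: str) -> Optional[int]:
    """Return the id of the genus-rank result whose name exactly matches."""
    for taxon in payload.get("results", []):
        if taxon.get("rank") == "genus" and taxon.get("name", "").lower() == genus.lower():
            return taxon["id"]
    return None  # no exact match: flag for manual review

def fetch_genus_ids(genera: Iterable[str], delay: float = 1.0) -> dict:
    """Look up each genus, pausing between requests to keep the load on the API low."""
    ids = {}
    for genus in genera:
        with urllib.request.urlopen(build_taxa_url(genus), timeout=30) as resp:
            ids[genus] = pick_genus_id(json.load(resp), genus)
        time.sleep(delay)
    return ids

def taxon_ids_param(ids: Iterable[Optional[int]]) -> str:
    """Join the found ids into one comma-separated string, skipping misses."""
    return ",".join(str(i) for i in sorted({i for i in ids if i is not None}))
```

Matching on the exact name and rank, rather than taking the first search hit, avoids picking up a species or a similarly named genus that the text search also returns.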