Image Recognition project, Universidad Politecnica de Madrid.
Goals:
- Develop proficiency in using Tensorflow/Keras for training Neural Nets (NNs).
- Put into practice acquired knowledge to optimize the parameters and architecture of a feed-forward Neural Net (ffNN), in the context of an image recognition problem.
- Put into practice NNs specially conceived for analysing images. Design and optimize the parameters of a Convolutional Neural Net (CNN) to deal with previous image classification task.
- Train popular architectures from scratch (e.g., GoogLeNet, VGG, ResNet, ...), and compare the results with the ones provided by their pre-trained versions using transfer learning.
xView (http://xviewdataset.org/) is a large publicly available object detection data set, with approximately 1 million objects across 60 categories. It contains manually annotated images from different scenes around the world, acquired using the WorldView-3 satellite at 0.3m ground sample distance. There are 846 annotated images in total. For this practice, we divide these annotations into 761 and 85 images for training and testing respectively
Image recognition is crucial in computer vision, as it involves recognizing and categorizing objects, scenes, or patterns in digital images or video frames. Its significance resonates across diverse real-world applications, spanning autonomous driving, medical diagnosis, surveillance, and augmented reality. The heart of image recognition lies in creating algorithms that can understand and interpret the complex visual information contained within images. Traditional methodologies heavily leaned on manually crafted features and shallow machine learning models, often grappling with the complexities inherent in visual patterns and the variability introduced by lighting conditions, viewpoints, and background clutter. However, with the emergence of deep learning, particularly convolutional neural networks (CNNs), substantial strides have been made in the domain of image recognition. By traversing multiple layers of convolution, pooling, and non-linear transformations, CNNs adeptly extract increasingly abstract features, thereby facilitating precise recognition of objects and patterns depicted in images. However, despite these remarkable advancements, image recognition still presents significant challenges due to factors such as occlusions, variations in scale and orientation, and the presence of multiple objects within a single scene