An Introduction to Content-based Image Retrieval

Access to the right information is a fundamental necessity in modern society, and information retrieval techniques are applied across many areas. Commercial search services such as Google, for example, have become indispensable tools in people’s work and daily lives. The exponential growth in the number of digital images has, in turn, motivated research into image retrieval.

Conventional image retrieval methods attach metadata such as captions, keywords or descriptions to images, so that retrieval is performed over the annotation words. Metadata-based retrieval is often inadequate, however: many images lack appropriate metadata, and keywords can express only a limited part of an image’s visual content.

One solution is to analyse the content of the image rather than its metadata. This approach is known as content-based image retrieval (CBIR). The term “content” in this context refers to the colours, shapes, textures, or any other information that can be derived from the image itself. CBIR is preferable because it does not rely on the completeness and quality of annotation.

In other words, CBIR queries an image collection using the visual properties of an image, rather than the captions, tags and annotations associated with it.

CBIR still attracts a lot of attention from the multimedia community, thanks in part to the challenge of scalability and to the insights emerging from new machine learning models.

Over the decades, the progress of CBIR has been discussed extensively in the research literature [1]. Techniques developed for image representation in CBIR include global feature representations, such as colour features [2], edge features [2], texture features [3], GIST [4] and CENTRIST [5], and local feature representations, such as bag-of-words (BoW) models [6] built on invariant local descriptors (e.g. SIFT [7] and SURF [8]).
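To make the idea of a global colour feature concrete, here is a minimal sketch of a joint RGB colour histogram. It is only an illustration, not the specific descriptor from [2]: the function name `colour_histogram`, the choice of 4 bins per channel, and the representation of an image as a flat list of (R, G, B) tuples are all assumptions made for this example.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each RGB pixel into a joint colour bin and return an
    L1-normalised histogram (a simple global colour feature)."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..n-1.
        r_bin, g_bin, b_bin = r * n // 256, g * n // 256, b * n // 256
        hist[r_bin * n * n + g_bin * n + b_bin] += 1.0
    total = sum(hist)
    # Normalise so images of different sizes are comparable.
    return [h / total for h in hist] if total else hist


# Toy "image" (hypothetical data): three reddish pixels and one bluish pixel.
mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
feature = colour_histogram(mostly_red)
```

The resulting vector has one entry per colour bin (64 here) and ignores where in the image each colour occurs, which is what makes it a global rather than a local representation.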

Image representation and image similarity measurement form the crux of the CBIR problem. In image representation, the goal is to map an image into a feature space while preserving the essential information in its visual content, so that similar images can be distinguished from dissimilar ones.

Typical CBIR approaches applied rigid similarity or distance functions, for example Euclidean distance or cosine similarity, to low-level features extracted from the images. Ideally, the similarity between images should reflect the high-level concepts perceived by humans. This is difficult because of the semantic gap: the mismatch between low-level features computed from pixels and the high-level concepts that people perceive.
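The two fixed measures mentioned above can be sketched in a few lines. The feature vectors, image names and the helper `rank_by_distance` below are hypothetical and exist only to show how a query is ranked against a small database; a real system would use much higher-dimensional features and an index rather than a full sort.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (larger = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_distance(query, database):
    """Return image names sorted from nearest to farthest from the query."""
    return sorted(database, key=lambda name: euclidean_distance(query, database[name]))


# Hypothetical 3-dimensional features for three images.
database = {
    "sunset": [0.8, 0.1, 0.1],
    "forest": [0.1, 0.8, 0.1],
    "ocean": [0.1, 0.1, 0.8],
}
query = [0.7, 0.2, 0.1]
ranking = rank_by_distance(query, database)
```

Neither function knows anything about what the images depict. They compare numbers only, which is why such measures struggle with the semantic gap.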

Machine learning shows promise in addressing the semantic gap, given its recent successes in high-level perception tasks. A range of similarity/distance functions based on machine learning techniques has been proposed [9] [10]. For example, Norouzi [9] adopted a mapping learning scheme for large-scale multimedia applications that preserves semantic similarity while transforming high-dimensional data into binary codes. Jegou [11] used the Fisher kernel to aggregate local descriptors, together with a joint dimension reduction that condensed an image to a couple of bytes while maintaining high accuracy.
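To illustrate why binary codes are attractive for large-scale search, the sketch below turns a feature vector into a short bit string and compares codes with the Hamming distance. Note that it uses random hyperplane hashing as a stand-in: the mapping in [9] is learned from data, whereas the hyperplanes here are simply drawn at random. The names `make_hyperplanes`, `binary_code` and `hamming_distance` are assumptions made for this example.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normals) in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(vector, hyperplanes):
    """Pack one bit per hyperplane into an int: 1 if the vector is on its positive side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return bin(code_a ^ code_b).count("1")


# Hypothetical features; each one is compressed to a 16-bit code.
planes = make_hyperplanes(dim=3, n_bits=16)
codes = {
    name: binary_code(vec, planes)
    for name, vec in {
        "sunset": [0.8, 0.1, 0.1],
        "forest": [0.1, 0.8, 0.1],
    }.items()
}
query_code = binary_code([0.7, 0.2, 0.1], planes)
nearest = min(codes, key=lambda name: hamming_distance(query_code, codes[name]))
```

Comparing two codes costs one XOR and a bit count, and each image needs only a few bytes of storage. A learned mapping aims to keep these advantages while making nearby codes correspond to semantically similar images.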

Another technique used to enhance feature representation is distance metric learning (DML). DML methods generally work by learning a metric that minimises the distance between similar images and maximises the distance between dissimilar images. To handle large-scale data, online DML algorithms were developed [12] [13]. For example, Chechik et al. proposed the online algorithm for scalable image similarity (OASIS) [10] to improve image retrieval performance.
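The flavour of an online similarity learner can be sketched with a single triplet update. The code below follows the general shape of a passive-aggressive update on a bilinear similarity S(p, q) = pᵀWq, in the spirit of OASIS as I understand it. It should be read as a simplified illustration rather than a faithful reimplementation of [10]: the function names, the aggressiveness parameter `C = 0.1` and the toy vectors are all assumptions.

```python
def bilinear_similarity(W, p, q):
    """S_W(p, q) = p^T W q for a square matrix W given as a list of rows."""
    return sum(p[i] * W[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


def triplet_update(W, p, p_pos, p_neg, C=0.1):
    """One online step: push S(p, p_pos) above S(p, p_neg) by a margin of 1.

    Returns the (possibly updated) matrix and the hinge loss before the update.
    """
    loss = max(0.0, 1.0 - bilinear_similarity(W, p, p_pos)
               + bilinear_similarity(W, p, p_neg))
    if loss == 0.0:
        return W, loss  # margin already satisfied: leave W unchanged
    diff = [a - b for a, b in zip(p_pos, p_neg)]
    V = [[pi * dj for dj in diff] for pi in p]  # update direction: outer product p (p_pos - p_neg)^T
    norm_sq = sum(v * v for row in V for v in row)
    if norm_sq == 0.0:
        return W, loss
    tau = min(C, loss / norm_sq)  # step size, capped by C
    new_W = [[W[i][j] + tau * V[i][j] for j in range(len(W[i]))]
             for i in range(len(W))]
    return new_W, loss


# Hypothetical triplet: p should be closer to p_pos than to p_neg.
W0 = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity matrix
p, p_pos, p_neg = [1.0, 0.0], [0.9, 0.1], [0.8, 0.2]
W1, loss = triplet_update(W0, p, p_pos, p_neg)
margin_before = bilinear_similarity(W0, p, p_pos) - bilinear_similarity(W0, p, p_neg)
margin_after = bilinear_similarity(W1, p, p_pos) - bilinear_similarity(W1, p, p_neg)
```

Each step looks at one triplet only and changes W only when the margin is violated, which is what lets this style of learner stream through very large training sets.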

Conclusion

References

