We cannot evaluate the true posterior p(ω|X,Y) analytically since it becomes intractable. Instead, what we do is to specify a structure that is easy to evaluate i.e., an approximating variational distribution , parametrised by . In other words, we use as a proxy for p(ω|X,Y) to make predictions or to investigate the posterior distribution of the hidden variables. Ideally, should be very close to p(ω|X,Y).
Therefore, we measure the closeness between the two distributions (minimising) with Kullback–Leibler (KL) divergence  with regard to θ:
In modern practice, neural networks and non-parametric methods such as Gaussian processes with millions of parameters are optimised to fit datasets. It’s an open question on the generalization of such large models but it is evident that such models are very expensive to train. BDL could offer a solution to the scaling challenges of neural networks with evidence showing robustness to overfitting, uncertainty estimates, and they could easily learn from limited dataset. We can view classical training as performing approximate Bayesian inference, using the approximate posterior.
Bayesian deep learning in a more general term encompasses the intersection between probabilistic Bayesian…
Access to appropriate information is a fundamental necessity in the modern society, and information retrieval techniques have wide applications in various areas. For example, commercial search services such as Google have become indispensable tools in the people’s work and daily life. The exponential growth of digital images has motivated research into image retrieval.
The conventional methods of image retrieval involved adding metadata such as captioning, keywords or descriptions to the images so that retrieval is done over the annotation words. …
Machine Learning Researcher- Deep Learning, Generative Models, Reinforcement Learning, Bayesian Methods, NLP, Computer Vision