Machine learning algorithms can be classified into two broad categories. Discriminative models learn the conditional probability distribution of labels given the data, $p(y|x)$. These models are easy to train; however, they lack the ability to model the underlying distribution of the data $x$. Generative models, in contrast, learn the joint probability density $p(x,y)$. The ability to jointly model the data $x$ and the labels $y$ allows them to generalize better when data is limited or partially missing. They can also be used to compute the conditional density via Bayes' rule. \cite{ng2002discriminative} provides a nice analysis of discriminative vs. generative models and the situations in which we prefer one over the other. Additionally, generative models can be used to draw new samples $(x,y)$ from the distribution. For instance, they have been used to generate hypothetical celebrity faces, produce more text in a particular handwriting style, and create new artificial environments for use in reinforcement learning. Other applications include super-resolution \cite{ledig2016photo}, conversion of sketches to images \cite{chen2018sketchygan}, etc.
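As a quick reminder of the connection referenced above (notation follows the paragraph; the sum assumes discrete labels $y$), the joint density recovers the conditional via Bayes' rule:
\[
p(y \mid x) \;=\; \frac{p(x, y)}{p(x)} \;=\; \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}.
\]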
Typically, we train generative models by maximizing the likelihood of the training data. However, optimizing the maximum likelihood objective often becomes intractable in its general formulation. A wide range of models such as Gaussian mixture models, Markov chains, and hidden Markov models make simplifying conditional independence assumptions to keep the objective tractable. However, due to these simplifying assumptions, such models are limited in terms of expressiveness; hence they are not able to model many data distributions of interest, which are often non-linear and highly complex. Deep generative models relax some of these strong assumptions and model the joint density using neural networks as powerful function approximators. Deep generative models can be further classified into explicit density models and implicit density models, based on how we define the likelihood, its gradients, and any approximations.
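For reference, a sketch of the maximum likelihood objective (the parameters $\theta$ and training set $\{(x_i, y_i)\}_{i=1}^{N}$ are introduced here only for illustration):
\[
\theta^{*} \;=\; \arg\max_{\theta} \; \sum_{i=1}^{N} \log p_{\theta}(x_i, y_i).
\]
For latent-variable models, evaluating $p_{\theta}$ requires marginalizing over unobserved variables, and it is this marginalization that typically makes the objective intractable.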
Explicit density models define an explicit function to represent the probability density of the data. Explicit models that use approximations for the intractable likelihood allow us to use a large class of powerful models. Instead of optimizing the intractable likelihood directly, these models optimize a lower-bound approximation of the objective; VAEs fall in this category. The main disadvantage of these methods is that when we have a weak approximation of the posterior or the prior distribution, optimizing the lower bound does not effectively learn the target distribution.

Implicit density models do not model the probability density explicitly; instead, they only provide a stochastic procedure to generate data. GANs fall under this category. These models are trained by sampling directly from the model using a game-theoretic approach between two networks, the generator and the discriminator. However, finding this equilibrium is hard, which makes such models difficult to train.
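To make the two training objectives concrete (a sketch; the symbols $z$, $q_{\phi}$, $p_{\theta}$, $G$, and $D$ are standard notation not defined above), a VAE maximizes the evidence lower bound (ELBO) on the log-likelihood, while a GAN plays a minimax game between the generator $G$ and the discriminator $D$:
\[
\log p_{\theta}(x) \;\ge\; \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_{\phi}(z \mid x)\,\|\,p(z)\right),
\]
\[
\min_{G}\,\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] \;+\; \mathbb{E}_{z \sim p(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right].
\]
The gap in the first bound is exactly the KL divergence between the approximate posterior $q_{\phi}(z \mid x)$ and the true posterior, which is why a weak posterior approximation limits how closely the lower bound tracks the true likelihood.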
Full report - link