Empirical Risk Minimization Uses Monte Carlo Approximation
In machine learning, we often choose the model $\delta$ that minimizes the risk, which is the expected loss of the model under the data-generating distribution:
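$$R(\delta) = \mathbb{E}_{(\mathbf{x}, y) \sim p_{*}}\left[ L(y, \delta(\mathbf{x})) \right] = \int L(y, \delta(\mathbf{x})) \, p_{*}(\mathbf{x}, y) \, d\mathbf{x} \, dy$$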
However, we usually don't know the true data-generating distribution $p_{*}$, so we must use a Monte Carlo approximation of this expectation (i.e., integral).
First, we approximate the distribution of $L(y, \delta(\mathbf{x}))$ with the empirical distribution of the training data. We simply draw samples, then compute the arithmetic mean of the function applied to the samples:
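$$R_{emp}(\delta) = \frac{1}{N} \sum_{n=1}^{N} L(y_n, \delta(\mathbf{x}_n)), \qquad (\mathbf{x}_n, y_n) \sim p_{*}$$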
This is standard empirical risk minimization (ERM), but I want to point out that the empirical risk $R_{emp}$ is a Monte Carlo approximation of an integral.
Recall that in Monte Carlo approximation, we approximate the mean (or some other integral) of a statistic using a finite number of samples:
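$$\mathbb{E}\left[ f(\mathbf{x}) \right] = \int f(\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x} \approx \frac{1}{S} \sum_{s=1}^{S} f(\mathbf{x}_s), \qquad \mathbf{x}_s \sim p(\mathbf{x})$$

Taking $f$ to be the loss $L(y, \delta(\mathbf{x}))$ and $p$ to be $p_{*}(\mathbf{x}, y)$ recovers the empirical risk above.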
Monte Carlo approximation has an advantage over grid-based numerical integration (which evaluates the function at a fixed grid of points): the function is only evaluated in places where there is non-negligible probability (Murphy, p. 53). This is why Monte Carlo, rather than grid-based quadrature, is typically used to approximate these integrals.
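To make the connection concrete, here is a minimal sketch (my own example, not taken from the references) where the true risk $R(\delta)$ has a closed form, so we can watch the empirical risk $R_{emp}$ converge to it as the sample size grows. The distribution, model, and loss below are all assumptions chosen purely for illustration:

```python
# Sketch: the empirical risk is a Monte Carlo estimate of the true risk.
# Assumed setup (chosen so the true risk has a closed form):
#   x ~ N(0, 1),  y = 2x + eps,  eps ~ N(0, 0.5^2)
#   model:  delta(x) = 1.5x,  loss:  L(y, yhat) = (y - yhat)^2
# Then y - delta(x) = 0.5x + eps, so the true risk is
#   R(delta) = E[(0.5x + eps)^2] = 0.25 * Var(x) + 0.25 = 0.5
import numpy as np

rng = np.random.default_rng(0)

def delta(x):
    return 1.5 * x  # a fixed (deliberately suboptimal) model

def loss(y, y_hat):
    return (y - y_hat) ** 2  # squared loss

TRUE_RISK = 0.5  # derived analytically above

for n in (10, 1_000, 100_000):
    x = rng.normal(0.0, 1.0, size=n)            # samples from p*(x)
    y = 2.0 * x + rng.normal(0.0, 0.5, size=n)  # samples from p*(y | x)
    r_emp = loss(y, delta(x)).mean()            # Monte Carlo estimate of R(delta)
    print(f"n = {n:>7}:  R_emp = {r_emp:.4f}  (true risk = {TRUE_RISK})")
```

As $n$ grows, the printed $R_{emp}$ values should settle around the true risk of 0.5, which is exactly the Monte Carlo convergence described above.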
References
- Machine Learning: A Probabilistic Perspective (Murphy), pp. 204-205
- Natural Language Understanding with Distributed Representation (Cho), pp. 8-9