26: Likelihood

Author

Derek Sollberger

Published

April 7, 2023

Notation

Recall,

  • Lower-case \(\{x_{1}, x_{2}, x_{3}, ..., x_{n}\}\) is a set of observations
  • Upper-case \(\{X_{1}, X_{2}, X_{3}, ..., X_{n}\}\) is a set of random variables (i.e. the data set, viewed before it is observed)
  • Treating \(\{X_{1}, X_{2}, ..., X_{n}\}\) as a set of \(n\) i.i.d. (independent and identically distributed) random variables is a common assumption.
  • With independence, \[P(X_{1}, X_{2}, ..., X_{n}) = P(X_{1}) \cdot P(X_{2}) \cdot ... \cdot P(X_{n})\]
  • Each individual probability is computed (at least theoretically) with a PDF (probability density function) \[P(x_{i}) = f_{X}(x_{i})\] (a quick numerical check of this product rule follows this list)
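A quick numerical check of the product rule, assuming (for illustration only) that the observations come from a standard normal distribution and using SciPy's `norm`:

```python
# Joint density of i.i.d. observations as a product of individual PDFs;
# the standard normal choice and the data values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

x = np.array([0.5, -1.2, 0.3])      # a small set of observations
joint = np.prod(norm.pdf(x))        # P(x1) * P(x2) * P(x3) under independence
print(joint)
```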

Inverse

Suppose that we have a sample of data \(\{x_{1}, x_{2}, x_{3}, ..., x_{n}\}\). We want to model these data with a probability distribution, but first we need to figure out the distribution’s parameters. Let us think about this in a Bayesian way:

\[{\color{purple}{P(\text{model} | \text{data})}} = \displaystyle\frac{ {\color{blue}{P(\text{data} | \text{model})} \cdot P(\text{model})} }{ {\color{red}{P(\text{data})}} }\]

  • \({\color{purple}{P(\text{model} | \text{data})}}\) is the posterior probability that we want
  • \({\color{blue}{P(\text{data} | \text{model})}}\) is the likelihood
  • \(P(\text{model})\) is the prior probability
  • Since the probability of the data \({\color{red}{P(\text{data})}}\) (the evidence) is a constant …

… we say that the posterior probability is proportional to the likelihood times the prior. With a flat prior, the posterior is proportional to the likelihood alone.
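As a toy numerical sketch of this proportionality, suppose (hypothetically) three candidate models with made-up likelihood values; dividing by \(P(\text{data})\) only rescales, so under a flat prior the posterior ranking matches the likelihood ranking:

```python
# Toy sketch of "posterior ∝ likelihood × prior" over three candidate models;
# the likelihood values below are made up for illustration.
import numpy as np

prior = np.array([1/3, 1/3, 1/3])            # flat prior over the models
likelihood = np.array([0.02, 0.10, 0.05])    # hypothetical P(data | model)
unnormalized = likelihood * prior            # numerator of Bayes' rule
posterior = unnormalized / unnormalized.sum()  # dividing by P(data) rescales
print(posterior)                             # same ranking as the likelihoods
```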

Likelihood

Likelihood Function

Let the likelihood function, in terms of a parameter \(\theta\), be the joint probability

\[L(\theta) = P(X_{1} = x_{1}, X_{2} = x_{2}, ..., X_{n} = x_{n}) = f_{X}(x_{1}; \theta) \cdot f_{X}(x_{2}; \theta) \cdots f_{X}(x_{n}; \theta)\]

or

\[L\left(\theta; \left\{x_{i}\right\}_{i=1}^{n}\right) = \displaystyle\prod_{i = 1}^{n} f_{X}(x_{i}; \theta)\]

Suppose that we have data on how long a certain type and brand of light bulb operated (under the same working conditions). The observed lifetimes, in months, were

\[6, \quad 18, \quad 29, \quad 44, \quad 48\]

Goal: characterize the top 5 percent of light bulbs.

  • Build the likelihood function assuming an exponential distribution (here \(\mu = 1/\lambda\) denotes the distribution’s mean).
  • Compute the likelihood at \(\mu = 25\).
  • Compute the likelihood at \(\mu = 50\) (a worked sketch follows this list).
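For the exponential density \(f_{X}(x; \lambda) = \lambda e^{-\lambda x}\), the likelihood of the sample is

\[L(\lambda) = \displaystyle\prod_{i=1}^{n} \lambda e^{-\lambda x_{i}} = \lambda^{n}e^{-\lambda\sum x_{i}}\]

A minimal Python sketch that evaluates this for the bulb data, assuming (per the exercise’s notation) that \(\mu = 1/\lambda\) is the mean; the helper name `likelihood` is just for illustration:

```python
# Exponential likelihood L(lambda) = lambda^n * exp(-lambda * sum(x))
# for the bulb data; assumes mu = 1/lambda (the exponential mean).
import numpy as np

x = np.array([6, 18, 29, 44, 48])   # lifetimes in months

def likelihood(mu, x):
    lam = 1 / mu                    # rate parameter lambda = 1/mu
    return lam**len(x) * np.exp(-lam * x.sum())

print(likelihood(25, x))            # approx 3.10e-10
print(likelihood(50, x))            # approx 1.76e-10
```

The tiny magnitudes are typical of likelihoods even for modest samples, which is one motivation for working with the log-likelihood below.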

Log Likelihood

Logarithms

You know that logarithms make large numbers smaller. More precisely, \[\ln(x) < x, \quad x > 1\]

Example: \(\ln(1234) \approx 7.1180\)

Did you know that logarithms make small numbers larger (in size)? More precisely, \[|\ln(x)| > x, \quad 0 < x < \Omega \approx 0.5671,\] where \(\Omega\) is the solution of \(x = e^{-x}\).

Example: \(|\ln(0.1234)| \approx 2.0923\)

From pre-calculus, recall the properties of logarithms: \[\ln(AB) = \ln(A) + \ln(B), \quad \ln\left(\displaystyle\frac{A}{B}\right) = \ln A - \ln B, \quad \ln(A^{c}) = c\ln A\]
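A quick numerical check of these properties, with arbitrary values chosen only for illustration:

```python
# Verifying the product, quotient, and power rules for logarithms numerically.
import math

A, B, c = 3.0, 7.0, 4.0
print(math.log(A * B), math.log(A) + math.log(B))   # product rule
print(math.log(A / B), math.log(A) - math.log(B))   # quotient rule
print(math.log(A ** c), c * math.log(A))            # power rule
```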

For modeling with the exponential distribution, we saw that the likelihood function was

\[L\left(\lambda; \{x_{i}\}_{i=1}^{n}\right) = \displaystyle\prod_{i=1}^{n} f_{X}(x_{i}; \lambda) = \lambda^{n}e^{-\lambda\sum x_{i}}\]

We take the natural logarithm (applying the product and power rules above) to compute the log-likelihood function

\[\ell\left(\lambda; \{x_{i}\}_{i=1}^{n}\right) = \ln L\left(\lambda; \{x_{i}\}_{i=1}^{n}\right) = n\ln\lambda - \lambda\displaystyle\sum_{i=1}^{n} x_{i}\]

  • Compute the log-likelihood at \(\mu = 25\).
  • Compute the log-likelihood at \(\mu = 50\) (a numerical sketch follows this list).
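A minimal sketch evaluating \(\ell\) for the bulb data, again assuming \(\lambda = 1/\mu\); the helper name `log_likelihood` is just for illustration:

```python
# Log-likelihood l(lambda) = n*ln(lambda) - lambda*sum(x) for the bulb data;
# assumes mu = 1/lambda, as in the likelihood sketch above.
import numpy as np

x = np.array([6, 18, 29, 44, 48])   # lifetimes in months

def log_likelihood(mu, x):
    lam = 1 / mu                    # rate parameter lambda = 1/mu
    return len(x) * np.log(lam) - lam * x.sum()

print(log_likelihood(25, x))        # approx -21.89
print(log_likelihood(50, x))        # approx -22.46
```

These agree with the logarithms of the likelihood values computed earlier (e.g. \(\ln(3.10 \times 10^{-10}) \approx -21.89\)), and the larger value at \(\mu = 25\) again favors that candidate.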

Visuals

  • simulation
  • a better simulation

Looking Ahead

  • WHW9

  • Exam 2, Mon., Apr. 10

    • more information in weekly announcement