• (population) The sample space $\Omega$ (the entire collection of elements under study).

  • (population size) $N$ (finite) or $\infty$.

  • (population random variable) $X$, a random variable defined over the population.

  • (population distribution) The distribution of $X$, given by its PMF $p_X$ (or PDF $f_X$).

  • (parameter $\theta$) A fixed (usually unknown) value on which the population distribution depends.

    • (population mean) $\mu = E(X)$
    • (population variance) $\sigma^2 = V(X)$
  • (sample) $X_1, X_2, \ldots, X_n$, where each $X_i$ is a random variable.

    • (9.1) For each $i$, $X_i \sim X$, i.e. each $X_i$ has the population distribution (and the $X_i$ are independent).
    • (observed data values, realizations) $x_1, x_2, \ldots, x_n$
      • The joint PMF $p_{X_1,\ldots,X_n}$ is called the sample distribution of $X_1, \ldots, X_n$.
        • (9.3) $p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = p_X(x_1)\,p_X(x_2)\cdots p_X(x_n)$.
        • (Similar holds for the continuous case, with PDF $f_X$; see the worked Bernoulli example below.)
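  • A worked example (not from the source; it assumes a Bernoulli($p$) population): by (9.3) the sample distribution takes the following product form.

```latex
% Illustration: sample distribution of a Bernoulli(p) random sample,
% where p_X(x) = p^x (1-p)^{1-x} for x in {0, 1}.
\[
p_{X_1,\dots,X_n}(x_1,\dots,x_n)
  = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
  = p^{\sum_{i=1}^{n} x_i}\,(1-p)^{\,n-\sum_{i=1}^{n} x_i}
\]
```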
  • A random variable $T$ is called a statistic if:

    • $T$ is a function of $X_1, \ldots, X_n$, and
    • $T$ does not depend on any unknown parameters (see the example below).
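  • Example (not from the source; it assumes $\mu$ and $\sigma$ are unknown): the sample mean is a statistic, while a standardized version that involves the unknown parameters is not.

```latex
% Illustration: a statistic vs. a non-statistic (mu, sigma unknown).
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{is a statistic,}
\qquad
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad \text{is not (it depends on the unknown } \mu, \sigma\text{).}
\]
```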
  • An estimator (אומד) (of an unknown parameter $\theta$) is a statistic, denoted by $\hat{\Theta}$ or $\hat{\theta}$, that is used to estimate $\theta$.

    • (For any given observed values $x_1, \ldots, x_n$) $\hat{\theta} = \hat{\Theta}(x_1, \ldots, x_n)$ is called an estimate (אומדן) of the parameter $\theta$.
    • $\hat{\theta} - \theta$ is called the error of the estimate.
    • The bias of an estimator $\hat{\Theta}$ is defined as $B(\hat{\Theta}) = E(\hat{\Theta}) - \theta$.
      • An estimator $\hat{\Theta}$ is called an unbiased estimator of $\theta$ if $E(\hat{\Theta}) = \theta$.
        • If $\hat{\Theta}$ is an unbiased estimator of $\theta$, then $g(\hat{\Theta})$ is an unbiased estimator of $g(\theta)$ for any linear function $g(\theta) = a\theta + b$.
      • An estimator $\hat{\Theta}$ is called a biased estimator of $\theta$ if $E(\hat{\Theta}) \neq \theta$.
    • The mean squared error (MSE) of an estimator $\hat{\Theta}$ of $\theta$ is defined as $\mathrm{MSE}(\hat{\Theta}) = E\big[(\hat{\Theta} - \theta)^2\big]$.
      • An estimator $\hat{\Theta}_1$ is better than $\hat{\Theta}_2$ (of the same parameter $\theta$) if $\mathrm{MSE}(\hat{\Theta}_1) < \mathrm{MSE}(\hat{\Theta}_2)$.
      • If $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are unbiased estimators of $\theta$, then:
        • $\mathrm{MSE}(\hat{\Theta}_i) = V(\hat{\Theta}_i)$, so the better estimator is the one with the smaller variance (see the simulation sketch below).
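  • A minimal simulation sketch (not from the source; it assumes a normal population with $\mu = 5$, $\sigma = 2$): the sample mean and a single observation $X_1$ are both unbiased estimators of $\mu$, so their MSEs equal their variances, and the sample mean comes out better.

```python
# Sketch: Monte Carlo bias and MSE of two unbiased estimators of mu.
# Assumed setup (not from the source): normal population, mu=5, sigma=2, n=20.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 20, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)   # estimator 1: the sample mean
x1 = samples[:, 0]            # estimator 2: a single observation

for name, est in [("sample mean", xbar), ("X_1 alone", x1)]:
    bias = est.mean() - mu               # approximates E(est) - mu
    mse = np.mean((est - mu) ** 2)       # approximates E[(est - mu)^2]
    print(f"{name:12s}  bias ~ {bias:+.4f}   MSE ~ {mse:.4f}")

# Expected: both biases ~ 0, MSE(sample mean) ~ sigma^2/n = 0.2, MSE(X_1) ~ sigma^2 = 4.
```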
  • Let $X_1, \ldots, X_n$ be a random sample from a population with PMF/PDF $f(x; \theta)$ with unknown parameter $\theta$.

    • The likelihood function (of $\theta$, given realizations $x_1, \ldots, x_n$) is a function of $\theta$ defined by $L(\theta) = L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$.
    • $\ell(\theta) = \ln L(\theta)$ is the log-likelihood function.
    • An estimator $\hat{\theta}$ is called a maximum likelihood estimator (MLE) of $\theta$ if $L(\hat{\theta}) = \max_{\theta} L(\theta)$, or equivalently, if $L(\hat{\theta}; x_1, \ldots, x_n) \geq L(\theta; x_1, \ldots, x_n)$ for all $\theta$, for all $x_1, \ldots, x_n$ (see the worked Bernoulli MLE below).
      • (9.20) If $\hat{\theta}$ is an MLE of $\theta$, then $g(\hat{\theta})$ is an MLE of $g(\theta)$ for any one-to-one function $g$.
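  • A worked MLE derivation (not from the source; it continues the Bernoulli($p$) example above): maximizing the log-likelihood gives $\hat{p} = \bar{x}$.

```latex
% Illustration: MLE of p for a Bernoulli(p) random sample x_1, ..., x_n.
\[
L(p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i},
\qquad
\ell(p) = \ln L(p)
        = \Big(\sum_{i=1}^{n} x_i\Big)\ln p + \Big(n - \sum_{i=1}^{n} x_i\Big)\ln(1-p)
\]
% Setting the derivative of the log-likelihood to zero:
\[
\ell'(p) = \frac{\sum_{i} x_i}{p} - \frac{n - \sum_{i} x_i}{1-p} = 0
\;\Longrightarrow\;
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
\]
```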
  • A confidence interval (for an unknown parameter $\theta$) with confidence level $1 - \alpha$ is an interval $(\hat{\Theta}_L, \hat{\Theta}_U)$ such that $P(\hat{\Theta}_L < \theta < \hat{\Theta}_U) = 1 - \alpha$.

    • Example (normal population with known $\sigma$):
      • (confidence level) $1 - \alpha$, e.g. $0.95$
      • parameter: $\mu$
      • estimator: $\bar{X}$ (sample mean)
      • $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ (confidence interval for $\mu$ with confidence level $1 - \alpha$).
        • If the sampling is repeated many times, in about $(1 - \alpha) \cdot 100\%$ of the cases the confidence interval will contain the true value of the parameter $\mu$ (see the coverage simulation below).
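  • A coverage simulation sketch (not from the source; it assumes a normal population with known $\sigma$): repeating the sampling many times, roughly $(1 - \alpha) \cdot 100\%$ of the computed intervals contain the true $\mu$.

```python
# Sketch: empirical coverage of the z-interval  xbar +- z_{alpha/2} * sigma / sqrt(n).
# Assumed setup (not from the source): normal population, mu=10, sigma=3, n=25, alpha=0.05.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, n, alpha, reps = 10.0, 3.0, 25, 0.05, 50_000
z = norm.ppf(1 - alpha / 2)                 # z_{alpha/2} ~ 1.96 for alpha = 0.05

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
half_width = z * sigma / np.sqrt(n)
covered = (xbar - half_width < mu) & (mu < xbar + half_width)

print(f"empirical coverage ~ {covered.mean():.3f}   (nominal {1 - alpha:.2f})")
```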

Examples

  • The statistic $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is called the sample mean of $X_1, \ldots, X_n$.
    • (9.8) $E(\bar{X}) = \mu = E(X_i)$ for all $i$ (i.e. the expectation of the sample mean is equal to the population mean and to the expectation of any $X_i$).
    • The sample mean $\bar{X}$ is an unbiased estimator of $\mu$ (the population mean).
    • $V(\bar{X}) = \frac{\sigma^2}{n}$ (i.e. the variance of the sample mean is equal to the population variance divided by the sample size $n$).
  • The statistic $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is called the sample variance of $X_1, \ldots, X_n$, and $S = \sqrt{S^2}$ is called the sample standard deviation.
    • (9.10) $S^2$ is an unbiased estimator of the population variance $\sigma^2$ (unknown mean $\mu$).
    • (9.11) $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$
  • (9.9) If $\mu$ is known, then the statistic $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$ (denoted also by $\hat{\sigma}^2$) is an unbiased estimator of the population variance $\sigma^2$ (these facts are checked numerically in the sketch below).
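  • A simulation sketch (not from the source; it assumes a normal population): it checks (9.8), $V(\bar{X}) = \sigma^2/n$, and (9.10), and shows that dividing by $n$ instead of $n-1$ underestimates $\sigma^2$.

```python
# Sketch: Monte Carlo check of the sample-mean and sample-variance properties.
# Assumed setup (not from the source): normal population, mu=4, sigma=1.5, n=10.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 4.0, 1.5, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)         # divides by n-1: the sample variance S^2
s2_biased = samples.var(axis=1, ddof=0)  # divides by n: biased when mu is unknown

print("E(Xbar) ~", round(xbar.mean(), 3), "  (mu =", mu, ")")
print("V(Xbar) ~", round(xbar.var(), 3), "  (sigma^2/n =", sigma**2 / n, ")")
print("E(S^2)  ~", round(s2.mean(), 3), "  (sigma^2 =", sigma**2, ")")
print("E(n-divisor version) ~", round(s2_biased.mean(), 3), "  (underestimates sigma^2)")
```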

Hypothesis Testing

  • A pivotal quantity is a function $Q = Q(X_1, \ldots, X_n; \theta)$ such that the distribution of $Q$ does not depend on the unknown parameter $\theta$.

    • (Example: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ for a normal population with known $\sigma$; see the sketch below.)
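  • A simulation sketch (not from the source; it assumes a normal population with known $\sigma$): the distribution of $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ stays (approximately) $N(0, 1)$ no matter what $\mu$ is, which is what makes $Z$ pivotal.

```python
# Sketch: the pivotal quantity Z = (Xbar - mu) / (sigma / sqrt(n)) has the same
# N(0,1) distribution for every value of the unknown mean mu.
# Assumed setup (not from the source): normal population, sigma=2, n=16.
import numpy as np

rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 16, 100_000

for mu in (0.0, 7.0, -3.0):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    print(f"mu = {mu:+.1f}:  mean(Z) ~ {z.mean():+.3f}   var(Z) ~ {z.var():.3f}")
# The mean ~ 0 and variance ~ 1 in every case, independent of mu.
```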
  • The null hypothesis $H_0$

  • The alternative hypothesis $H_1$

  • The test statistic is a statistic used to test the null hypothesis $H_0$.

    • The rejection region (or critical region) is the set of values of the test statistic for which the null hypothesis is rejected.
    • The acceptance region is the set of values of the test statistic for which the null hypothesis is accepted.
    • The boundary value separating the rejection region from the acceptance region is called a critical value.
  • Type I error: Rejecting $H_0$ when $H_0$ is true.

    • (significance level) $\alpha = P(\text{type I error})$
  • Type II error: Accepting $H_0$ when $H_1$ is true.

    • $\beta = P(\text{type II error})$; if $H_1$ is composite, then $\beta$ is a function of $\theta$.
  • (power) $1 - \beta$, the probability of rejecting $H_0$ when $H_1$ is true.

  • $\Lambda = \frac{L(\theta_0)}{L(\theta_1)}$ is the likelihood ratio (of $\theta_0$ and $\theta_1$).

  • $\Lambda = \frac{\prod_{i=1}^{n} f(x_i; H_0)}{\prod_{i=1}^{n} f(x_i; H_1)}$ is the likelihood ratio (of hypotheses $H_0$ and $H_1$, where $f$ is the population PMF/PDF).

  • (Neyman-Pearson lemma) If $C = \{(x_1, \ldots, x_n) : \Lambda \leq k\}$ and $P\big((X_1, \ldots, X_n) \in C \mid H_0\big) = \alpha$, then the test with rejection region $C$ is the most powerful test of significance level at most $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ (a worked example follows below).
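  • A worked example (not from the source; it assumes a $N(\mu, 1)$ population with simple hypotheses $H_0: \mu = 0$ vs $H_1: \mu = 1$): here $\Lambda$ is a decreasing function of $\bar{x}$, so $\{\Lambda \leq k\}$ is equivalent to $\{\bar{x} \geq c\}$, and the most powerful level-$\alpha$ test rejects $H_0$ for large $\bar{x}$. The sketch computes the critical value and the resulting power.

```python
# Sketch: critical value and power of the Neyman-Pearson most powerful test
# for H0: mu = 0 vs H1: mu = 1 in a N(mu, 1) population (assumed setup, not from the source).
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 0.0, 1.0, 1.0, 9, 0.05
se = sigma / np.sqrt(n)                      # standard deviation of Xbar

# Critical value c with P(Xbar >= c | H0) = alpha.
c = norm.ppf(1 - alpha, loc=mu0, scale=se)

# Power of the test: P(Xbar >= c | H1) = 1 - beta.
power = 1 - norm.cdf(c, loc=mu1, scale=se)

print(f"reject H0 when xbar >= c = {c:.3f}")
print(f"significance level alpha = {alpha},  power 1 - beta ~ {power:.3f}")
```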