PRML Reading Group, Session 2 Summary, Part 5

Chapter 2: Probability Distributions
Gaussian distribution

Maximum Likelihood Estimation

\mathrm{Observed \; data}\; : \; \mathbf{X} = ( \mathbf{x}_{1}, ... , \mathbf{x}_{N} )^{\mathrm{T}}
\mathrm{error \; function}\; : \; -\log p(\mathbf{X} | \boldsymbol{\mu} , \boldsymbol{\Sigma} ) = \frac{ND}{2}\log (2\pi) + \frac{N}{2}\log |\boldsymbol{\Sigma}| + \frac{1}{2} \sum_{n=1}^{N}(\mathbf{x}_{n} - \boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}_{n} - \boldsymbol{\mu}) \quad (D : \mathrm{dimensionality})
\downarrow minimize
\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_{n}
\boldsymbol{\Sigma}_{ML} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_{n} - \boldsymbol{\mu}_{ML})(\mathbf{x}_{n} - \boldsymbol{\mu}_{ML})^{\mathrm{T}}

  • \mathbb{E} \left[ \boldsymbol{\mu}_{ML} \right] = \boldsymbol{\mu}
  • \mathbb{E} \left[ \boldsymbol{\Sigma}_{ML} \right] = \frac{N-1}{N}\boldsymbol{\Sigma}

The ML covariance estimate is not unbiased!
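A minimal NumPy sketch (the toy parameters and names here are my own) that computes the ML estimates above and checks the (N-1)/N bias of Σ_ML over repeated draws:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6], [0.6, 1.0]])

N, trials = 5, 20000
Sigma_ml_sum = np.zeros((2, 2))
for _ in range(trials):
    X = rng.multivariate_normal(mu_true, Sigma_true, size=N)
    mu_ml = X.mean(axis=0)          # mu_ML = (1/N) sum_n x_n
    D = X - mu_ml
    Sigma_ml_sum += D.T @ D / N     # Sigma_ML = (1/N) sum_n (x_n - mu_ML)(x_n - mu_ML)^T

print(Sigma_ml_sum / trials)        # approx. (N-1)/N * Sigma_true, i.e. biased low
print((N - 1) / N * Sigma_true)
```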

Sequential Estimation

Used in online settings: each data point is discarded once it has been processed.

  • \boldsymbol{\mu}_{ML}^{(N)} = \boldsymbol{\mu}_{ML}^{(N-1)} + \frac{1}{N} \left( \mathbf{x}_{N} - \boldsymbol{\mu}_{ML}^{\left( N-1 \right) } \right)

Estimate from the first N-1 data points: \;\boldsymbol{\mu}_{ML}^{(N-1)}

  • Adding \mathbf{x}_{N}: correct by the term \;\frac{1}{N} \left( \mathbf{x}_{N} - \boldsymbol{\mu}_{ML}^{\left( N-1 \right) } \right) (see the sketch below)
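A sketch of this update on a toy data stream (the stream parameters are arbitrary); each point is folded into the running mean and can then be thrown away:

```python
import numpy as np

x_stream = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=1000)

mu = 0.0                     # mu_ML^(0); the first step overwrites it with x_1
for n, x in enumerate(x_stream, start=1):
    mu += (x - mu) / n       # mu_ML^(N) = mu_ML^(N-1) + (x_N - mu_ML^(N-1)) / N

print(mu, x_stream.mean())   # agree up to floating-point rounding
```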
Generalization: the Robbins-Monro algorithm
  • \theta , z : random variables with joint distribution p(z , \theta)
  • f(\theta) \equiv \mathbb{E} \left[ z | \theta \right] \;\; \mathrm{regression \; function}
  • \left( = \int z \, p(z | \theta) \, dz \right)
  • z : observed value
  • \theta : the variable we adjust (the parameter)

Assumptions

  • \mathbb{E} \left[ (z - f)^{2} | \theta \right] < \infty
  • \theta > \theta^{*} \Rightarrow f(\theta) > 0 \;,\;\; \theta < \theta^{*} \Rightarrow f(\theta) < 0
  • \lim_{N \to \infty} a_{N} = 0 \;,\;\; \sum a_{N} = \infty \;,\;\; \sum a_{N}^{2} < \infty

Under these assumptions, the following update converges to the root \theta^{*}:

  • \theta^{(N)} = \theta^{\left( N-1 \right) } - a_{N-1}z \left( \theta^{ \left( N-1 \right) } \right)

The maximum likelihood problem as root finding

  • - \frac{\partial}{\partial \theta} \left\{ \frac{1}{N} \sum_{n=1}^{N} \log p \left( x_{n} | \theta \right) \right\} = 0
  • - \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \frac{\partial}{\partial \theta} \log p \left( x_{n} | \theta \right) = \mathbb{E}_{x} \left[ - \frac{\partial}{\partial \theta} \log p \left( x | \theta \right) \right]
  • \theta^{(N)} = \theta^{(N-1)} - a_{N-1} \frac{\partial}{\partial\theta^{(N-1)}} \left[ - \log p(x_{N} | \theta^{(N-1)} ) \right]
  • For the Gaussian mean, \theta^{(N)} plays the role of \mu_{ML}^{(N)}
  • z = - \frac{\partial}{\partial \mu_{ML}} \log p (x | \mu_{ML},\sigma^{2}) = - \frac{1}{\sigma^{2}}(x - \mu_{ML})
  • a_{N} = \frac{\sigma^{2}}{N} (see the sketch below)
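A sketch of Robbins-Monro applied to the Gaussian mean (toy data; the coefficient at step n is taken as σ²/n, so the update collapses exactly to the sequential mean formula above):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma2 = 3.0, 1.0
x_stream = rng.normal(mu_true, np.sqrt(sigma2), size=1000)

theta = 0.0
for n, x in enumerate(x_stream, start=1):
    z = -(x - theta) / sigma2   # z = -d/dtheta log p(x | theta, sigma2)
    theta -= (sigma2 / n) * z   # equivalent to theta += (x - theta) / n

print(theta, x_stream.mean())   # Robbins-Monro recovers the ML mean
```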

Bayesian Inference

Observed data \mathbf{X} = \{ x_{1}, ... , x_{N} \}

  1. given \sigma^{2} , unknown \mu
  2. given \mu , unknown \sigma^{2}
  3. unknown \mu , \sigma^{2}
(1) given \sigma^{2} , unknown \mu

p(\mathbf{X} | \mu ) = \prod_{n = 1}^{N} p \left( x_{n} | \mu \right) = \frac{1}{(\sqrt{2\pi\sigma^{2}})^{N}} \exp \left\{ - \frac{1}{2\sigma^{2}} \sum_{n=1}^{N} \left( x_{n} - \mu \right)^{2} \right\}
conjugate prior

  • p(\mu) = \mathcal{N}(\mu | \mu_{0} , \sigma_{0}^{2})

posterior

  • p(\mu | \mathbf{X} ) = \mathcal{N}(\mu | \mu_{N} , \sigma^{2}_{N})

where ...

  • \mu_{N} = \frac{\sigma^{2}}{N \sigma^{2}_{0}+ \sigma^{2}}\;\mu_{0} + \frac{N\sigma^{2}_{0}}{N \sigma^{2}_{0}+ \sigma^{2}}\;\mu_{ML}
  • \frac{1}{\sigma^{2}_{N}} = \frac{1}{\sigma^{2}_{0}} + \frac{N}{\sigma^{2}}
  • \mu_{ML} = \frac{1}{N}\sum x_{n}

Derived by completing the square in the exponent.
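A sketch of this posterior update (the prior hyperparameters μ₀, σ₀² and the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                  # known variance
mu0, sigma2_0 = 0.0, 10.0     # prior N(mu | mu0, sigma2_0)
x = rng.normal(2.0, np.sqrt(sigma2), size=50)

N, mu_ml = len(x), x.mean()
mu_N = (sigma2 * mu0 + N * sigma2_0 * mu_ml) / (N * sigma2_0 + sigma2)
sigma2_N = 1.0 / (1.0 / sigma2_0 + N / sigma2)

print(mu_N, sigma2_N)   # mean moves from mu0 toward mu_ml; variance shrinks with N
```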

(2) given \mu , unknown \sigma^{2}

likelihood, in terms of the precision \lambda \equiv \frac{1}{\sigma^{2}} :

  • p(\mathbf{X} | \lambda ) \propto \lambda^{\frac{N}{2}} \exp \left\{ - \frac{\lambda}{2} \sum_{n=1}^{N} \left( x_{n} - \mu \right)^{2} \right\}

conjugate prior

  • Gam(\lambda | a, b) = \frac{1}{\Gamma \left( a \right) }b^{a}\lambda^{a-1} \exp \left( -b\lambda \right)

posterior

  • p( \lambda | \mathbf{X} ) = Gam(\lambda | a_{N} , b_{N})

where...

  • a_{N} = a + \frac{N}{2}
  • b_{N} = b + \frac{N}{2}\sigma_{ML}^{2}
  • \sigma_{ML}^{2} = \frac{1}{N}\sum_{n=1}^{N}\left( x_{n} -\mu \right)^{2}
  • predictive distribution : Student's t (see the update sketch below)
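A sketch of the gamma posterior update (the hyperparameters a, b and data are illustrative); the posterior mean a_N/b_N approaches the true precision:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, lam_true = 0.0, 2.0                       # known mean, true precision
x = rng.normal(mu, np.sqrt(1.0 / lam_true), size=200)

a0, b0 = 1.0, 1.0                             # prior Gam(lambda | a0, b0)
N = len(x)
sigma2_ml = np.mean((x - mu) ** 2)
a_N = a0 + N / 2.0
b_N = b0 + N / 2.0 * sigma2_ml

print(a_N / b_N)                              # posterior mean of lambda, approx. 2.0
```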
(3) unknown \mu , \sigma^{2}

prior

  • p(\mu,\lambda) = \mathcal{N} \left( \mu | \mu_{0} , \left( \beta \lambda \right)^{-1} \right)  Gam\left( \lambda | a, b\right)

Gaussian-gamma distribution
The multivariate case is simply a dimensional extension (the conjugate prior becomes the Gaussian-Wishart distribution).
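A sketch evaluating the Gaussian-gamma density from the factorization above (the hyperparameter values are made up); note that SciPy's gamma takes a scale parameter, so rate b maps to scale 1/b:

```python
import numpy as np
from scipy.stats import norm, gamma

def gaussian_gamma_pdf(mu, lam, mu0=0.0, beta=2.0, a=2.0, b=1.0):
    """p(mu, lambda) = N(mu | mu0, (beta*lam)^-1) * Gam(lam | a, b)."""
    return (norm.pdf(mu, loc=mu0, scale=np.sqrt(1.0 / (beta * lam)))
            * gamma.pdf(lam, a, scale=1.0 / b))

print(gaussian_gamma_pdf(0.5, 1.5))
```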

Periodic Variables

e.g.) time of day, calendar date, etc.
A distribution p(\theta) with period 2\pi must satisfy:

  • p(\theta) \geq 0
  • \int_{0}^{2\pi} p(\theta) d\theta = 1
  • p(\theta + 2\pi) = p(\theta)
von Mises distribution

p \left( \theta | \theta_{0} , m \right) = \frac{1}{2\pi I_{0} \left( m \right) } \exp \left\{ m \cos \left( \theta - \theta_{0} \right) \right\}

  • \theta_{0} : mean
  • m : concentration (plays the role of a precision)
  • I_{0}(m) : zeroth-order modified Bessel function of the first kind

Maximum likelihood solution

\theta_{0}^{ML} = \tan ^{-1} \left( \frac {\sum_{n} \sin \theta_{n}}{\sum_{n} \cos \theta_{n}} \right)
A(m_{ML}) = \left( \frac{1}{N}\sum_{n=1}^{N} \cos \theta_{n} \right) \cos \theta_{0}^{ML} + \left( \frac{1}{N} \sum_{n=1}^{N} \sin \theta_{n} \right) \sin \theta_{0}^{ML}
A(m) = \frac{I_{1}(m)}{I_{0}(m)}
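A sketch of this ML fit on synthetic angles (the exponentially scaled Bessel functions i0e/i1e give the same ratio A(m) without overflow):

```python
import numpy as np
from scipy.special import i0e, i1e
from scipy.optimize import brentq

theta = np.random.default_rng(5).vonmises(mu=1.0, kappa=4.0, size=500)

# theta0_ML: use the two-argument arctangent to get the correct quadrant
S, C = np.sin(theta).sum(), np.cos(theta).sum()
theta0_ml = np.arctan2(S, C)

# m_ML solves A(m) = I1(m)/I0(m) = mean resultant length
r_bar = (C * np.cos(theta0_ml) + S * np.sin(theta0_ml)) / len(theta)
m_ml = brentq(lambda m: i1e(m) / i0e(m) - r_bar, 1e-6, 1e3)

print(theta0_ml, m_ml)   # approx. 1.0 and 4.0
```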
Other approaches

  • histogram
  • marginalization (of a distribution onto the circle)
  • mixtures, to capture multimodality (the von Mises itself is unimodal)

Mixtures of Gaussians

p(\mathbf{x}) = \sum_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x} | \boldsymbol{\mu}_{k} , \boldsymbol{\Sigma}_{k})
\sum_{k=1}^{K} \pi_{k} = 1 \; , \;\; 0 \leq \pi_{k} \leq 1
prior
\pi_{k} = p(k)
\mathcal{N}(\mathbf{x} | \boldsymbol{\mu}_{k} , \boldsymbol{\Sigma}_{k}) = p(\mathbf{x} | k)
p(\mathbf{x}) = \sum_{k=1}^{K}p(k)p(\mathbf{x} | k)

  • r_{k} \equiv p(k | \mathbf{x}) : responsibility
  • = \frac{p(k)p(\mathbf{x} | k)}{p(\mathbf{x})} \;\; (the posterior; see the sketch below)
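A sketch computing the responsibilities for a toy two-component mixture (all parameter values invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

pis = np.array([0.4, 0.6])                                  # mixing coefficients pi_k
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])]

x = np.array([1.5, 1.5])
joint = np.array([pi * multivariate_normal.pdf(x, m, S)     # p(k) p(x | k)
                  for pi, m, S in zip(pis, mus, Sigmas)])
r = joint / joint.sum()                                     # r_k = p(k | x)

print(r, r.sum())                                           # responsibilities sum to 1
```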

For details, see Chapter 9.