PRML Reading Group: Session 2 Summary, Part 4

Chapter 2: Probability Distributions
Gaussian distribution

Basics

  • X \sim \mathcal{N}(\mu, \sigma^{2})
  • P(X = x | \mu, \sigma ^{2}) = p(x | \mu,\sigma^{2}) =  \mathcal{N}(x | \mu,\sigma^{2})
  • \mathcal{N}(x | \mu,\sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
    • Extending to D dimensions:
  • \mathcal{N} (\mathbb{x} | \mathbb{\mu} , \mathbb{\Sigma} ) = \frac{1}{(2\pi)^{\frac{D}{2}}} \; \frac{1}{ |\mathbb{\Sigma}|^{\frac{1}{2}}}\; \exp \left\{ - \frac{(\mathbb{x} - \mathbb{\mu})^{\mathrm{T}}\mathbb{\Sigma}^{-1}(\mathbb{x} - \mathbb{\mu})}{2}\right\}
  • \mathbb{x} \in \mathbb{R}^{D}
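  • \mathbb{\mu} \in \mathbb{R}^{D} : mean
  • \mathbb{\Sigma} \in M_{D}(\mathbb{R}) : covariance; symmetric and positive definite (its eigenvalues are real and positive, so the level sets of the quadratic form are ellipsoids. → p.82)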
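
As a rough illustration of the D-dimensional density, the following NumPy sketch evaluates \mathcal{N}(x | \mu, \Sigma) with the normaliser (2\pi)^{D/2} |\Sigma|^{1/2}. The function name `gaussian_pdf` and the test points are assumptions made for this example, not something from the book.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(x | mu, Sigma) for a D-dimensional x (hypothetical helper)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    D = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using solve instead of an explicit inverse
    maha = diff @ np.linalg.solve(Sigma, diff)
    # A non-positive determinant rules out a positive definite Sigma
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # log of the normaliser (2 pi)^{D/2} |Sigma|^{1/2}
    log_norm = 0.5 * (D * np.log(2.0 * np.pi) + logdet)
    return float(np.exp(-0.5 * maha - log_norm))

# Standard 2-D Gaussian evaluated at its mean: 1 / (2 pi)
p_at_mean = gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
# 1-D case with sigma^2 = 4, evaluated at x = 1
p_1d = gaussian_pdf(1.0, 0.0, 4.0)
```

For D = 1 the same function reduces to the univariate formula above, which gives a quick sanity check.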
Number of parameters

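covariance form              parameters    contours of constant density
general \mathbb{\Sigma}      D(D+1)/2      ellipses with arbitrary orientation
diagonal                     D             axis-aligned ellipses
\sigma^{2}I (isotropic)      1             circles
mean \mathbb{\mu}: D parameters in every case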
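Trade-off: speed of computations such as matrix inversion ←→ expressive power of the model and its ability to capture structure in the data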
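
To make the counts concrete, here is a small sketch that tallies the free parameters (mean plus covariance) for each covariance form. The function `n_params` and the choice D = 100 are illustrative assumptions.

```python
def n_params(D, covariance="full"):
    """Free parameters of a D-dimensional Gaussian (hypothetical helper)."""
    mean_params = D
    if covariance == "full":
        # Symmetric D x D matrix: diagonal plus upper triangle
        cov_params = D * (D + 1) // 2
    elif covariance == "diagonal":
        cov_params = D
    elif covariance == "isotropic":
        # Sigma = sigma^2 I has a single free scalar
        cov_params = 1
    else:
        raise ValueError("covariance must be 'full', 'diagonal' or 'isotropic'")
    return mean_params + cov_params

counts = {kind: n_params(100, kind) for kind in ("full", "diagonal", "isotropic")}
```

The full covariance grows quadratically in D while the restricted forms grow linearly or not at all.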

Conditional and marginal distributions

View \mathcal{N}( \mathbb{x} | \mathbb{\mu}, \mathbb{\Sigma} ) as the joint distribution of \mathbb{x}_{a} and \mathbb{x}_{b}.

  • \mathbb{x} = \left(\begin{array}{c} \mathbb{x}_{a} \\ \mathbb{x}_{b} \end{array}\right)
  • \mathbb{\mu} = \left(\begin{array}{c} \mathbb{\mu}_{a} \\ \mathbb{\mu}_{b} \end{array}\right)
  • \mathbb{\Sigma} = \left(\begin{array}{cc}  \mathbb{\Sigma}_{aa} & \mathbb{\Sigma}_{ab} \\ \mathbb{\Sigma}_{ba} &\mathbb{\Sigma}_{bb} \end{array}\right)
  • \mathbb{\Lambda} = \mathbb{\Sigma}^{-1} = \left(\begin{array}{cc} \mathbb{\Lambda}_{aa} &\mathbb{\Lambda}_{ab} \\ \mathbb{\Lambda}_{ba} & \mathbb{\Lambda}_{bb} \end{array}\right)

Then,

  • p(\mathbb{x}_{a} | \mathbb{x}_{b} ) = \mathcal{N} (\mathbb{x}_{a} | \mathbb{\mu}_{a | b} ,\mathbb{\Lambda}^{-1}_{aa})
  • \mathbb{\mu}_{a | b} = \mathbb{\mu}_{a} - \mathbb{\Lambda}^{-1}_{aa}\mathbb{\Lambda}_{ab}(\mathbb{x}_{b} - \mathbb{\mu}_{b})
  • p(\mathbb{x}_{a}) = \mathcal{N}(\mathbb{x}_{a} | \mathbb{\mu}_{a},\mathbb{\Sigma}_{aa} )
  • -\frac{1}{2}(\mathbb{x} - \mathbb{\mu} )^{\mathrm{T}} \mathbb{\Sigma}^{-1} (\mathbb{x} - \mathbb{\mu}) = -\frac{1}{2}\mathbb{x}^{\mathrm{T}} \mathbb{\Sigma}^{-1}\mathbb{x} + \mathbb{x}^{\mathrm{T}}\mathbb{\Sigma}^{-1}\mathbb{\mu} +\mathrm{const}
  • -\frac{1}{2}(\mathbb{x} - \mathbb{\mu} )^{\mathrm{T}} \mathbb{\Sigma}^{-1} (\mathbb{x} - \mathbb{\mu}) = -\frac{1}{2}(\mathbb{x}_{a} - \mathbb{\mu}_{a} )^{\mathrm{T}} \mathbb{\Lambda}_{aa} (\mathbb{x}_{a} - \mathbb{\mu}_{a}) -\frac{1}{2}(\mathbb{x}_{a} - \mathbb{\mu}_{a} )^{\mathrm{T}} \mathbb{\Lambda}_{ab} (\mathbb{x}_{b} - \mathbb{\mu}_{b}) -\frac{1}{2}(\mathbb{x}_{b} - \mathbb{\mu}_{b} )^{\mathrm{T}} \mathbb{\Lambda}_{ba} (\mathbb{x}_{a} - \mathbb{\mu}_{a}) -\frac{1}{2}(\mathbb{x}_{b} - \mathbb{\mu}_{b} )^{\mathrm{T}} \mathbb{\Lambda}_{bb} (\mathbb{x}_{b} - \mathbb{\mu}_{b})
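
The conditional and marginal formulas can be sketched in NumPy as below. This is a minimal example under assumed names (`condition_on_b`, `marginal_a`, index lists `idx_a` / `idx_b`) and an arbitrary 2-D test case; it forms the precision matrix explicitly for clarity, which is not necessarily how one would do it in production code.

```python
import numpy as np

def condition_on_b(mu, Sigma, idx_a, idx_b, x_b):
    """Mean and covariance of p(x_a | x_b), computed from precision blocks."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Lam = np.linalg.inv(Sigma)               # precision matrix Lambda
    Lam_aa = Lam[np.ix_(idx_a, idx_a)]
    Lam_ab = Lam[np.ix_(idx_a, idx_b)]
    cov_cond = np.linalg.inv(Lam_aa)         # Sigma_{a|b} = Lambda_aa^{-1}
    # mu_{a|b} = mu_a - Lambda_aa^{-1} Lambda_ab (x_b - mu_b)
    mu_cond = mu[idx_a] - cov_cond @ Lam_ab @ (np.asarray(x_b, dtype=float) - mu[idx_b])
    return mu_cond, cov_cond

def marginal_a(mu, Sigma, idx_a):
    """Mean and covariance of p(x_a): just the corresponding blocks."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    return mu[idx_a], Sigma[np.ix_(idx_a, idx_a)]

# Assumed 2-D example: a = first coordinate, b = second coordinate
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
mu_c, cov_c = condition_on_b(mu, Sigma, [0], [1], [2.0])
mu_m, cov_m = marginal_a(mu, Sigma, [0])
```

In this example the conditional variance is smaller than the marginal variance, since observing x_b removes part of the uncertainty about x_a.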

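As a function of \mathbb{x}_{a} (with \mathbb{x}_{b} fixed) the exponent is a quadratic form, so the conditional is a Gaussian dist.
It therefore suffices to determine \mathbb{\mu}_{a|b} and \mathbb{\Sigma}_{a|b}.
Complete the square and match coefficients: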

  • -\frac{1}{2}\mathbb{x}_{a}^{\mathrm{T}}\mathbb{\Lambda}_{aa}\mathbb{x}_{a}+\mathbb{x}_{a}^{\mathrm{T}}\mathbb{\Lambda}_{aa}\left( \mathbb{\mu}_{a}-\mathbb{\Lambda}_{aa}^{-1}\mathbb{\Lambda}_{ab} \left( \mathbb{x}_{b} - \mathbb{\mu}_{b} \right) \right) +\mathrm{const}
  • \mathbb{\Lambda}_{aa} = \mathbb{\Sigma}^{-1}_{a|b}
  • \mathbb{\mu}_{a}-\mathbb{\Lambda}_{aa}^{-1}\mathbb{\Lambda}_{ab} \left( \mathbb{x}_{b} - \mathbb{\mu}_{b} \right) = \mathbb{\mu}_{a|b}
  • \left(\begin{array}{cc} A & B \\ C & D \end{array}\right)^{-1} = \left( \begin{array}{cc} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{array}\right)
  • M = (A - BD^{-1}C)^{-1}
  • \int \exp\left\{ - \frac{1}{2}\left(\mathbb{x}_{b} - \mathbb{\Lambda}_{bb}^{-1}\mathbb{m}\right)^{\mathrm{T}}\mathbb{\Lambda}_{bb}\left(\mathbb{x}_{b} - \mathbb{\Lambda}_{bb}^{-1}\mathbb{m}\right)\right\}d\mathbb{x}_{b}
  • \mathbb{m} = \mathbb{\Lambda}_{bb}\mathbb{\mu}_{b} - \mathbb{\Lambda}_{ba}\left(\mathbb{x}_{a} -\mathbb{\mu}_{a} \right)
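
The partitioned-inverse identity in the list above can be checked numerically. The sketch below builds a random symmetric positive definite matrix (an arbitrary choice for this example), assembles the inverse from its blocks with M = (A - BD^{-1}C)^{-1}, and compares against a direct inverse. The name `block_inverse` is hypothetical.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Inverse of [[A, B], [C, D]] assembled from the partitioned-inverse identity."""
    D_inv = np.linalg.inv(D)
    M = np.linalg.inv(A - B @ D_inv @ C)     # inverse of the Schur complement of D
    top = np.hstack([M, -M @ B @ D_inv])
    bottom = np.hstack([-D_inv @ C @ M, D_inv + D_inv @ C @ M @ B @ D_inv])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
Sigma = G @ G.T + 5.0 * np.eye(5)            # symmetric positive definite by construction

# Partition into a (first 2 coordinates) and b (remaining 3)
A, B = Sigma[:2, :2], Sigma[:2, 2:]
C, D = Sigma[2:, :2], Sigma[2:, 2:]

Lam_blocks = block_inverse(A, B, C, D)
Lam_direct = np.linalg.inv(Sigma)
max_err = np.max(np.abs(Lam_blocks - Lam_direct))
```

The top-left block of the result is \Lambda_{aa} = (\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}, which is how the conditional covariance can be rewritten in terms of \Sigma.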

Collecting the remaining terms that depend on \mathbb{x}_{a} gives

  • -\frac{1}{2}\mathbb{x}_{a}^{\mathrm{T}} \left(\mathbb{\Lambda}_{aa} - \mathbb{\Lambda}_{ab}\mathbb{\Lambda}_{bb}^{-1}\mathbb{\Lambda}_{ba} \right)\mathbb{x}_{a} +\mathbb{x}_{a}^{\mathrm{T}}\left( \mathbb{\Lambda}_{aa} - \mathbb{\Lambda}_{ab}\mathbb{\Lambda}_{bb}^{-1}\mathbb{\Lambda}_{ba} \right)\mathbb{\mu}_{a} + \mathrm{const}
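
The marginal can be read off from this expression by matching coefficients against a generic Gaussian exponent. The following is a sketch of that step; the symbols \mathbb{\mu}_{\mathrm{marg}} and \mathbb{\Sigma}_{\mathrm{marg}} are names introduced here for the unknown mean and covariance of p(\mathbb{x}_{a}).

```latex
\begin{align}
% Generic Gaussian exponent in x_a
& -\frac{1}{2}\mathbb{x}_{a}^{\mathrm{T}}\mathbb{\Sigma}_{\mathrm{marg}}^{-1}\mathbb{x}_{a}
  + \mathbb{x}_{a}^{\mathrm{T}}\mathbb{\Sigma}_{\mathrm{marg}}^{-1}\mathbb{\mu}_{\mathrm{marg}}
  + \mathrm{const} \\
% Quadratic term
& \mathbb{\Sigma}_{\mathrm{marg}}^{-1}
  = \mathbb{\Lambda}_{aa} - \mathbb{\Lambda}_{ab}\mathbb{\Lambda}_{bb}^{-1}\mathbb{\Lambda}_{ba} \\
% Linear term
& \mathbb{\Sigma}_{\mathrm{marg}}^{-1}\mathbb{\mu}_{\mathrm{marg}}
  = \left(\mathbb{\Lambda}_{aa} - \mathbb{\Lambda}_{ab}\mathbb{\Lambda}_{bb}^{-1}\mathbb{\Lambda}_{ba}\right)\mathbb{\mu}_{a}
  \;\Rightarrow\; \mathbb{\mu}_{\mathrm{marg}} = \mathbb{\mu}_{a} \\
% Partitioned-inverse identity applied to \Lambda = \Sigma^{-1}
& \left(\mathbb{\Lambda}_{aa} - \mathbb{\Lambda}_{ab}\mathbb{\Lambda}_{bb}^{-1}\mathbb{\Lambda}_{ba}\right)^{-1}
  = \mathbb{\Sigma}_{aa}
  \;\Rightarrow\; \mathbb{\Sigma}_{\mathrm{marg}} = \mathbb{\Sigma}_{aa}
\end{align}
```

This recovers p(\mathbb{x}_{a}) = \mathcal{N}(\mathbb{x}_{a} | \mathbb{\mu}_{a}, \mathbb{\Sigma}_{aa}).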

See the figure on p.88.

Bayes' thm

given:

  • p(\mathbb{x}) = \mathcal{N}(\mathbb{x} | \mathbb{\mu} , \mathbb{\Lambda}^{-1} ) \; \mathbb{x} \in \mathbb{R}^{M}
  • p(\mathbb{y} | \mathbb{x} ) = \mathcal{N}(\mathbb{y} | \mathbb{Ax}+\mathbb{b},\mathbb{L}^{-1}) \; \mathbb{y} \in \mathbb{R}^{D} \; \mathbb{A} \in M_{DM}(\mathbb{R})
  • The mean of \mathbb{y} is a linear function of \mathbb{x}; the covariance is independent of \mathbb{x}.

joint dist
p(\mathbb{z}) = p(\mathbb{x})p(\mathbb{y}|\mathbb{x}) = \mathcal{N} (\mathbb{z} | \mathbb{r} , \mathbb{R}^{-1}), where \mathbb{R} is the precision matrix of \mathbb{z}

  • \mathbb{z} = \left(\begin{array}{c} \mathbb{x} \\ \mathbb{y} \end{array}\right)
  • \mathbb{r} = \left(\begin{array}{c} \mathbb{\mu} \\ \mathbb{A \mu} + \mathbb{b} \end{array}\right)
  • \mathbb{R} = \left( \begin{array}{cc} \mathbb{\Lambda}+\mathbb{A}^{\mathrm{T}}\mathbb{LA} & -\mathbb{A}^{\mathrm{T}}\mathbb{L} \\ -\mathbb{LA} & \mathbb{L} \end{array} \right)

\therefore p(\mathbb{y}) = \mathcal{N} \left( \mathbb{y} | \mathbb{A\mu} + \mathbb{b} , \mathbb{L}^{-1}+\mathbb{A\Lambda}^{-1}\mathbb{A}^{\mathrm{T}} \right)
p(\mathbb{x}|\mathbb{y}) = \mathcal{N}\left(\mathbb{x} |\mathbb{\Sigma} \left\{ \mathbb{A}^{\mathrm{T}}\mathbb{L} \left( \mathbb{y}-\mathbb{b} \right) + \mathbb{\Lambda \mu} \right\} , \mathbb{\Sigma} \right)
\mathbb{\Sigma} = (\mathbb{\Lambda} + \mathbb{A}^{\mathrm{T}}\mathbb{LA} )^{-1}
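
A minimal NumPy sketch of these results: given the prior parameters (\mu, \Lambda) and the likelihood parameters (A, b, L), it returns the parameters of p(y) and of p(x | y). The function name `linear_gaussian` and the 1-D test case are assumptions chosen so the answer can be verified by hand.

```python
import numpy as np

def linear_gaussian(mu, Lam, A, b, L, y):
    """Parameters of p(y) and p(x | y) for the linear-Gaussian model above."""
    Lam_inv = np.linalg.inv(Lam)
    L_inv = np.linalg.inv(L)
    # Marginal p(y) = N(y | A mu + b, L^{-1} + A Lam^{-1} A^T)
    y_mean = A @ mu + b
    y_cov = L_inv + A @ Lam_inv @ A.T
    # Posterior p(x | y) = N(x | Sigma {A^T L (y - b) + Lam mu}, Sigma)
    Sigma = np.linalg.inv(Lam + A.T @ L @ A)
    x_mean = Sigma @ (A.T @ L @ (y - b) + Lam @ mu)
    return (y_mean, y_cov), (x_mean, Sigma)

# Assumed 1-D example: prior x ~ N(0, 1), likelihood y | x ~ N(x, 1), observed y = 2
mu = np.array([0.0])
Lam = np.array([[1.0]])
A = np.array([[1.0]])
b = np.array([0.0])
L = np.array([[1.0]])
(y_mean, y_cov), (x_mean, x_cov) = linear_gaussian(mu, Lam, A, b, L, np.array([2.0]))
```

In this example the posterior mean lands halfway between the prior mean 0 and the observation 2, and the posterior variance is half the prior variance, because prior and likelihood have equal precision.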

prior p(\mathbb{x})
\downarrow \; observe \mathbb{y}
posterior p(\mathbb{x} | \mathbb{y})