
《统计学习方法》 — Hidden Markov Models: Exercise Solutions
Introduction
These are reading notes for the second edition of 《统计学习方法》. Because they include derivations of every formula in the book along with many figures, the write-up is long: the main text is split into three parts, plus this set of exercise solutions, in which the Viterbi algorithm and the forward-backward algorithm are implemented with NumPy.
- 《统计学习方法》 — Hidden Markov Models (Part 1)
- 《统计学习方法》 — Hidden Markov Models (Part 2)
- 《统计学习方法》 — Hidden Markov Models (Part 3)
- 《统计学习方法》 — Hidden Markov Models: Exercise Solutions
This article contains the exercise solutions.
Exercises
Below, the exercises are solved programmatically.
Exercise 10.1
Given the box-and-ball hidden Markov model $\lambda=(A,B,\pi)$, where
$$A=\left[\begin{array}{ccc} 0.5&0.2&0.3 \\ 0.3&0.5&0.2 \\ 0.2&0.3&0.5 \end{array}\right], \quad B=\left[\begin{array}{cc} 0.5&0.5 \\ 0.4&0.6 \\ 0.7&0.3 \end{array}\right], \quad \pi=(0.2,0.4,0.4)^T$$
let $T=4$ and $O=(红,白,红,白)$, where 红 = red and 白 = white; use the backward algorithm to compute $P(O|\lambda)$.
Implement the backward algorithm `backward` according to Algorithm 10.3.
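For reference, the recursion Algorithm 10.3 prescribes (the book's formulas, restated here), which the code below vectorizes:
$$\beta_T(i)=1,\qquad \beta_t(i)=\sum_{j=1}^N a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j),\quad t=T-1,T-2,\cdots,1$$
$$P(O|\lambda)=\sum_{i=1}^N \pi_i\,b_i(o_1)\,\beta_1(i)$$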
import numpy as np

class HMM:
    def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
        """
        :v: (list) all possible observation symbols
        :init_prob: initial distribution π, shape (N,1)
        :trans_mat: transition probability matrix A, shape (N,N)
        :emission_prob: emission probability matrix B, shape (N,M)
        """
        self.init_prob = init_prob
        self.trans_mat = trans_mat
        self.emission_prob = emission_prob
        self.verbose = verbose
        self.N = None    # number of possible states
        self.M = len(v)  # number of possible observation symbols
        if self.emission_prob is not None:
            self.N = self.emission_prob.shape[0]
        # map each observation symbol to its column index in B
        self.v = {symbol: i for i, symbol in enumerate(v)}

    def backward(self, observations):
        """
        :observations: (list) observation sequence
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        beta = np.ones((self.N, 1))
        # In the book t starts from 1, but array indices start from 0, so t is
        # taken to start from 0 here; seq_len-1 is then the last index.
        for t in range(seq_len - 1, 0, -1):  # t in [seq_len-1, 1]
            # (N,N) @ [(N,1) * (N,1)] = (N,1)
            # * multiplies element-wise when shapes match; indexing with [index]
            # keeps the sliced column two-dimensional (keepdim)
            beta = self.trans_mat @ (self.emission_prob[:, [self.v[observations[t]]]] * beta)
            if self.verbose:
                print(f"beta_{t}:{beta}\n")
        return np.sum(self.init_prob * self.emission_prob[:, [self.v[observations[0]]]] * beta)
NumPy matrix operations are used here to speed up the computation and to keep the code short.
Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array( [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '白']
PI = np.array([0.2, 0.4, 0.4]).reshape(-1,1)
hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
hmm.backward(O)
Output:
beta_3:[[0.46]
[0.51]
[0.43]]
beta_2:[[0.2461]
[0.2312]
[0.2577]]
beta_1:[[0.112462]
[0.121737]
[0.104881]]
0.0600908
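As a sanity check on the vectorized implementation, here is a plain-loop version of the same backward pass (a sketch written for this post, not code from the book); it should print the same value, ≈ 0.0600908:

def backward_loops(A, B, pi, obs, v):
    # beta_T(i) = 1 for every state i
    N = A.shape[0]
    beta = [1.0] * N
    for t in range(len(obs) - 1, 0, -1):  # t = T-1, ..., 1 (0-indexed)
        o = v[obs[t]]
        beta = [sum(A[i, j] * B[j, o] * beta[j] for j in range(N)) for i in range(N)]
    o0 = v[obs[0]]
    return sum(pi[i, 0] * B[i, o0] * beta[i] for i in range(N))

print(backward_loops(A, B, PI, O, hmm.v))  # ≈ 0.0600908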
Exercise 10.2
Given the box-and-ball hidden Markov model $\lambda=(A,B,\pi)$, where
$$A=\left[\begin{array}{ccc} 0.5&0.1&0.4 \\ 0.3&0.5&0.2 \\ 0.2&0.2&0.6 \end{array}\right], \quad B=\left[\begin{array}{cc} 0.5&0.5 \\ 0.4&0.6 \\ 0.7&0.3 \end{array}\right], \quad \pi=(0.2,0.3,0.5)^T$$
let $T=8$ and $O=(红,白,红,红,白,红,白,白)$; use the forward and backward probabilities to compute $P(i_4=q_3|O,\lambda)$.
As in the previous exercise, first implement the forward algorithm `forward` according to Algorithm 10.2. Computing the state probability at an arbitrary time step requires the intermediate forward and backward scores, so the backward implementation above is modified to save them. Finally, formula (10.24) gives the answer.
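For reference, the forward recursion of Algorithm 10.2, and formula (10.24), which `calc_t_i_prob` below implements:
$$\alpha_1(i)=\pi_i\,b_i(o_1),\qquad \alpha_{t+1}(i)=\left[\sum_{j=1}^N \alpha_t(j)\,a_{ji}\right]b_i(o_{t+1})$$
$$P(i_t=q_i|O,\lambda)=\gamma_t(i)=\frac{\alpha_t(i)\,\beta_t(i)}{P(O|\lambda)}$$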
import numpy as np

class HMM:
    def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
        """
        :v: (list) all possible observation symbols
        :init_prob: initial distribution π, shape (N,1)
        :trans_mat: transition probability matrix A, shape (N,N)
        :emission_prob: emission probability matrix B, shape (N,M)
        """
        self.init_prob = init_prob
        self.trans_mat = trans_mat
        self.emission_prob = emission_prob
        self.verbose = verbose
        self.N = None    # number of possible states
        self.M = len(v)  # number of possible observation symbols
        if self.emission_prob is not None:
            self.N = self.emission_prob.shape[0]
        # map each observation symbol to its column index in B
        self.v = {symbol: i for i, symbol in enumerate(v)}

    def backward(self, observations):
        """
        :observations: (list) observation sequence
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        betas = [np.ones((self.N, 1))]
        # In the book t starts from 1, but array indices start from 0, so t is
        # taken to start from 0 here; seq_len-1 is then the last index.
        for t in range(seq_len - 1, 0, -1):  # t in [seq_len-1, 1]
            # (N,N) @ [(N,1) * (N,1)] = (N,1)
            # * multiplies element-wise when shapes match; indexing with [index]
            # keeps the sliced column two-dimensional (keepdim)
            beta = self.trans_mat @ (self.emission_prob[:, [self.v[observations[t]]]] * betas[-1])
            betas.append(beta)
            if self.verbose:
                print(f"beta_{t}:{beta}\n")
        # restore chronological order so that betas[t-1] is beta_t in book notation
        betas.reverse()
        self.betas = betas
        return np.sum(self.init_prob * self.emission_prob[:, [self.v[observations[0]]]] * beta)

    def forward(self, observations):
        """
        :observations: (list) observation sequence
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        alphas = [self.init_prob * self.emission_prob[:, [self.v[observations[0]]]]]
        for t in range(seq_len - 1):  # t in [0, seq_len-2]
            # self.trans_mat * alphas[-1]: (N,N)*(N,1); the (N,1) operand is
            #   broadcast across columns and multiplied element-wise
            # self.emission_prob[:, self.v[...]]: indexing without [] gives shape (N,)
            # (N,N) * (N,): column j of the matrix is multiplied by element j
            # summing over axis=0 gives (N,); reshape to (N,1) keeps alpha's shape
            alpha = (self.trans_mat * alphas[-1] * self.emission_prob[:, self.v[observations[t + 1]]]).sum(0).reshape(-1, 1)
            alphas.append(alpha)
            if self.verbose:
                print(f"alpha_{t+1}:{alpha}\n")
        self.alphas = alphas
        # P(O|λ) is the sum over states of the final alpha only
        return np.sum(alphas[-1])

    def calc_t_i_prob(self, t, i, observations):
        """
        Compute the probability of being in state q_i at time t (both 1-indexed),
        using the forward and backward probabilities, i.e. formula (10.24).
        """
        self.forward(observations)
        ob_prob = self.backward(observations)
        return (self.alphas[t - 1][i - 1] * self.betas[t - 1][i - 1]) / ob_prob
Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.1, 0.4], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '红', '白', '红', '白', '白']
PI = np.array([0.2, 0.3, 0.5]).reshape(-1,1)
hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
print(hmm.calc_t_i_prob(4, 3, O))
Output:
alpha_1:[[0.078 ]
[0.084 ]
[0.0822]]
alpha_2:[[0.04032 ]
[0.026496]
[0.068124]]
alpha_3:[[0.0208668 ]
[0.01236192]
[0.04361112]]
alpha_4:[[0.0114321 ]
[0.01019392]
[0.01109573]]
alpha_5:[[0.00549669]
[0.00338373]
[0.00928834]]
alpha_6:[[0.00281056]
[0.00245952]
[0.00253453]]
alpha_7:[[0.00132502]
[0.00121063]
[0.00094105]]
beta_7:[[0.43]
[0.51]
[0.4 ]]
beta_6:[[0.1861]
[0.2415]
[0.1762]]
beta_5:[[0.105521]
[0.100883]
[0.111934]]
beta_4:[[0.04586531]
[0.05280909]
[0.04280618]]
beta_3:[[0.02556442]
[0.02343448]
[0.02678985]]
beta_2:[[0.01482964]
[0.01227214]
[0.01568294]]
beta_1:[[0.00632569]
[0.00684706]
[0.00577855]]
[0.53695182]
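Two quick consistency checks (not part of the exercise, but cheap to run): the forward and backward passes should report the same $P(O|\lambda)$, and the state probabilities at $t=4$ should sum to one:

hmm.verbose = False
print(hmm.forward(O), hmm.backward(O))  # the two estimates of P(O|λ) should agree
print(sum(hmm.calc_t_i_prob(4, i, O) for i in (1, 2, 3)))  # should be ≈ [1.]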
Exercise 10.3
For the model of Exercise 10.1, use the Viterbi algorithm to find the optimal path $I^*=(i_1^*,i_2^*,i_3^*,i_4^*)$.
The Viterbi algorithm can be implemented directly from the formulas in the book; the resulting complete code is:
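For reference, the quantities Algorithm 10.5 maintains, which `viterbi` below computes:
$$\delta_t(i)=\max_{1\leqslant j\leqslant N}[\delta_{t-1}(j)a_{ji}]\,b_i(o_t),\qquad \Psi_t(i)=\arg\max_{1\leqslant j\leqslant N}[\delta_{t-1}(j)a_{ji}]$$
$$P^*=\max_{1\leqslant i\leqslant N}\delta_T(i),\qquad i_T^*=\arg\max_{1\leqslant i\leqslant N}\delta_T(i),\qquad i_t^*=\Psi_{t+1}(i_{t+1}^*)$$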
import numpy as np

class HMM:
    def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
        """
        :v: (list) all possible observation symbols
        :init_prob: initial distribution π, shape (N,1)
        :trans_mat: transition probability matrix A, shape (N,N)
        :emission_prob: emission probability matrix B, shape (N,M)
        """
        self.init_prob = init_prob
        self.trans_mat = trans_mat
        self.emission_prob = emission_prob
        self.verbose = verbose
        self.N = None    # number of possible states
        self.M = len(v)  # number of possible observation symbols
        if self.emission_prob is not None:
            self.N = self.emission_prob.shape[0]
        # map each observation symbol to its column index in B
        self.v = {symbol: i for i, symbol in enumerate(v)}

    def backward(self, observations):
        """
        :observations: (list) observation sequence
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        betas = [np.ones((self.N, 1))]
        # In the book t starts from 1, but array indices start from 0, so t is
        # taken to start from 0 here; seq_len-1 is then the last index.
        for t in range(seq_len - 1, 0, -1):  # t in [seq_len-1, 1]
            # (N,N) @ [(N,1) * (N,1)] = (N,1)
            # * multiplies element-wise when shapes match; indexing with [index]
            # keeps the sliced column two-dimensional (keepdim)
            beta = self.trans_mat @ (self.emission_prob[:, [self.v[observations[t]]]] * betas[-1])
            betas.append(beta)
            if self.verbose:
                print(f"beta_{t}:{beta}\n")
        # restore chronological order so that betas[t-1] is beta_t in book notation
        betas.reverse()
        self.betas = betas
        return np.sum(self.init_prob * self.emission_prob[:, [self.v[observations[0]]]] * beta)

    def forward(self, observations):
        """
        :observations: (list) observation sequence
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        alphas = [self.init_prob * self.emission_prob[:, [self.v[observations[0]]]]]
        for t in range(seq_len - 1):  # t in [0, seq_len-2]
            # self.trans_mat * alphas[-1]: (N,N)*(N,1); the (N,1) operand is
            #   broadcast across columns and multiplied element-wise
            # self.emission_prob[:, self.v[...]]: indexing without [] gives shape (N,)
            # (N,N) * (N,): column j of the matrix is multiplied by element j
            # summing over axis=0 gives (N,); reshape to (N,1) keeps alpha's shape
            alpha = (self.trans_mat * alphas[-1] * self.emission_prob[:, self.v[observations[t + 1]]]).sum(0).reshape(-1, 1)
            alphas.append(alpha)
            if self.verbose:
                print(f"alpha_{t+1}:{alpha}\n")
        self.alphas = alphas
        # P(O|λ) is the sum over states of the final alpha only
        return np.sum(alphas[-1])

    def calc_t_i_prob(self, t, i, observations):
        """
        Compute the probability of being in state q_i at time t (both 1-indexed),
        using the forward and backward probabilities, i.e. formula (10.24).
        """
        self.forward(observations)
        ob_prob = self.backward(observations)
        return (self.alphas[t - 1][i - 1] * self.betas[t - 1][i - 1]) / ob_prob

    def viterbi(self, observations):
        """
        :observations: (list) observation sequence
        Returns the maximum path probability and the optimal path (0-indexed states).
        """
        assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), \
            "You must learn or specify parameters first."
        seq_len = len(observations)
        deltas = [self.init_prob * self.emission_prob[:, [self.v[observations[0]]]]]  # δ, shape (N,1)
        phis = np.zeros((seq_len, self.N), dtype=int)  # Ψ, the backpointers
        for t in range(1, seq_len):
            # self.trans_mat * deltas[-1]: (N,N)*(N,1); the (N,1) operand is broadcast
            # across columns, giving element [j,i] = delta_{t-1}(j) * a_{ji}
            trans_with_weight = self.trans_mat * deltas[-1]
            phis[t] = trans_with_weight.argmax(0)
            # trans_with_weight.max(0) has shape (N,), as does the emission column;
            # multiply element-wise, then reshape to keep delta's (N,1) shape
            deltas.append((trans_with_weight.max(0) * self.emission_prob[:, self.v[observations[t]]]).reshape(-1, 1))
            if self.verbose:
                print(f"delta_{t+1}:{deltas[-1].squeeze()}, phis_{t+1}:{phis[t]}")
        max_prob = np.max(deltas[-1])
        i_T = np.argmax(deltas[-1])
        # backtrack through the stored backpointers from time T to 1
        its = [i_T]
        for t in range(seq_len - 1, 0, -1):
            its.append(phis[t, its[-1]])
        its.reverse()
        return max_prob, its
Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array( [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '白']
PI = np.array([0.2, 0.4, 0.4]).reshape(-1,1)
hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
max_prob, its = hmm.viterbi(O)
print(max_prob)  # the maximum path probability
print(its)       # 0-indexed; the actual states are 3, 2, 2, 2
Output:
delta_2:[0.028 0.0504 0.042 ], phis_2:[2 2 2]
delta_3:[0.00756 0.01008 0.0147 ], phis_3:[1 1 2]
delta_4:[0.00189 0.003024 0.002205], phis_4:[0 1 2]
0.0030239999999999993
[2, 1, 1, 1]
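Since `viterbi` returns 0-indexed states, a one-line conversion (a convenience added for this post) recovers the book's 1-indexed path:

print([i + 1 for i in its])  # [3, 2, 2, 2], i.e. I* = (3, 2, 2, 2)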
Exercise 10.4
Use the forward and backward probabilities to derive
$$P(O|\lambda)=\sum_{i=1}^N\sum_{j=1}^N\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j),\quad t=1,2,\cdots,T-1$$
The derivation actually appears right after formula (10.22) in the main text; a sketch is repeated below.
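A sketch of that derivation, using the definitions $\alpha_t(i)=P(o_1,\cdots,o_t,i_t=q_i|\lambda)$ and $\beta_{t+1}(j)=P(o_{t+2},\cdots,o_T|i_{t+1}=q_j,\lambda)$:
$$P(O|\lambda)=\sum_{i=1}^N\sum_{j=1}^N P(o_1,\cdots,o_T,i_t=q_i,i_{t+1}=q_j|\lambda)$$
Factoring each term with the homogeneous Markov assumption and the observation-independence assumption gives
$$P(o_1,\cdots,o_t,i_t=q_i|\lambda)\,P(i_{t+1}=q_j|i_t=q_i)\,P(o_{t+1}|i_{t+1}=q_j)\,P(o_{t+2},\cdots,o_T|i_{t+1}=q_j,\lambda)=\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j),$$
and summing over $i$ and $j$ yields the stated identity for every $t=1,2,\cdots,T-1$.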
Exercise 10.5
Compare the computation of the variable $\delta$ in the Viterbi algorithm with that of the variable $\alpha$ in the forward algorithm, and state the main differences.
Viterbi algorithm:
(1) Initialization:
$$\delta_1(i)=\pi_ib_i(o_1),\quad i=1,2,\cdots,N$$
(2) Recursion, for $t=2,3,\cdots,T$:
$$\delta_t(i)=\max_{1 \leqslant j \leqslant N} [\delta_{t-1}(j)a_{ji}]\,b_i(o_t), \quad i=1,2,\cdots,N$$
Forward algorithm:
(1) Initialization:
$$\alpha_1(i)=\pi_ib_i(o_1),\quad i=1,2,\cdots,N$$
(2) Recursion, for $t=1,2,\cdots,T-1$:
$$\alpha_{t+1}(i)=\left[\sum_{j=1}^N \alpha_t(j) a_{ji} \right]b_i(o_{t+1}),\quad i=1,2,\cdots,N$$
Main differences:
Comparing the recursions for the two variables (see the sketch after this list):
- The forward variable $\alpha_t(i)$ is the probability of the partial observation sequence $o_1,\cdots,o_t$ with the chain in state $q_i$ at time $t$.
- The Viterbi variable $\delta_t(i)$ is the maximum probability, over all state paths of length $t$ ending in state $q_i$, of producing that partial observation sequence.
- The forward recursion sums ($\sum_j$) over all states at time $t-1$, so $\alpha_t(i)$ aggregates the probability of every path reaching $q_i$.
- The Viterbi recursion takes the maximum ($\max_j$) instead, keeping only the single best path and recording the arg max in $\Psi_t(i)$ so the optimal path can be backtracked.
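In code, the difference is a single reduction: one recursion step of `forward` and of `viterbi` above differ only in replacing `.sum(0)` with `.max(0)` (plus storing the arg max). A sketch using the Exercise 10.1 model defined earlier; `prev` is a hypothetical name for the previous $\alpha$ or $\delta$ column, which coincide at $t=1$:

prev = PI * B[:, [hmm.v['红']]]            # alpha_1 = delta_1, since O starts with 红
weighted = A * prev                        # element [j, i] = prev(j) * a_{ji}

alpha_2 = (weighted * B[:, hmm.v['白']]).sum(0).reshape(-1, 1)  # forward: sum over j
delta_2 = (weighted.max(0) * B[:, hmm.v['白']]).reshape(-1, 1)  # Viterbi: max over j
psi_2 = weighted.argmax(0)                 # backpointers for recovering the best path

print(delta_2.squeeze(), psi_2)            # matches delta_2 and phis_2 printed above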