Introduction

These are my reading notes for the second edition of Statistical Learning Methods (《统计学习方法》). Because they include derivations of every formula in the book plus many illustrations, the write-up is long: the main text is split into three parts, followed by solutions to the end-of-chapter exercises, in which the Viterbi algorithm and the forward-backward algorithm are implemented with NumPy.


  1. Statistical Learning Methods: Hidden Markov Models (Part 1)
  2. Statistical Learning Methods: Hidden Markov Models (Part 2)
  3. Statistical Learning Methods: Hidden Markov Models (Part 3)
  4. Statistical Learning Methods: Hidden Markov Models (Exercise Solutions)

This article is the exercise-solutions installment.

Exercises

Below, the exercises are solved programmatically.

Exercise 10.1

  Given the box-and-ball hidden Markov model $\lambda=(A,B,\pi)$, where

$$A=\left[\begin{array}{ccc} 0.5&0.2&0.3 \\ 0.3&0.5&0.2 \\ 0.2&0.3&0.5 \end{array}\right], \quad B=\left[\begin{array}{cc} 0.5&0.5 \\ 0.4&0.6 \\ 0.7&0.3 \end{array}\right], \quad \pi=(0.2,0.4,0.4)^T$$

$T=4$, $O=(红,白,红,白)$ (red, white, red, white); use the backward algorithm to compute $P(O|\lambda)$.

Implement the backward algorithm `backward` following Algorithm 10.3:
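For reference, the recursion being implemented is Algorithm 10.3 from the book:

$$\beta_T(i)=1,\quad i=1,2,\cdots,N$$

$$\beta_t(i)=\sum_{j=1}^N a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j),\quad t=T-1,T-2,\cdots,1$$

$$P(O|\lambda)=\sum_{i=1}^N \pi_i\,b_i(o_1)\,\beta_1(i)$$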

import numpy as np

class HMM:
  def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
    """
    :v: (list) all possible observation symbols
    :init_prob: initial distribution π            shape: (N,1)
    :trans_mat: transition probability matrix A   shape: (N,N)
    :emission_prob: emission probability matrix B shape: (N,M)
    """
    self.init_prob = init_prob
    self.trans_mat = trans_mat
    self.emission_prob = emission_prob
    self.verbose = verbose

    self.N = None # number of states of the model
    self.M = len(v) # number of possible observation symbols
    if self.emission_prob is not None:
      self.N = self.emission_prob.shape[0]
    # map each observation symbol to its column index in B
    self.v = {symbol:i for i,symbol in enumerate(v)}

  def backward(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)
    beta = np.ones((self.N,1))
    # In the book t starts from 1, but array indices start from 0,
    # so t is assumed to start from 0 here; T-1 is the last index.
    for t in range(seq_len-1, 0, -1): # [seq_len-1, 1]
      # (N,N)×[(N,1)*(N,1)] = (N,1)
      # * is element-wise multiplication for arrays of the same shape
      # the [index] form keeps the dimension (keepdim)
      beta = self.trans_mat @ (self.emission_prob[:,[self.v[observations[t]]]] * beta)
      if self.verbose:
        print(f"beta_{t}:{beta}\n")

    return np.sum(self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] * beta)

NumPy matrix operations are used here to speed up the computation and keep the code short.

Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array( [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '白'] 
PI = np.array([0.2, 0.4, 0.4]).reshape(-1,1)

hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
print(hmm.backward(O))

Output:

beta_3:[[0.46]
 [0.51]
 [0.43]]

beta_2:[[0.2461]
 [0.2312]
 [0.2577]]

beta_1:[[0.112462]
 [0.121737]
 [0.104881]]

0.0600908
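The result can be cross-checked by brute force: $P(O|\lambda)=\sum_I P(O,I|\lambda)$, enumerated over all $3^4=81$ state paths. This check is my own addition, not part of the book's solution, and the names `obs` and `total` are mine:

```python
import itertools
import numpy as np

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
PI = np.array([0.2, 0.4, 0.4])
obs = [0, 1, 0, 1]  # 红,白,红,白 mapped to column indices of B

# P(O|λ) = Σ_I P(O, I|λ): enumerate every state path I of length T
total = 0.0
for path in itertools.product(range(3), repeat=len(obs)):
    p = PI[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t-1], path[t]] * B[path[t], obs[t]]
    total += p

print(total)  # matches the backward-algorithm result, 0.0600908
```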

Exercise 10.2

  Given the box-and-ball hidden Markov model $\lambda=(A,B,\pi)$, where

$$A=\left[\begin{array}{ccc} 0.5&0.1&0.4 \\ 0.3&0.5&0.2 \\ 0.2&0.2&0.6 \end{array}\right], \quad B=\left[\begin{array}{cc} 0.5&0.5 \\ 0.4&0.6 \\ 0.7&0.3 \end{array}\right], \quad \pi=(0.2,0.3,0.5)^T$$

$T=8$, $O=(红,白,红,红,白,红,白,白)$ (red, white, red, red, white, red, white, white); use the forward and backward probabilities to compute $P(i_4=q_3|O,\lambda)$.

As in the previous exercise, first implement the forward algorithm `forward` following Algorithm 10.2. To evaluate the state probability at an arbitrary time step we need the intermediate forward and backward scores, so the backward implementation above is also modified to save them. Finally, formula (10.24) yields the answer.

import numpy as np

class HMM:
  def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
    """
    :v: (list) all possible observation symbols
    :init_prob: initial distribution π            shape: (N,1)
    :trans_mat: transition probability matrix A   shape: (N,N)
    :emission_prob: emission probability matrix B shape: (N,M)
    """
    self.init_prob = init_prob
    self.trans_mat = trans_mat
    self.emission_prob = emission_prob
    self.verbose = verbose

    self.N = None # number of states of the model
    self.M = len(v) # number of possible observation symbols
    if self.emission_prob is not None:
      self.N = self.emission_prob.shape[0]
    # map each observation symbol to its column index in B
    self.v = {symbol:i for i,symbol in enumerate(v)}

  def backward(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)
    betas = [np.ones((self.N,1))]

    # In the book t starts from 1, but array indices start from 0,
    # so t is assumed to start from 0 here; T-1 is the last index.
    for t in range(seq_len-1, 0, -1): # [T-1, 1]
      # (N,N)×[(N,1)*(N,1)] = (N,1)
      # * is element-wise multiplication for arrays of the same shape
      # the [index] form keeps the dimension (keepdim)
      beta = self.trans_mat @ (self.emission_prob[:,[self.v[observations[t]]]] * betas[-1])
      betas.append(beta)
      if self.verbose:
        print(f"beta_{t}:{beta}\n")
    # restore chronological order
    betas.reverse()
    self.betas = betas
    return np.sum(self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] * beta)

  def forward(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)
    alphas = [ self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] ]

    # In the book t starts from 1, but array indices start from 0,
    # so t is assumed to start from 0 here; T-1 is the last index.
    for t in range(seq_len-1): # [0, T-2]
      # self.trans_mat * alphas[-1]: (N,N)*(N,1); the (N,1) operand broadcasts to (N,N),
      #   so row i of A is scaled by alpha_t(i)
      # self.emission_prob[:,self.v[observations[t+1]]]: without [] around the index the shape is (N,)
      # (N,N) * (N,) gives (N,N): column j is scaled by b_j(o_{t+1})
      # summing over axis=0 gives (N,); reshape to (N,1) keeps alpha's shape unchanged
      alpha = (self.trans_mat * alphas[-1] * self.emission_prob[:,self.v[observations[t+1]]]).sum(0).reshape(-1,1)
      alphas.append(alpha)
      if self.verbose:
        print(f"alpha_{t+1}:{alpha}\n")

    self.alphas = alphas
    return np.sum(alphas[-1]) # P(O|λ) is the sum of the final alphas

  def calc_t_i_prob(self, t, i, observations):
    """
    Probability of being in state q_i at time t, computed from the
    forward and backward probabilities (formula (10.24)); t and i are 1-based.
    """
    self.forward(observations)
    ob_prob = self.backward(observations)
    return (self.alphas[t-1][i-1] * self.betas[t-1][i-1]) / ob_prob
  
Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.1, 0.4], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '红', '白', '红', '白', '白']
PI = np.array([0.2, 0.3, 0.5]).reshape(-1,1)

hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
print(hmm.calc_t_i_prob(4, 3, O) )

Output:

alpha_1:[[0.078 ]
 [0.084 ]
 [0.0822]]

alpha_2:[[0.04032 ]
 [0.026496]
 [0.068124]]

alpha_3:[[0.0208668 ]
 [0.01236192]
 [0.04361112]]

alpha_4:[[0.0114321 ]
 [0.01019392]
 [0.01109573]]

alpha_5:[[0.00549669]
 [0.00338373]
 [0.00928834]]

alpha_6:[[0.00281056]
 [0.00245952]
 [0.00253453]]

alpha_7:[[0.00132502]
 [0.00121063]
 [0.00094105]]

beta_7:[[0.43]
 [0.51]
 [0.4 ]]

beta_6:[[0.1861]
 [0.2415]
 [0.1762]]

beta_5:[[0.105521]
 [0.100883]
 [0.111934]]

beta_4:[[0.04586531]
 [0.05280909]
 [0.04280618]]

beta_3:[[0.02556442]
 [0.02343448]
 [0.02678985]]

beta_2:[[0.01482964]
 [0.01227214]
 [0.01568294]]

beta_1:[[0.00632569]
 [0.00684706]
 [0.00577855]]

[0.53695182]
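As with Exercise 10.1, the answer can be cross-checked by brute force (again my own addition, not part of the book's solution): enumerate all $3^8=6561$ state paths, and divide the total probability of the paths with $i_4=q_3$ by $P(O|\lambda)$:

```python
import itertools
import numpy as np

A = np.array([[0.5, 0.1, 0.4], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
PI = np.array([0.2, 0.3, 0.5])
obs = [0, 1, 0, 0, 1, 0, 1, 1]  # 红,白,红,红,白,红,白,白

total, q3_at_t4 = 0.0, 0.0
for path in itertools.product(range(3), repeat=len(obs)):
    p = PI[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t-1], path[t]] * B[path[t], obs[t]]
    total += p
    if path[3] == 2:  # i_4 = q_3 (time index 3 and state index 2, both 0-based)
        q3_at_t4 += p

print(q3_at_t4 / total)  # matches the forward-backward result, ≈ 0.53695182
```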

Exercise 10.3

  In the setting of Exercise 10.1, use the Viterbi algorithm to find the optimal path $I^*=(i_1^*,i_2^*,i_3^*,i_4^*)$.

The Viterbi algorithm can be implemented directly from the formulas in the book; the complete final code is:

import numpy as np

class HMM:
  def __init__(self, v, init_prob=None, trans_mat=None, emission_prob=None, verbose=False):
    """
    :v: (list) all possible observation symbols
    :init_prob: initial distribution π            shape: (N,1)
    :trans_mat: transition probability matrix A   shape: (N,N)
    :emission_prob: emission probability matrix B shape: (N,M)
    """
    self.init_prob = init_prob
    self.trans_mat = trans_mat
    self.emission_prob = emission_prob
    self.verbose = verbose

    self.N = None # number of states of the model
    self.M = len(v) # number of possible observation symbols
    if self.emission_prob is not None:
      self.N = self.emission_prob.shape[0]
    # map each observation symbol to its column index in B
    self.v = {symbol:i for i,symbol in enumerate(v)}

  def backward(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)
    betas = [np.ones((self.N,1))]

    # In the book t starts from 1, but array indices start from 0,
    # so t is assumed to start from 0 here; T-1 is the last index.
    for t in range(seq_len-1, 0, -1): # [T-1, 1]
      # (N,N)×[(N,1)*(N,1)] = (N,1)
      # * is element-wise multiplication for arrays of the same shape
      # the [index] form keeps the dimension (keepdim)
      beta = self.trans_mat @ (self.emission_prob[:,[self.v[observations[t]]]] * betas[-1])
      betas.append(beta)
      if self.verbose:
        print(f"beta_{t}:{beta}\n")
    # restore chronological order
    betas.reverse()
    self.betas = betas
    return np.sum(self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] * beta)

  def forward(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)
    alphas = [ self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] ]

    # In the book t starts from 1, but array indices start from 0,
    # so t is assumed to start from 0 here; T-1 is the last index.
    for t in range(seq_len-1): # [0, T-2]
      # self.trans_mat * alphas[-1]: (N,N)*(N,1); the (N,1) operand broadcasts to (N,N),
      #   so row i of A is scaled by alpha_t(i)
      # self.emission_prob[:,self.v[observations[t+1]]]: without [] around the index the shape is (N,)
      # (N,N) * (N,) gives (N,N): column j is scaled by b_j(o_{t+1})
      # summing over axis=0 gives (N,); reshape to (N,1) keeps alpha's shape unchanged
      alpha = (self.trans_mat * alphas[-1] * self.emission_prob[:,self.v[observations[t+1]]]).sum(0).reshape(-1,1)
      alphas.append(alpha)
      if self.verbose:
        print(f"alpha_{t+1}:{alpha}\n")

    self.alphas = alphas
    return np.sum(alphas[-1]) # P(O|λ) is the sum of the final alphas

  def calc_t_i_prob(self, t, i, observations):
    """
    Probability of being in state q_i at time t, computed from the
    forward and backward probabilities (formula (10.24)); t and i are 1-based.
    """
    self.forward(observations)
    ob_prob = self.backward(observations)
    return (self.alphas[t-1][i-1] * self.betas[t-1][i-1]) / ob_prob

  def viterbi(self, observations):
    """
    :observations: (list) the observation sequence
    """
    assert all(e is not None for e in [self.emission_prob, self.init_prob, self.trans_mat]), "You must learn or specify parameters first."
    seq_len = len(observations)

    deltas = [ self.init_prob * self.emission_prob[:,[self.v[observations[0]]]] ] # δ (N,1)
    phis = np.zeros((seq_len, self.N), dtype=int) # Ψ

    for t in range(1, seq_len):
      # self.trans_mat * deltas[-1]: (N,N)*(N,1); the (N,1) operand broadcasts to (N,N),
      #   so element (j,i) is delta_{t-1}(j) * a_{ji}
      trans_with_weight = self.trans_mat * deltas[-1]

      phis[t] = trans_with_weight.argmax(0)
      # keep delta's shape consistent:
      # trans_with_weight.max(0) has shape (N,)
      # self.emission_prob[:,self.v[observations[t]]] has shape (N,)
      # multiply element-wise, then reshape to (N,1)
      deltas.append( (trans_with_weight.max(0) * self.emission_prob[:,self.v[observations[t]]]).reshape(-1,1) )
      if self.verbose:
        print(f"delta_{t+1}:{deltas[-1].squeeze()}, phis_{t+1}:{phis[t]}")

    max_prob = np.max(deltas[-1])
    i_T = np.argmax(deltas[-1])

    # backtrack the optimal path using the stored Ψ
    its = [i_T]
    for t in range(seq_len-1, 0, -1):
      its.append(phis[t, its[-1]])
    its.reverse()
    return max_prob, its

Q = np.array([1, 2, 3])
V = ['红', '白']
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array( [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = ['红', '白', '红', '白'] 
PI = np.array([0.2, 0.4, 0.4]).reshape(-1,1)

hmm = HMM(v=V, init_prob=PI, trans_mat=A, emission_prob=B, verbose=True)
max_prob, its = hmm.viterbi(O)  
print(max_prob) # maximum probability
print(its) # indices are 0-based, so the actual states are 3, 2, 2, 2

Output:

delta_2:[0.028  0.0504 0.042 ], phis_2:[2 2 2]
delta_3:[0.00756 0.01008 0.0147 ], phis_3:[1 1 2]
delta_4:[0.00189  0.003024 0.002205], phis_4:[0 1 2]
0.0030239999999999993
[2, 1, 1, 1]
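Since $T=4$ and $N=3$ give only $81$ candidate paths, the Viterbi result can also be cross-checked by exhaustive search. This is my own sanity check; `joint` is a hypothetical helper that computes $P(O,I|\lambda)$ for one state path:

```python
import itertools
import numpy as np

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
PI = np.array([0.2, 0.4, 0.4])
obs = [0, 1, 0, 1]  # 红,白,红,白 mapped to column indices of B

def joint(path):
    # P(O, I|λ) for one state path
    p = PI[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t-1], path[t]] * B[path[t], obs[t]]
    return p

# keep the path with the largest joint probability
best = max(itertools.product(range(3), repeat=len(obs)), key=joint)
print(best, joint(best))  # (2, 1, 1, 1), ≈ 0.003024, matching the Viterbi output
```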

Exercise 10.4

  Using the forward and backward probabilities, derive
$$P(O|\lambda)=\sum_{i=1}^N\sum_{j=1}^N\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j),\quad t=1,2,\cdots,T-1$$

The derivation actually already appears right after formula (10.22) in the earlier parts of this series.
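For reference, the derivation is one application of the Markov property: condition on the pair of states at times $t$ and $t+1$, and split the observation sequence around that point:

$$\begin{aligned} P(O|\lambda) &= \sum_{i=1}^N\sum_{j=1}^N P(O, i_t=q_i, i_{t+1}=q_j|\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N P(o_1,\cdots,o_t, i_t=q_i|\lambda)\,a_{ij}\,b_j(o_{t+1})\,P(o_{t+2},\cdots,o_T|i_{t+1}=q_j,\lambda)\\ &= \sum_{i=1}^N\sum_{j=1}^N \alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j) \end{aligned}$$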

Exercise 10.5

  Compare the computation of the variable $\delta$ in the Viterbi algorithm with the computation of the variable $\alpha$ in the forward algorithm, and identify the main differences.

Viterbi algorithm:

(1) Initialization

$$\delta_1(i)=\pi_i b_i(o_1),\quad i=1,2,\cdots,N$$

(2) Recursion: for $t=2,3,\cdots,T$

$$\delta_t(i)=\max_{1 \leqslant j \leqslant N}\left[\delta_{t-1}(j)a_{ji}\right]b_i(o_t),\quad i=1,2,\cdots,N$$

Forward algorithm:

(1) Initialization

$$\alpha_1(i)=\pi_i b_i(o_1),\quad i=1,2,\cdots,N$$

(2) Recursion: for $t=1,2,\cdots,T-1$

$$\alpha_{t+1}(i)=\left[\sum_{j=1}^N \alpha_t(j) a_{ji}\right]b_i(o_{t+1}),\quad i=1,2,\cdots,N$$

Main differences:

Comparing the formulas for the two variables, the differences are:

  1. The forward variable $\alpha_t(i)$ is the probability of the partial observation sequence up to time $t$ with the chain in state $q_i$ at time $t$;
  2. the Viterbi variable $\delta_t(i)$ is the largest probability among all length-$t$ state paths that generate the observations up to time $t$ and end in state $q_i$;
  3. accordingly, the forward recursion sums over all possible states at time $t-1$;
  4. the Viterbi recursion replaces that sum with a maximum (and records the maximizing state for backtracking).
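To make the sum-versus-max distinction concrete, here is a small sketch (my own, using the Exercise 10.1 model) that computes $\alpha_2$ and $\delta_2$ side by side; the two recursion lines differ only in `sum` versus `max`:

```python
import numpy as np

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
PI = np.array([0.2, 0.4, 0.4])

# t=1: both recursions start identically (o_1 = 红, column 0 of B)
a1 = PI * B[:, 0]  # alpha_1 = delta_1 = [0.1, 0.16, 0.28]
# t=2 (o_2 = 白, column 1 of B): element (i,j) of a1[:, None] * A is alpha_1(i) * a_ij
alpha2 = (a1[:, None] * A).sum(0) * B[:, 1]  # forward: sum over predecessor state i
delta2 = (a1[:, None] * A).max(0) * B[:, 1]  # Viterbi: max over predecessor state i
print(alpha2)  # approximately [0.077, 0.1104, 0.0606]
print(delta2)  # approximately [0.028, 0.0504, 0.042], matching delta_2 in the output above
```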