[Language Models] - A Survey

Basics

Development of RNNs and CNNs

lstm
cudnn-lstm
sru
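
For reference, the standard LSTM cell update fits in a few lines of NumPy. This is a generic single-step sketch (the gate ordering and variable names are illustrative choices), not tied to the cuDNN or SRU variants above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order assumed here: input, forget, cell candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four gates in one affine map
    i = sigmoid(z[0:H])                 # input gate
    f = sigmoid(z[H:2*H])               # forget gate
    g = np.tanh(z[2*H:3*H])             # candidate cell state
    o = sigmoid(z[3*H:4*H])             # output gate
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                 rng.standard_normal((4 * H, D)),
                 rng.standard_normal((4 * H, H)), np.zeros(4 * H))
```

cuDNN's fused LSTM kernel computes the same update but batches the four gate matmuls and fuses the pointwise ops on the GPU.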

Development of the softmax

Development of language models (an incremental introduction)

  1. Class-Based n-gram Models of Natural Language (1992), Peter F. Brown et al. [pdf]
  2. An Estimate of an Upper Bound for the Entropy of English (1992), Peter F. Brown et al. [pdf]
  3. An Empirical Study of Smoothing Techniques for Language Modeling (1996), Stanley F. Chen et al. [pdf]
  4. A Neural Probabilistic Language Model (2000), Yoshua Bengio et al. [pdf] Widely regarded as the first neural probabilistic language model.
  5. A New Statistical Approach to Chinese Pinyin Input (2000), Zheng Chen et al. [pdf]
  6. A Neural Probabilistic Language Model (2003), Yoshua Bengio et al. [pdf]
  7. Hierarchical Probabilistic Neural Network Language Model (2005), Morin & Bengio [pdf] Builds on [6] by organizing the vocabulary into a hierarchy (using WordNet as prior knowledge), reducing the time complexity.
  8. Discriminative n-gram Language Modeling (2007), Brian Roark et al. [pdf]
  9. Three New Graphical Models for Statistical Language Modelling (2007), Mnih & Hinton [pdf]
  10. A Scalable Hierarchical Distributed Language Model (2009), Mnih & Hinton [pdf] Builds on [9] by organizing the vocabulary into a hierarchy.
  11. Neural Network Language Model for Chinese Pinyin Input Method Engine (2015), S. Chen et al. [pdf]
  12. Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition (2016), Xie Chen et al. [pdf]
  13. Exploring the Limits of Language Modeling (2016), R. Jozefowicz et al. [pdf]
  14. On the State of the Art of Evaluation in Neural Language Models (2017), G. Melis et al. [pdf]

n-gram
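
As a baseline for the neural models above, a count-based n-gram LM fits in a few lines. The sketch below scores bigrams with add-k smoothing; the toy corpus and the value of k are illustrative choices:

```python
import math
from collections import Counter

def bigram_logprob(tokens, corpus, k=0.5):
    """Add-k smoothed bigram log-probability of a token sequence.
    P(w2 | w1) = (count(w1, w2) + k) / (count(w1) + k * |V|)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    V = len(unigrams)
    lp = 0.0
    for w1, w2 in zip(tokens, tokens[1:]):
        lp += math.log((bigrams[(w1, w2)] + k) / (unigrams[w1] + k * V))
    return lp

corpus = "the cat sat on the mat the cat ran".split()
```

A seen bigram such as "the cat" scores higher than an unseen one such as "the the", but smoothing keeps the unseen case finite.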

Taxonomy

By granularity

  • char-based model
  • subword-based model - no OOV, smaller model size, and better speed.
    • Subword Language Modeling with Neural Networks.
  • word-based model
  • phrase-based model
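
The subword idea can be illustrated with a minimal byte-pair-encoding (BPE) learner: start from characters and repeatedly merge the most frequent adjacent symbol pair. This is a toy sketch of the general technique, not tied to any specific paper listed here:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a word-frequency dict.
    Words are split into characters plus an end-of-word marker."""
    vocab = {tuple(w) + ('</w>',): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, f in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += f
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for sym, f in vocab.items():       # apply the merge everywhere
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] = f
        vocab = new_vocab
    return merges

merges = bpe_merges({"low": 5, "lower": 2, "lowest": 2}, 3)
```

Frequent words collapse into single units while rare words decompose into smaller pieces, which is what gives subword models their zero OOV rate.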

Speedups - softmax

Approximation methods for the softmax loss (e.g. hierarchical softmax, sampled softmax, noise-contrastive estimation)
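
One common family of approximations factorizes the softmax into two smaller ones via word classes, as in the hierarchical models in the paper list above. A minimal sketch, with illustrative shapes and names:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def class_factored_prob(h, Wc, Ww, word2class, members, word):
    """P(word | h) = P(class | h) * P(word | class, h): replaces one
    |V|-way softmax with two much smaller ones."""
    c = word2class[word]
    p_class = softmax(Wc @ h)[c]                   # softmax over classes
    idx = members[c].index(word)
    p_word = softmax(Ww[members[c]] @ h)[idx]      # softmax within the class
    return p_class * p_word

# Toy setup: 4 words split into 2 classes.
rng = np.random.default_rng(0)
Hdim = 3
Wc = rng.standard_normal((2, Hdim))
Ww = rng.standard_normal((4, Hdim))
h = rng.standard_normal(Hdim)
word2class = {0: 0, 1: 0, 2: 1, 3: 1}
members = {0: [0, 1], 1: [2, 3]}
total = sum(class_factored_prob(h, Wc, Ww, word2class, members, w)
            for w in range(4))
```

The factored probabilities still sum to one over the vocabulary, but each evaluation touches only O(sqrt(|V|))-sized softmaxes with a balanced class split.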

Speedups - RNN cell

  • vanilla rnn
  • lstm
  • gru
  • quasi-rnn (qrnn)
  • sru
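  
SRU gains its speed by making the recurrence purely elementwise, so the expensive matrix products can be precomputed for the whole sequence. A sequential NumPy sketch of the forward pass, assuming the formulation of Lei et al. (2017):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_forward(X, W, Wf, bf, Wr, br):
    """SRU forward pass over a sequence X of shape (T, d).
    All matmuls depend only on X, so they are batched over time;
    the time loop is elementwise only."""
    T, d = X.shape
    Xt = X @ W.T                       # transformed input
    F = sigmoid(X @ Wf.T + bf)         # forget gates, all steps at once
    R = sigmoid(X @ Wr.T + br)         # reset gates, all steps at once
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):
        c = F[t] * c + (1 - F[t]) * Xt[t]          # elementwise recurrence
        H[t] = R[t] * np.tanh(c) + (1 - R[t]) * X[t]  # highway connection
    return H

rng = np.random.default_rng(0)
T, d = 5, 4
H = sru_forward(rng.standard_normal((T, d)),
                rng.standard_normal((d, d)), rng.standard_normal((d, d)),
                np.zeros(d), rng.standard_normal((d, d)), np.zeros(d))
```

In a CUDA implementation the three matmuls run as one batched GEMM and the elementwise loop becomes a cheap fused kernel, which is where the speedup over LSTM comes from.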

nnlm

Classic/traditional RNNLM

Regularization constraints

subword - lm

char - lm


  • faster-rnnlm

LM applications

Model fusion
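
The simplest fusion strategy is linear interpolation of the component models' probabilities, with mixture weights summing to one. A minimal sketch (inputs are per-model log-probabilities; in practice the weights would be tuned on held-out data):

```python
import math

def interpolate(logps, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w).
    `logps` holds each model's log-probability for the same token."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.log(sum(l * math.exp(lp) for l, lp in zip(weights, logps)))

# Two models assign 0.5 and 0.1 to the same token; equal weights give 0.3.
p = interpolate([math.log(0.5), math.log(0.1)], [0.5, 0.5])
```

Log-linear interpolation (a weighted sum of log-probabilities) is the other common variant; it no longer yields a normalized distribution but often works well for rescoring.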

Rescoring strategies
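
A typical setup reranks a first-pass n-best list with an external LM. The sketch below is a generic example: the hypothesis dicts with `text` and `score` fields and the `lm_weight` knob are illustrative, not from any particular toolkit:

```python
def rescore_nbest(nbest, lm_score, lm_weight=0.5):
    """Rerank an n-best list by mixing the first-pass score with an
    external LM score; lm_weight trades off the two."""
    return sorted(
        nbest,
        key=lambda h: (1 - lm_weight) * h["score"] + lm_weight * lm_score(h["text"]),
        reverse=True)

# Toy example: the LM prefers the second hypothesis strongly enough to flip the ranking.
nbest = [{"text": "i sore him", "score": 2.0},
         {"text": "i saw him", "score": 1.8}]
lm = {"i sore him": -12.0, "i saw him": -3.0}
best = rescore_nbest(nbest, lm_score=lambda t: lm[t], lm_weight=0.5)
```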

Other tricks

  • OOV penalty
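
One simple way to handle OOV tokens during rescoring is to substitute a fixed log-probability penalty for each unknown word, so hypotheses are not unfairly rewarded or punished by the `<unk>` mass. The constant and helper names below are illustrative:

```python
def oov_penalized_logprob(tokens, vocab, logprob, penalty=-10.0):
    """Score a token sequence, replacing the LM score of each OOV token
    with a fixed penalty (a hypothetical, tunable constant)."""
    return sum(logprob(t) if t in vocab else penalty for t in tokens)

# Toy example: one in-vocabulary token at -1.0 plus one OOV at the -10.0 penalty.
score = oov_penalized_logprob(["the", "zyx"], vocab={"the", "cat"},
                              logprob=lambda t: -1.0, penalty=-10.0)
```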

References