Text Classification - Survey

Tasks

  • Short-text classification
  • Long-text classification
  • Ultra-short text (single-word) classification

Domain-specific text classification

  • aspect-level classification
  • ss

methods:

  • word-level

    • tfidf + svm/lr
    • fastText (Facebook); used only as a baseline
    • lstm bilstm
    • lstm + attention
    • cnn code1 code2
    • gated cnn
    • rcnn
  • char-level

    • What role do characters play? See NLP.md
    • char cnn (Zhang and LeCun, 2015)
    • char rnn
    • char-CRNN (Xiao and Cho, 2016)
    • char-rnn + word rnn (Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation)
    • char-cnn + word rnn
  • Hierarchical:
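
As a rough illustration of the first word-level entry in the list above (tfidf + lr), and of how the word-level and char-level variants differ only in tokenization, here is a minimal standard-library sketch. The toy sentences, the function names, and the hyperparameters (`epochs`, `lr`, the n-gram size) are all assumptions made for this example and do not come from any of the papers listed; in practice a library implementation would normally replace the hand-written parts.

```python
import math
from collections import Counter, defaultdict


def tokenize(text, level="word", n=3):
    """Split text into word tokens or overlapping character n-grams."""
    text = text.lower()
    if level == "word":
        return text.split()
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training token."""
    n_docs = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n_docs) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalised sparse tf-idf vector; unseen tokens are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def train_lr(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression trained with plain SGD (no regularisation)."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = max(min(score(vec, weights, bias), 30.0), -30.0)  # avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # d(loss)/dz for log loss
            bias -= lr * grad
            for t, v in vec.items():
                weights[t] -= lr * grad * v
    return weights, bias


def predict(text, idf, weights, bias, level="word"):
    """Return 1 or 0 for a raw text string."""
    vec = tfidf(tokenize(text, level), idf)
    return 1 if score(vec, weights, bias) > 0 else 0


# Hypothetical toy data (1 = positive, 0 = negative), for illustration only.
docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
labels = [1, 1, 0, 0]

idf = fit_idf([tokenize(d) for d in docs])
weights, bias = train_lr([tfidf(tokenize(d), idf) for d in docs], labels)
print(predict("great wonderful movie", idf, weights, bias))
print(predict("boring terrible plot", idf, weights, bias))
```

Passing `level="char"` to `tokenize` (and to `predict`) switches the same pipeline to character n-gram features, which is the simplest way to compare the two granularities on equal footing before moving to the neural models listed above.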

datasets &

paper & implementation

[TextCNN]: Convolutional Neural Networks for Sentence Classification
[TextCNN-code]: https://richliao.github.io/supervised/classification/2016/11/26/textclassifier-convolutional/

[TextRNN]: Recurrent Neural Network for Text Classification with Multi-Task Learning
[TextRNN-code]:

[RNN+Attention]: Hierarchical Attention Networks for Document Classification
http://www.jianshu.com/p/4fbc4939509f
[RNN+Attention-code]: https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-RNN/

[RNN+CNN]: Recurrent Convolutional Neural Networks for Text Classification. AAAI. 2015.
[RNN+CNN-code]: https://github.com/knok/rcnn-text-classification

[fastText]:

tutorial & survey & blog

http://www.jeyzhang.com/cnn-apply-on-modelling-sentence.html
https://zhuanlan.zhihu.com/p/25928551

web service

1. watson NLC: https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1
2. songfang NLC

code