NLP Resource
Paper
Chinese NLP
A good summary of state-of-the-art Chinese word segmentation (tokenization) methods:
Improving Chinese Word Segmentation with Wordhood Memory Networks
Language Model
How to make a monolingual model multilingual:
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Character-level language modeling without an explicit tokenization step, proposed by Google:
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
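As a rough illustration of what character-level input means, the sketch below turns a string into Unicode code points, so no vocabulary file or word segmenter is needed. The helper name `to_code_points` is hypothetical and not part of any library, and the actual CANINE encoder does considerably more with these inputs than is shown here.

```python
from typing import List


def to_code_points(text: str) -> List[int]:
    """Map each character to its Unicode code point (hypothetical helper)."""
    return [ord(ch) for ch in text]


def from_code_points(ids: List[int]) -> str:
    """Invert the mapping to recover the original string."""
    return "".join(chr(i) for i in ids)


# Chinese text needs no word segmentation at this stage:
# every character becomes exactly one integer ID.
ids = to_code_points("自然语言")
restored = from_code_points(ids)
```

Because the mapping is one integer per character and fully reversible, the same code handles any language without a language-specific tokenizer, at the cost of longer input sequences than subword tokenization would produce.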
Resources on Github
Tracking Progress in different fields of Natural Language Processing
Summary of applications of BERT model
HuggingFace