Natural Language Processing Notes

Models and Dataset References

Posted on May 09, 2022 · 7 mins read

I just completed DeepLearning.ai’s NLP specialization on Coursera, worked through Stanford’s CS224n NLP course, and read a bunch of journal articles. Sorting through the alphabet soup was an undertaking in itself. Since I kept going back to my notes to compare features across the different language models and to look up benchmark datasets & sources, I figured I’d pop the charts in here in case they’re helpful to others.

Language Models

Transformer specs list the maximum values across each model family: L = # layers/blocks, A = # attention heads, H = hidden dimension size. (For released checkpoints you can also read these straight off the model config; see the sketch after the table.)

| year | model | description | specs |
| --- | --- | --- | --- |
| 2013 | word2vec | Word Representations in Vector Space | |
| 2014 | GloVe | Global Vectors for Word Representation | |
| 2018 | GPT | Generative Pre-trained Transformer | L=12, A=12, H=768 |
| 2019 | GPT-2 | Unsupervised Multitask Learners | L=48, A=25, H=1600 |
| 2020 | GPT-3 | Few-Shot Learners | L=96, A=96, H=12288 |
| 2018 | BERT | Bidirectional Encoder Representations from Transformers | L=24, A=16, H=1024 |
| 2019 | RoBERTa | Robustly Optimized BERT Pretraining Approach | L=24, A=16, H=1024 |
| 2019 | T5 | Transfer Learning with a Unified Text-to-Text Transformer | L=24, A=128, H=1024 |
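The L/A/H values above don’t have to come from memory or the papers; for released checkpoints you can read them off the model config. Here’s a minimal sketch, assuming the Hugging Face `transformers` package (my choice for illustration, not something from the notes above):

```python
# Minimal sketch (assumes Hugging Face `transformers`): read L (layers),
# A (attention heads), and H (hidden size) from each checkpoint's config.
from transformers import AutoConfig

for name in ["bert-large-uncased", "roberta-large", "gpt2-xl"]:
    cfg = AutoConfig.from_pretrained(name)
    # GPT-2 configs historically name these n_layer / n_head / n_embd,
    # so fall back when the standard attribute is missing.
    L = getattr(cfg, "num_hidden_layers", None) or getattr(cfg, "n_layer")
    A = getattr(cfg, "num_attention_heads", None) or getattr(cfg, "n_head")
    H = getattr(cfg, "hidden_size", None) or getattr(cfg, "n_embd")
    print(f"{name}: L={L}, A={A}, H={H}")
```

For `bert-large-uncased` this should print L=24, A=16, H=1024, matching the table.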

Datasets

| dataset | full name | notes |
| --- | --- | --- |
| GLUE | General Language Understanding Evaluation | |
| CoLA | Corpus of Linguistic Acceptability | Sentence grammatically correct? |
| SST | Stanford Sentiment Treebank | Movie-review sentiment analysis |
| MRPC | Microsoft Research Paraphrase Corpus | Sentences semantically equivalent? |
| QQP | Quora Question Pairs | Sentences semantically equivalent? |
| STS-B | Semantic Textual Similarity Benchmark | Sentences semantically equivalent? |
| QNLI | Question-answering NLI | Sentence contains answer to question? |
| RTE | Recognizing Textual Entailment | |
| WNLI | Winograd NLI | |
| MNLI | Multi-Genre NLI Corpus | |
| SuperGLUE | A Stickier Benchmark for NLU | |
| BoolQ | Boolean Questions | |
| CB | CommitmentBank | |
| COPA | Choice of Plausible Alternatives | |
| WSD | Word Sense Disambiguation | |
| WiC | Word in Context | |
| MultiRC | Multi-Sentence Reading Comprehension | |
| ReCoRD | Reading Comprehension with Commonsense Reasoning Dataset | |
| FraCaS | Framework for Computational Semantics | |
| SQuAD | Stanford Question Answering Dataset | |
| RACE | ReAding Comprehension from Examinations | |
| LAMBADA | LAnguage Modeling Broadened to Account for Discourse Aspects | |
| CBT | Children’s Book Test | |
| CoQA | Conversational Question Answering | |
| SWAG | Situations With Adversarial Generations | |
| C4 | Colossal Clean Crawled Corpus | |
| | BookCorpus | |
| | WikiText | |
| | WebText | |
| PTB | Penn Treebank | Part-of-speech tags |
| WSC | Winograd Schema Challenge | |
| | WinoGrande | Crowd-sourced WSC |
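Most of the benchmarks above can be pulled down by name. A quick sketch, assuming the Hugging Face `datasets` package (again my choice for illustration, not prescribed by the courses):

```python
# Sketch (assumes Hugging Face `datasets`): load a few of the benchmarks
# listed above and peek at a training example.
from datasets import load_dataset

cola = load_dataset("glue", "cola")          # grammatical acceptability
boolq = load_dataset("super_glue", "boolq")  # yes/no reading comprehension
squad = load_dataset("squad")                # extractive question answering

print(cola["train"][0])  # a sentence, a 0/1 acceptability label, and an index
```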

Acronyms

| term | expanded | notes |
| --- | --- | --- |
| NLP | Natural Language Processing | |
| NLU | Natural Language Understanding | |
| NLI | Natural Language Inference | |
| DPR | Definite Pronoun Resolution | |
| AFS | Argument Facet Similarity | |
| BiDAF | Bidirectional Attention Flow | |
| CoVe | Contextualized Word Vectors | |
| HSIC | Hilbert-Schmidt Independence Criterion | |
| PMI | Pointwise Mutual Information | |
| UDA | Unsupervised Data Augmentation | |
| RL² | Fast Reinforcement Learning via Slow Reinforcement Learning | |
| MAML | Model-Agnostic Meta-Learning | |
| WMT | Workshop on Machine Translation | |
| SemEval | International Workshop on Semantic Evaluation | |
| BPE | Byte-Pair Encoding | Used to tokenize & build vocabulary lists (see sketch below) |
| TF-IDF | Term Frequency–Inverse Document Frequency | Reflects word importance for a document in a corpus (see sketch below) |
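Two of the table entries are concrete enough to be worth sketching. First, BPE: a toy, from-scratch version of the core merge loop (the word frequencies are invented for illustration; real tokenizers add a lot of machinery on top):

```python
# Toy BPE sketch: repeatedly merge the most frequent adjacent symbol pair.
# `words` maps each word (as a tuple of symbols) to its corpus frequency.
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Rewrite every word, fusing each occurrence of `pair` into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

words = {("l","o","w"): 5, ("l","o","w","e","r"): 2, ("n","e","w","e","s","t"): 6}
for _ in range(3):
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print("merged", pair, "->", list(words))
```

Each merged pair becomes a new vocabulary entry, which is how BPE builds its vocabulary list.

And TF-IDF, in a few lines of scikit-learn (also an assumed library choice): a term scores high when it’s frequent within a document but rare across the corpus.

```python
# Sketch (assumes scikit-learn): TF-IDF weights for a tiny toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats chase dogs",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)  # sparse matrix, one row per document

# TF-IDF weight of each vocabulary term for the first document.
print(dict(zip(vec.get_feature_names_out(), X.toarray()[0].round(2))))
```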
Cover Photo by Jess Bailey on Unsplash