I just completed DeepLearning.ai’s NLP specialization on Coursera, worked through the Stanford CS224n NLP course, and read a bunch of journal articles. Sorting through the alphabet soup was an undertaking in itself. Since I kept referring to my notes to compare features across the different language models and to look up benchmark datasets & sources, I figured I’d pop the charts in here in case they’re helpful to others.
Transformer specs list the maximum values for each model family: L = # layers/blocks, A = # attention heads, H = hidden dimension size.
| year | model | description | specs |
|---|---|---|---|
| 2013 | word2vec | Word Representations in Vector Space | |
| 2014 | GloVe | Global Vectors for Word Representation | |
| 2018 | GPT | Generative Pre-trained Transformer | L=12, A=12, H=768 |
| 2019 | GPT-2 | Unsupervised Multitask Learning | L=48, A=25, H=1600 |
| 2020 | GPT-3 | Few-Shot Learners | L=96, A=96, H=12288 |
| 2018 | BERT | Bidirectional Encoder Representations from Transformers | L=24, A=16, H=1024 |
| 2019 | RoBERTa | Robustly Optimized BERT Pretraining Approach | L=24, A=16, H=1024 |
| 2019 | T5 | Transfer Learning with a Unified Text-to-Text Transformer | L=24, A=128, H=1024 |
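
The table doesn’t list parameter counts, but for standard transformer blocks a common rule of thumb is that non-embedding parameters scale as roughly 12·L·H² (attention projections plus feed-forward weights per block). A quick sanity check against the published model sizes, as a rough sketch rather than an exact accounting (it ignores embeddings, biases, and layer norms):

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter estimate: ~12 * L * H^2 per model
    (4*H^2 for the attention projections + 8*H^2 for the feed-forward layers)."""
    return 12 * n_layers * d_model ** 2

# Rough check against published sizes
print(f"GPT   ~{approx_transformer_params(12, 768) / 1e6:.0f}M")    # ~85M  (reported 117M incl. embeddings)
print(f"GPT-2 ~{approx_transformer_params(48, 1600) / 1e9:.1f}B")   # ~1.5B
print(f"GPT-3 ~{approx_transformer_params(96, 12288) / 1e9:.0f}B")  # ~174B
```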
| term | expanded | notes |
|---|---|---|
| NLP | Natural Language Processing | |
| NLU | Natural Language Understanding | |
| NLI | Natural Language Inference | |
| DPR | Definite Pronoun Resolution | |
| AFS | Argument Facet Similarity | |
| BiDAF | Bidirectional Attention Flow | |
| CoVe | Contextualized Word Vectors | |
| HSIC | Hilbert-Schmidt Independence Criterion | |
| PMI | Pointwise Mutual Information | |
| UDA | Unsupervised Data Augmentation | |
| RL2 | Fast Reinforcement Learning via Slow Reinforcement Learning | |
| MAML | Model-Agnostic Meta-Learning | |
| WMT | Workshop on Machine Translation | |
| SemEval | Workshop on Semantic Evaluation | |
| BPE | Byte-Pair Encoding | Used to tokenize & build vocabulary lists (see the sketch below the table) |
| TF-IDF | Term Frequency–Inverse Document Frequency | Reflects how important a word is to a document in a corpus (worked example below) |
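
Since BPE comes up constantly in tokenizer docs, here’s a minimal sketch of the training loop as described by Sennrich et al.: repeatedly merge the most frequent adjacent symbol pair until you hit the target number of merges. Variable names are mine, not from any particular library.

```python
from collections import Counter

def bpe_merges(words: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules from a {word: frequency} map (toy sketch)."""
    # Start with each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

print(bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4))
# [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```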
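And since TF-IDF is the baseline everything else gets compared to, a small worked example using one common weighting (raw term frequency times log inverse document frequency; exact formulas vary by library, so treat this as illustrative):

```python
import math
from collections import Counter

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF with raw term frequency and log IDF (one common variant)."""
    tf = Counter(doc)[term] / len(doc)        # how often the term appears in this document
    df = sum(term in d for d in corpus)       # how many documents contain the term
    return tf * math.log(len(corpus) / df)    # terms rare across the corpus get boosted

docs = [["the", "cat", "sat"],
        ["the", "dog", "ran"],
        ["the", "cat", "and", "the", "dog"]]
print(round(tf_idf("cat", docs[0], docs), 3))  # 0.135 -- "cat" is in 2 of 3 docs
print(round(tf_idf("the", docs[0], docs), 3))  # 0.0   -- "the" is in every doc, so its weight vanishes
```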