✦ Book

  • Jurafsky, D., & Martin, J. H. (2024). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed., draft). Available online.

✦ Paper List

# Authors (Year) Title
1 Mikolov et al. (2013) Efficient Estimation of Word Representations in Vector Space
2 Pennington et al. (2014) GloVe: Global Vectors for Word Representation
3 Levy et al. (2015) Improving Distributional Similarity with Lessons Learned from Word Embeddings
4 Collobert et al. (2011) Natural Language Processing (Almost) from Scratch
5 Chen & Manning (2014) A Fast and Accurate Dependency Parser using Neural Networks
6 de Marneffe et al. (2021) Universal Dependencies
7 Sak et al. (2014) Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
8 Du et al. (2024) Financial Sentiment Analysis: Techniques and Applications
9 Vaswani et al. (2017) Attention Is All You Need
10 Huang et al. (2018) Music Transformer
11 Devlin et al. (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
12 Smith (2020) Contextual Word Representations: A Contextual Introduction
13 Chung et al. (2022) Scaling Instruction-Finetuned Language Models
14 Wang et al. (2023) How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
15 Taguchi & Sproat (2025) IASC: Interactive Agentic System for ConLangs
16 Brown et al. (2020) Language Models are Few-Shot Learners
17 Hu et al. (2021) LoRA: Low-Rank Adaptation of Large Language Models
18 Hendrycks et al. (2021) Measuring Massive Multitask Language Understanding
19 Liang et al. (2023) Holistic Evaluation of Language Models
20 Yao et al. (2023) ReAct: Synergizing Reasoning and Acting in Language Models
21 Schick et al. (2023) Toolformer: Language Models Can Teach Themselves to Use Tools
22 Wei et al. (2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
23 Wang et al. (2023) Self-Consistency Improves Chain of Thought Reasoning in Language Models
24 Lightman et al. (2023) Let’s Verify Step by Step
25 Snell et al. (2024) Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters