Date | Topic | Readings | Deliverables |
Week 1 — Mon Aug 26 | Lecture — Introduction, Language modeling [slides] | | |
Week 1 — Wed Aug 28 | Lecture — Transformers Recap [slides] | Transformer (Vaswani et al., 2017); The Annotated Transformer; Understanding LSTMs (optional); BERT (Devlin et al., 2018) | Paper list out Friday, Aug 30 |
Week 2 — Mon Sep 2 | Labor Day — No Class | | |
Week 2 — Wed Sep 4 | Lecture — Transformers Recap 2 [slides] | RoBERTa; ALBERT; ELECTRA; Scaling Laws for Neural Language Models | Paper Selection Due |
Week 3 — Mon Sep 9 | Lecture — GPT-3++ [slides] | BPE; Language Model Tokenizers Introduce Unfairness Between Languages; GPT-2: Language Models are Unsupervised Multitask Learners; GPT-3: Language Models are Few-Shot Learners; Scaling Laws for Neural Language Models | Project Guidelines out Sep 10 |
Week 3 — Wed Sep 11 | Lecture — Prompting, CoT [slides] | Demystifying Prompts in Language Models via Perplexity Estimation; Calibrate Before Use: Improving Few-Shot Performance of Language Models; Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?; Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Large Language Models are Zero-Shot Reasoners | |
Week 4 — Mon Sep 16 | Lecture — Scaling, Instruction Tuning [slides] | Scaling Laws for Neural Language Models; Training Compute-Optimal Large Language Models; Multitask Prompted Training Enables Zero-Shot Task Generalization; Scaling Instruction-Finetuned Language Models; Alpaca; Self-Instruct: Aligning Language Models with Self-Generated Instructions | |
Week 4 — Wed Sep 18 | Lecture — Instruction Tuning [slides] | How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources; Transformer Math; The Power of Scale for Parameter-Efficient Prompt Tuning; LoRA: Low-Rank Adaptation of Large Language Models | |
Week 5 — Mon Sep 23 | Student Presentations — The False Promise of Imitating Proprietary LLMs [slides] | Alpaca; Self-Instruct: Aligning Language Models with Self-Generated Instructions | Project Proposal Due |
Week 5 — Wed Sep 25 | Student Presentations — LIMA [slides] | Dataset: https://huggingface.co/datasets/GAIR/lima; Constitutional AI: Harmlessness from AI Feedback; The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning | |
Week 6 — Mon Sep 30 | Student Presentations — Chatbot Arena | Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference; Human Feedback is not Gold Standard | |
Week 6 — Wed Oct 2 | Student Presentations — Length-Controlled AlpacaEval; Review 1 discussion — MixEval | Eval leaderboard: https://tatsu-lab.github.io/alpaca_eval/ | Review 1 Due: Oct 1, 11:59 p.m. |
Week 7 — Mon Oct 7 | Tanya Travelling — No Lecture | | |
Week 7 — Wed Oct 9 | Student Presentations — AutoBencher | BenchBench; tinyBench; Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? | |
Week 8 — Mon Oct 14 | Indigenous Peoples' Day — No Class | | |
Week 8 — Wed Oct 16 | Student Presentations — Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | How Pre-trained Models Capture Factual Knowledge?; How Do Language Models Acquire Factual Knowledge During Pretraining? | |
Week 9 — Mon Oct 21 | Student Presentations — Large Language Models Struggle to Learn Long-Tail Knowledge | | |
Week 9 — Wed Oct 23 | Student Presentations — FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation; FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation | WICE: Real-World Entailment for Claims in Wikipedia; DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models; Context versus Prior Knowledge in Language Models | |
Week 10 — Mon Oct 28 | Lecture — Alignment [slides] | Proximal Policy Optimization Algorithms; Learning to Summarize from Human Feedback; The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization; A General Theoretical Paradigm to Understand Learning from Human Preferences | |
Week 10 — Wed Oct 30 | Student Presentations — A Long Way to Go: Investigating Length Correlations in RLHF | Length Desensitization in Direct Preference Optimization; Disentangling Length from Quality in Direct Preference Optimization; Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking | |
Week 11 — Mon Nov 4 | Student Presentations — Scaling Laws for Reward Model Overoptimization; Review 2 discussion — Iterative Preference Optimization with the Pairwise Cringe Loss | | Review 2 Due: Nov 3, 11:59 p.m. |
Week 11 — Wed Nov 6 | Student Presentations — SimPO: Simple Preference Optimization with a Reference-Free Reward | | |
Week 12 — Mon Nov 11 | Student Presentations — LoRA: Low-Rank Adaptation of Large Language Models | | |
Week 12 — Wed Nov 13 | Student Presentations — StreamingLLM | | Check-in Due: Nov 13, 11:59 p.m. |
Week 13 — Mon Nov 18 | Student Presentations — Speculative Decoding | | |
Week 13 — Wed Nov 20 | Student Presentations — Medusa Decoding | | |
Week 14 — Mon Nov 25 | Student Presentations — Generalization through Memorization: Nearest Neighbor Language Models | kNN-LM Does Not Improve Open-ended Text Generation | |
Week 14 — Wed Nov 27 | Thanksgiving Break — No Class | | |
Week 15 — Mon Dec 2 | Lecture | | |
Week 15 — Wed Dec 4 | Project Presentations | | Project Presentation |
Week 16 — Mon Dec 9 | Project Presentations | | Project Report (Due Dec 16) |