| Week 1 — Mon Aug 25 | Introduction | | |
| Week 1 — Wed Aug 27 | Lecture - Transformers | | |
| Week 2 — Mon Sep 1 | Labor Day - No Class | | |
| Week 2 — Wed Sep 3 | Lecture - GPT3++ | | Sign-up sheet released |
| Week 3 — Mon Sep 8 | Lecture - GPT3++ | | Submit sign-up for presentations by 11.59pm |
| Week 3 — Wed Sep 10 | Lecture - Post training 2 | | |
| Week 4 — Mon Sep 15 | Lecture - Post training 2 | | |
| Week 4 — Wed Sep 17 | Student Presentations: Evaluation: News Summarization and Evaluation in the Era of GPT-3 | | |
| Week 5 — Mon Sep 22 | Student Presentations: Evaluation: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators; LLM Evaluators Recognize and Favor Their Own Generations; MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures | |
| Week 5 — Wed Sep 24 | Student Presentations: Evaluation: This blog + “My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | | |
| Week 6 — Mon Sep 29 | Student Presentations: Long Context Method: Longformer: The Long-Document Transformer | | Project Proposal Due: Sep 29, 11.59pm |
| Week 6 — Wed Oct 1 | Student Presentations: Long Context Evaluation: RULER: What's the Real Context Size of Your Long-Context Language Models? | HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly; L-Eval: Instituting Standardized Evaluation for Long Context Language Models | |
| Week 7 — Mon Oct 6 | Tanya Traveling - No Class | | |
| Week 7 — Wed Oct 8 | Tanya Traveling - No Class | | |
| Week 8 — Mon Oct 13 | Indigenous Peoples' Day - No Class | | |
| Week 8 — Wed Oct 15 | Student Presentations: Long Context Method: Efficient Streaming Language Models with Attention Sinks | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models; MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | |
| Week 9 — Mon Oct 20 | Student Presentations: Long Context Method: SnapKV: LLM Knows What You are Looking for Before Generation | RefreshKV: Updating Small KV Cache During Long-form Generation | |
| Week 9 — Wed Oct 22 | Student Presentations: Data: Data Engineering for Scaling Language Models to 128K Context | QuRating: Selecting High-Quality Data for Training Language Models; How to Train Long-Context Language Models (Effectively); Datalogy blog; Synthetic bootstrapped pretraining | |
| Week 10 — Mon Oct 27 | Student Presentations: Long Context Evaluation: One Thousand and One Pairs: A "novel" challenge for long-context language models | FABLES: Evaluating faithfulness and content selection in book-length summarization; HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly; Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | |
| Week 10 — Wed Oct 29 | Student Presentations: Reasoning Method: STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking; Chain-of-Thought Reasoning Without Prompting; Large Language Models Can Self-Improve | Area review (Long context) Due: Oct 31, 11.59pm |
| Week 11 — Mon Nov 3 | Student Presentations: Reasoning Method: Let's Verify Step by Step | | |
| Week 11 — Wed Nov 5 | Student Presentations: Reasoning Method: Iterative Reasoning Preference Optimization | | |
| Week 12 — Mon Nov 10 | Student Presentations: Reasoning: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | DeepSeek-V3 Technical Report; s1: Simple test-time scaling; Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | |
| Week 12 — Wed Nov 12 | Student Presentations: Reasoning Analysis: Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | | |
| Week 13 — Mon Nov 17 | Student Presentations: Factuality Evaluation: FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation | | Area review Due: Nov 17, 11.59pm |
| Week 13 — Wed Nov 19 | Student Presentations: Factuality Analysis: Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | | Project Check-in Due: Nov 21, 11.59pm |
| Week 14 — Mon Nov 24 | Student Presentations: Factuality Analysis: HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | | Area review Due: Nov 25, 11.59pm (late submissions accepted without penalty throughout the Thanksgiving holidays) |
| Week 14 — Wed Nov 26 | Thanksgiving Break - No Class | | |
| Week 15 — Mon Dec 1 | Course Retrospective - Tanya | | |
| Week 15 — Wed Dec 3 | Project Presentations | | |
| Week 16 — Mon Dec 8 | Project Presentations | | |