Syllabus

Date	Topic	Recommended Readings	Deadlines
Week 1 — Mon Aug 25	Introduction
Week 1 — Wed Aug 27	Lecture - Transformers
Week 2 — Mon Sep 1	Labor Day - No Class
Week 2 — Wed Sep 3	Lecture - GPT3++		Sign-up sheet released
Week 3 — Mon Sep 8	Lecture - GPT3++		Submit presentation sign-up by 11.59pm
Week 3 — Wed Sep 10	Lecture - Post training 2
Week 4 — Mon Sep 15	Lecture - Post training 2
Week 4 — Wed Sep 17	Student Presentations: Evaluation: News Summarization and Evaluation in the Era of GPT-3
Week 5 — Mon Sep 22	Student Presentations: Evaluation: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena	Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators; LLM Evaluators Recognize and Favor Their Own Generations; MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Week 5 — Wed Sep 24	Student Presentations: Evaluation: This blog + “My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Week 6 — Mon Sep 29	Student Presentations: Long Context Method: Longformer: The Long-Document Transformer		Project Proposal Due: Sep 29, 11.59pm
Week 6 — Wed Oct 1	Student Presentations: Long Context Evaluation: RULER: What's the Real Context Size of Your Long-Context Language Models?	HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly; L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Week 7 — Mon Oct 6	Tanya Traveling - No Class
Week 7 — Wed Oct 8	Tanya Traveling - No Class
Week 8 — Mon Oct 13	Indigenous Peoples' Day - No Class
Week 8 — Wed Oct 15	Student Presentations: Long Context Method: Efficient Streaming Language Models with Attention Sinks	H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models; MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Week 9 — Mon Oct 20	Student Presentations: Long Context Method: SnapKV: LLM Knows What You are Looking for Before Generation	RefreshKV: Updating Small KV Cache During Long-form Generation
Week 9 — Wed Oct 22	Student Presentations: Data: Data Engineering for Scaling Language Models to 128K Context	QuRating: Selecting High-Quality Data for Training Language Models; How to Train Long-Context Language Models (Effectively); Datology blog; Synthetic bootstrapped pretraining
Week 10 — Mon Oct 27	Student Presentations: Long Context Evaluation: One Thousand and One Pairs: A "novel" challenge for long-context language models	FABLES: Evaluating faithfulness and content selection in book-length summarization; HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly; Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Week 10 — Wed Oct 29	Student Presentations: Reasoning Method: STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning	Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking; Chain-of-Thought Reasoning Without Prompting; Large Language Models Can Self-Improve	Area review (Long context) Due: Oct 31, 11.59pm
Week 11 — Mon Nov 3	Student Presentations: Reasoning Method: Let's Verify Step by Step
Week 11 — Wed Nov 5	Student Presentations: Reasoning Method: Iterative Reasoning Preference Optimization
Week 12 — Mon Nov 10	Student Presentations: Reasoning: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning	DeepSeek-V3 Technical Report; s1: Simple test-time scaling; Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Week 12 — Wed Nov 12	Student Presentations: Reasoning Analysis: Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Week 13 — Mon Nov 17	Student Presentations: Factuality Evaluation: FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation	Benchmarks: FELM: Benchmarking Factuality Evaluation of Large Language Models; Measuring short-form factuality in large language models	Area review Due: Nov 17, 11.59pm
Week 13 — Wed Nov 19	Student Presentations: Factuality Analysis: Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?	Methods: Alignment for Honesty; Pay-Per-Search Models are Abstention Models	Project Check-in Due: Nov 21, 11.59pm
Week 14 — Mon Nov 24	Student Presentations: Factuality Analysis: HALoGEN: Fantastic LLM Hallucinations and Where to Find Them	Analysis: Physics of Language Models: Part 3.1, Knowledge Storage and Extraction	Area review Due: Nov 25, 11.59pm (submissions accepted without penalty through the Thanksgiving holidays)
Week 14 — Wed Nov 26	Thanksgiving Break - No Class
Week 15 — Mon Dec 1	Course Retrospective - Tanya
Week 15 — Wed Dec 3	Project Presentations
Week 16 — Mon Dec 8	Project Presentations