Syllabus

Each entry lists the date, the topic, the recommended readings, and any deadlines.

Week 1 — Mon Aug 26
Topic: Lecture — Introduction, Language modeling [slides]

Week 1 — Wed Aug 28
Topic: Lecture — Transformers Recap [slides]
Readings:
- Transformer_Vaswani++2017
- Annotated Transformer
- Understanding LSTMs (optional)
- BERT_Devlin++2018
Deadlines: Paper list out, Friday Aug 30th.

Week 2 — Mon Sep 2
Labor Day — No Class

Week 2 — Wed Sep 4
Topic: Lecture - Transformers Recap 2 [slides]
Readings:
- RoBERTa, ALBERT, ELECTRA
- Scaling Laws for Neural Language Models
Deadlines: Paper Selection Due

Week 3 — Mon Sep 9
Topic: Lecture - GPT3++ [slides]
Readings:
- BPE, Language Model Tokenizers Introduce Unfairness Between Languages
- GPT-2: Language Models are Unsupervised Multitask Learners
- GPT-3: Language Models are Few-Shot Learners
- Scaling Laws for Neural Language Models
Deadlines: Project Guidelines out Sep 10

Week 3 — Wed Sep 11
Topic: Lecture - Prompting, CoT [slides]
Readings:
- Demystifying Prompts in Language Models via Perplexity Estimation
- Calibrate Before Use: Improving Few-Shot Performance of Language Models
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Large Language Models are Zero-Shot Reasoners

Week 4 — Mon Sep 16
Topic: Lecture - Scaling, Instruction Tuning [slides]
Readings:
- Scaling Laws for Neural Language Models
- Training Compute-Optimal Large Language Models
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Scaling Instruction-Finetuned Language Models
- Alpaca, Self-Instruct: Aligning Language Models with Self-Generated Instructions

Week 4 — Wed Sep 18
Topic: Lecture - Instruct Tuning [slides]
Readings:
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
- Transformer Math
- The Power of Scale for Parameter-Efficient Prompt Tuning
- LoRA: Low-Rank Adaptation of Large Language Models

Week 5 — Mon Sep 23
Topic: Student Presentations — The False Promise of Imitating Proprietary LLMs [slides]
Readings:
- Alpaca
- Self-Instruct: Aligning Language Models with Self-Generated Instructions
Deadlines: Project Proposal Due

Week 5 — Wed Sep 25
Topic: Student Presentations — LIMA [slides]
Readings:
- Dataset: https://huggingface.co/datasets/GAIR/lima
- Constitutional AI: Harmlessness from AI Feedback
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Week 6 — Mon Sep 30
Topic: Student Presentations — ChatBotArena
Readings:
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
- Human Feedback is not Gold Standard

Week 6 — Wed Oct 2
Topic: Length-controlled Alpaca Eval; Review 1 discussion — MixEval
Readings: Eval leaderboard: https://tatsu-lab.github.io/alpaca_eval/
Deadlines: Review 1 Due: Oct 1st, 11:59 p.m.

Week 7 — Mon Oct 7
Tanya Travelling — No Lecture

Week 7 — Wed Oct 9
Topic: Student Presentations — Autobencher
Readings:
- Benchbench, tinyBench
- Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?

Week 8 — Mon Oct 14
Indigenous Peoples' Day — No Class

Week 8 — Wed Oct 16
Topic: Student Presentations — Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Readings:
- How pre-trained models capture factual knowledge?
- How do language models acquire factual knowledge during pretraining?

Week 9 — Mon Oct 21
Topic: Student Presentations — Large Language Models Struggle to Learn Long-Tail Knowledge

Week 9 — Wed Oct 23
Topic: Student Presentations — FactScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Readings:
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
- WICE: Real-World Entailment for Claims in Wikipedia
- DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models
- Context versus Prior Knowledge in Language Models

Week 10 — Mon Oct 28
Topic: Lecture — Alignment [slides]
Readings:
- Proximal Policy Optimization Algorithms
- Learning to summarize from human feedback
- The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
- A General Theoretical Paradigm to Understand Learning from Human Preferences

Week 10 — Wed Oct 30
Topic: Student Presentation — A Long Way to Go: Investigating Length Correlations in RLHF
Readings:
- Length Desensitization in Direct Preference Optimization
- Disentangling Length from Quality in Direct Preference Optimization
- Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Week 11 — Mon Nov 4
Topic: Student Presentations — Scaling Laws for Reward Model Overoptimization; Review 2 discussion — Iterative Preference Optimization with the Pairwise Cringe Loss
Deadlines: Review 2 Due: Nov 3, 11:59 p.m.

Week 11 — Wed Nov 6
Topic: Student Presentations — SimPO: Simple Preference Optimization with a Reference-Free Reward

Week 12 — Mon Nov 11
Topic: Student Presentations — LoRA: Low-Rank Adaptation of Large Language Models

Week 12 — Wed Nov 13
Topic: Student Presentations — StreamingLLM
Deadlines: Check-in due: Nov 13, 11:59 p.m.

Week 13 — Mon Nov 18
Topic: Student Presentations — Speculative Decoding

Week 13 — Wed Nov 20
Topic: Student Presentations — Medusa Decoding

Week 14 — Mon Nov 25
Topic: Student Presentations — Generalization through Memorization: Nearest Neighbor Language Models
Readings: kNN-LM Does Not Improve Open-ended Text Generation

Week 14 — Wed Nov 27
Thanksgiving Break — No Class

Week 15 — Mon Dec 2
Topic: Lecture

Week 15 — Wed Dec 4
Topic: Project Presentations
Deadlines: Project Presentation

Week 16 — Mon Dec 9
Topic: Project Presentations
Deadlines: Project Report (Due Dec 16)