AI Papers Reading List • Diego Rodríguez Atencia

🎯 Start Here

Neural Network → LLM Series (3Blue1Brown)

📚 Survey Papers

LLM Survey

Comprehensive overview of large language models

2024

Agent Survey

Survey of LLM-based agents

2023

Prompt Engineering Survey

Comprehensive guide to prompt engineering techniques

2024

🏗️ Foundational Modelling

Attention Is All You Need *

The original transformer architecture paper

2017

Scaling Laws for Neural Language Models

Understanding how model performance scales with size

2020

GPT-3 *

Language models are few-shot learners

2020

LoRA: Low-Rank Adaptation of Large Language Models

Efficient fine-tuning technique using low-rank decomposition

2021

Training Compute-Optimal Large Language Models

Chinchilla paper on optimal model and data scaling

2022

Training Language Models to Follow Instructions with Human Feedback *

InstructGPT: RLHF for instruction following

2022

Direct Preference Optimization

Simpler alternative to RLHF for alignment

2023

Judging LLM-as-a-Judge

Using LLMs to evaluate other LLMs

2023

Mixtral of Experts

Sparse mixture of experts architecture

2024

🧠 Planning & Reasoning

Mastering Chess and Shogi by Self-Play (AlphaZero)

General game-playing through self-play reinforcement learning

2017

Mastering Atari, Go, Chess and Shogi (MuZero) *

Learning without knowing the rules

2019

Chain-of-Thought Prompting *

Eliciting reasoning in large language models

2022

ReAct: Synergizing Reasoning and Acting

Combining reasoning traces with task-specific actions

2022

Tree of Thoughts

Deliberate problem solving with language models

2023

Graph of Thoughts

Advanced reasoning with graph structures

2023

Let's Verify Step by Step

Outcome vs. process supervision for reasoning

2023

Meta Chain-of-Thought

Learning to improve reasoning chains

2025

ARC Prize *

Progress towards general intelligence through reasoning

2024

DeepSeek-R1 *

Incentivizing reasoning capability in LLMs

2025

🚀 Applications

Toolformer

Language models can teach themselves to use tools

2023

GPT-4 Technical Report

Multimodal large-scale language model

2023

The Llama 3 Herd of Models *

Open-weight multilingual language models

2024

Gemini 1.5

Long-context understanding and reasoning

2024

DeepSeek-V3

Cost-efficient mixture of experts model

2024

SWE-Agent

Agent-computer interface for automated software engineering

2024

OpenHands

Open platform for software development agents

2024

📊 Benchmarks

Beyond the Imitation Game (BIG-Bench)

Diverse evaluation tasks for language models

2022

SWE-bench

Evaluating language models on real-world software issues

2023

Chatbot Arena

Crowdsourced benchmarking platform for LLMs

2024

🎥 Videos & Lectures

3Blue1Brown - Visual mathematics and deep learning explanations

Build a Large Language Model From Scratch - Comprehensive book guide

Andrej Karpathy: Neural Networks: Zero to Hero - Hands-on tutorial series

Yannic Kilcher - Paper reviews and ML news

Noam Brown on Planning - Expert insights on AI planning

Stanford CS324: Advances in Foundation Models - Building LLMs course

Foundations of LLMs - Comprehensive tutorial paper

Why You're Not Too Old to Get Into AI - Motivational perspective

🌐 Helpful Websites

History of Deep Learning - Timeline and key developments

Full Stack Deep Learning - Production ML resources

Prompting Guide - Comprehensive prompt engineering guide

a16z AI Canon - Curated list of AI resources

2025 AI Engineer Reading List - Latest papers and trends

State of Generative Models 2024 - Year in review

🎨 Beyond LLMs

Vision Transformer (ViT)

An image is worth 16x16 words - Transformers for image classification

2021

High-Resolution Image Synthesis with Latent Diffusion

Stable Diffusion - Efficient image generation in latent space

2021

🎯 Ilya Sutskever's Top 30 Papers

The First Law of Complexodynamics

Scott Aaronson's insights on complexity theory

2011

The Unreasonable Effectiveness of Recurrent Neural Networks

Andrej Karpathy's exploration of RNN capabilities

2015

Understanding LSTM Networks

Christopher Olah's visual guide to LSTMs

2015

Recurrent Neural Network Regularization

Wojciech Zaremba et al. on improving RNN training

2014

Keeping Neural Networks Simple by Minimizing the Description Length of the Weights

Geoffrey Hinton & Drew van Camp on MDL principle

1993

Pointer Networks

Oriol Vinyals et al. on attention-based architectures

2015

ImageNet Classification with Deep Convolutional Neural Networks *

AlexNet: The breakthrough in deep learning (Alex Krizhevsky et al.)

2012

Order Matters: Sequence to Sequence for Sets

Oriol Vinyals et al. on set-to-sequence problems

2015

GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

Yanping Huang et al. on efficient training of large models

2018

Deep Residual Learning for Image Recognition *

ResNet: Revolutionary skip connections (Kaiming He et al.)

2015

Multi-Scale Context Aggregation by Dilated Convolutions

Fisher Yu & Vladlen Koltun on dilated convolutions

2015

Neural Message Passing for Quantum Chemistry

Justin Gilmer et al. on graph neural networks

2017

Attention Is All You Need *

The foundational transformer architecture

2017

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau et al. introducing attention mechanism

2014

Identity Mappings in Deep Residual Networks

Kaiming He et al. improving ResNet design

2016

A Simple Neural Network Module for Relational Reasoning

Adam Santoro et al. on relation networks

2017

Variational Lossy Autoencoder

Xi Chen et al. on VAEs with powerful decoders

2016

Relational Recurrent Neural Networks

Adam Santoro et al. on memory and relational reasoning

2018

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

Scott Aaronson et al. on entropy and complexity

2014

Neural Turing Machines *

Alex Graves et al. on differentiable memory

2014

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei et al. on speech recognition

2015

Scaling Laws for Neural Language Models *

Jared Kaplan et al. on model scaling

2020

A Tutorial Introduction to the Minimum Description Length Principle

Peter Grünwald on MDL theory

2004

Machine Super Intelligence

Shane Legg's thesis on AGI

2008

Kolmogorov Complexity and Algorithmic Randomness

Shen, Uspensky, Vereshchagin on algorithmic information theory

2017

CS231n: Convolutional Neural Networks for Visual Recognition

Stanford's famous computer vision course

Ongoing

Better & Faster Large Language Models via Multi-token Prediction

Meta research on improving LLM training

2024

Dense Passage Retrieval for Open-Domain Question Answering

Learned dense retriever for open-domain QA, commonly used in RAG pipelines

2020

Precise Zero-Shot Dense Retrieval Without Relevance Labels

HyDE: Hypothetical Document Embeddings

2022

ALCUNA: Large Language Models Meet New Knowledge

Evaluating LLMs on knowledge updates

2023

🌟 Easy Papers for Beginners

Chain-of-Thought Prompting

Great introduction to reasoning in LLMs

2022

SELF-REFINE

Iterative refinement with self-feedback

2023