Ivan Lee

I am a Computer Science PhD student at UC San Diego, advised by Taylor Berg-Kirkpatrick in the BergLab. My research focuses on context compaction in language models, multi-agent research automation, and emergent alignment: alternative post-training methods that elicit cooperative behavior without explicit alignment supervision. Before returning to academia, I worked in advertising and education technology. I earned my MSc from UC San Diego and my BSc from UC Davis.

selected publications

arXiv
The Format Tax

Ivan Lee, Loris D’Antoni, and Taylor Berg-Kirkpatrick

arXiv 2026

Abs Bib PDF Code

Asking a large language model to respond in JSON should be a formatting choice, not a capability tax. Yet we find that structured output requirements—JSON, XML, LaTeX, Markdown—substantially degrade reasoning and writing performance across open-weight models. The research response has focused on constrained decoding, but sampling bias accounts for only a fraction of the degradation. The dominant cost enters at the prompt: format-requesting instructions alone cause most of the accuracy loss, before any decoder constraint is applied. This diagnosis points to a simple principle: decouple reasoning from formatting. Whether by generating freeform first and reformatting in a second pass, or by enabling extended thinking within a single generation, separating the two concerns substantially recovers lost accuracy. Across six open-weight models, four API models, four formats, and tasks spanning math, science, logic, and writing, decoupling recovers most lost accuracy. Notably, most recent closed-weight models show little to no format tax, suggesting the problem is not inherent to structured generation but a gap that current open-weight models have yet to close.
@article{lee2026formattax, title = {The Format Tax}, author = {Lee, Ivan and D'Antoni, Loris and Berg-Kirkpatrick, Taylor}, year = {2026}, month = apr, journal = {arXiv preprint arXiv:2604.03616}, keywords = {preprint}, }
arXiv
Optical Context Compression Is Just (Bad) Autoencoding

Ivan Lee, Cheng Yang, and Taylor Berg-Kirkpatrick

arXiv 2025

Abs Bib PDF Code

DeepSeek-OCR demonstrates that rendered text can be reconstructed with high fidelity from a small number of vision tokens. This finding has sparked excitement about vision-based context compression for language models. But the evaluation stops at reconstruction; whether these representations help language modeling remains untested. We test two assumptions implicit in the optical-compression narrative: that vision-based compression provides unique advantages for text reconstruction from compressed representations, and that DeepSeek-OCR’s reconstruction results are evidence that vision-based compression will be useful for language modeling. Comparing their vision encoder against simple alternatives—parameter-free mean pooling and a learned hierarchical encoder—we find that these simple approaches match or surpass vision for reconstruction at matched compression ratios, and outperform it for language modeling—where vision-based compression fails to beat truncation. The excitement around optical context compression outpaces the evidence.
@article{lee2025optical, title = {Optical Context Compression Is Just (Bad) Autoencoding}, author = {Lee, Ivan and Yang, Cheng and Berg-Kirkpatrick, Taylor}, year = {2025}, month = dec, journal = {arXiv preprint arXiv:2512.03643}, keywords = {preprint}, }
ICLR
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability

Ivan Lee, Nan Jiang, and Taylor Berg-Kirkpatrick

ICLR 2024

Abs Bib PDF Code

What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps toward answering this question. We evaluate thirteen model architectures capable of causal language modeling across a suite of synthetic in-context learning tasks. These selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, state space model inspired, and other emerging attention alternatives. We discover that all the considered architectures can perform in-context learning under a wider range of conditions than previously documented. Additionally, we observe stark differences in statistical efficiency and consistency by varying the number of in-context examples and task difficulty. We also measure each architecture’s predisposition towards in-context learning when presented with the option to memorize rather than leverage in-context examples. Finally, and somewhat surprisingly, we find that several attention alternatives are sometimes competitive with or better in-context learners than transformers. However, no single architecture demonstrates consistency across all tasks, with performance either plateauing or declining when confronted with a significantly larger number of in-context examples than those encountered during gradient-based training.
@inproceedings{lee2024exploring, title = {Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability}, author = {Lee, Ivan and Jiang, Nan and Berg-Kirkpatrick, Taylor}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2024}, }
COLM
Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models

Oral Spotlight (top 5.7%)

Highlighted by Chris Manning: "Best thing I’ve seen at COLM 2025 so far."

Ivan Lee and Taylor Berg-Kirkpatrick

COLM 2025

Abs Bib PDF Slides

Recent studies suggest that very small language models (SLMs) can generate surprisingly coherent text when trained on simplified, child-directed corpora such as TinyStories. These findings have been interpreted as evidence that readability—characterized by accessible vocabulary, familiar narrative structure, and simple syntax—plays a key role in enabling such capabilities to emerge. In this paper, we challenge that interpretation. We construct synthetic datasets with matched structure but varied readability, and find that readability alone does not predict coherence or learning efficiency in SLMs. Models trained on complex, adult-level text perform comparably to those trained on simplified language, and even exhibit faster development of coherence during training. Instead, we show that statistical simplicity, as measured by n-gram diversity, is a stronger predictor of learnability. Our findings caution against the growing trend of anthropomorphizing language model training—drawing parallels to human cognitive development without empirical basis—and argue for more precise reasoning about what properties actually support capability emergence in small models.
@inproceedings{lee-berg-kirkpatrick-2025-readability, title = {Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models}, author = {Lee, Ivan and Berg-Kirkpatrick, Taylor}, booktitle = {Conference on Language Modeling (COLM)}, month = oct, year = {2025}, publisher = {OpenReview}, annotation = {Highlighted by <a href="https://x.com/chrmanning/status/1975657067303096333">Chris Manning</a>: "Best thing I've seen at COLM 2025 so far."} }

honors & awards

Gold Reviewer Award, ICML 2026. Top 25% of reviewers.
Oral Spotlight, COLM 2025. Top 5.7% of accepted papers.

invited participation

Schmidt Sciences Trustworthy AI Convening, New Orleans. March 2026.

teaching

Teaching Assistant, UC San Diego (instructor: Taylor Berg-Kirkpatrick):

DSC 258R: Natural Language Processing. Spring 2026.
CSE 251A / 151A: Machine Learning. Winter 2026.

service

Reviewing: ACL Rolling Review (ARR) 2026; ACL Student Research Workshop 2025; COLM 2026; ICLR 2025, 2026; ICML 2026.