publications
2025
- Optimizing Hidden Markov Language Models: An Empirical Study of Reparameterization and Initialization Techniques
  Ivan Lee and Taylor Berg-Kirkpatrick
  Findings of NAACL 2025
Hidden Markov models (HMMs) are valuable for their ability to provide exact and tractable inference. However, learning an HMM in an unsupervised manner involves a non-convex optimization problem that is plagued by poor local optima. Recent work on scaling HMMs has shown this challenge only intensifies as the number of hidden states grows. We provide a comprehensive empirical analysis of two approaches to enhance HMM optimization: reparameterization and initialization of HMM transition and emission parameters using neural networks. Through extensive experiments on language modeling, we find that (1) these techniques enable effective training of large-scale HMMs, (2) simple linear reparameterizations of HMM parameters perform as well as more complex neural ones, and (3) the two approaches are complementary, yielding the best results when combined.
@inproceedings{lee-berg-kirkpatrick-2025-optimizing,
  title     = {Optimizing Hidden {M}arkov Language Models: An Empirical Study of Reparameterization and Initialization Techniques},
  author    = {Lee, Ivan and Berg-Kirkpatrick, Taylor},
  booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL Findings)},
  month     = may,
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.findings-naacl.429},
  doi       = {10.18653/v1/2025.findings-naacl.429},
  pages     = {7712--7723},
}
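To make the reparameterization idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: transition and emission logits are derived from shared state embeddings through learned linear maps rather than learned directly. The embedding size and the bilinear form for transitions are illustrative assumptions.

import torch
import torch.nn as nn

class ReparameterizedHMM(nn.Module):
    def __init__(self, num_states: int, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        # Shared state embeddings; a neural network could initialize these,
        # echoing the initialization technique the paper studies.
        self.state_emb = nn.Parameter(torch.randn(num_states, embed_dim))
        self.trans_proj = nn.Linear(embed_dim, embed_dim)  # linear reparameterization
        self.emit_proj = nn.Linear(embed_dim, vocab_size)

    def transition_matrix(self) -> torch.Tensor:
        # logits[i, j] = <W e_i, e_j>; softmax over next states keeps rows stochastic
        logits = self.trans_proj(self.state_emb) @ self.state_emb.T
        return torch.softmax(logits, dim=-1)

    def emission_matrix(self) -> torch.Tensor:
        # One distribution over the vocabulary per hidden state
        return torch.softmax(self.emit_proj(self.state_emb), dim=-1)

Per the abstract, simple linear forms like this matched more complex neural parameterizations in the paper's experiments.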
- Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models
  Ivan Lee and Taylor Berg-Kirkpatrick
  COLM 2025 (Oral Spotlight, top 5.7%)
Recent studies suggest that very small language models (SLMs) can generate surprisingly coherent text when trained on simplified, child-directed corpora such as TinyStories. These findings have been interpreted as evidence that readability—characterized by accessible vocabulary, familiar narrative structure, and simple syntax—plays a key role in enabling such capabilities to emerge. In this paper, we challenge that interpretation. We construct synthetic datasets with matched structure but varied readability, and find that readability alone does not predict coherence or learning efficiency in SLMs. Models trained on complex, adult-level text perform comparably to those trained on simplified language, and even exhibit faster development of coherence during training. Instead, we show that statistical simplicity, as measured by n-gram diversity, is a stronger predictor of learnability. Our findings caution against the growing trend of anthropomorphizing language model training—drawing parallels to human cognitive development without empirical basis—and argue for more precise reasoning about what properties actually support capability emergence in small models.
@inproceedings{lee-berg-kirkpatrick-2025-readability,
  title     = {Readability ≠ Learnability: Rethinking the Role of Simplicity in Training Small Language Models},
  author    = {Lee, Ivan and Berg-Kirkpatrick, Taylor},
  booktitle = {Conference on Language Modeling (COLM)},
  month     = oct,
  year      = {2025},
  publisher = {OpenReview},
}
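As a concrete illustration of the statistical-simplicity measure mentioned above, here is a small Python sketch of n-gram diversity as a distinct-to-total n-gram ratio. The paper's exact metric, tokenization, and n may differ; treat the details as assumptions.

from itertools import islice

def ngram_diversity(tokens: list[str], n: int = 2) -> float:
    # Slide a window of length n over the token sequence
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    # Lower values mean more repetition, i.e., a statistically simpler corpus
    return len(set(grams)) / len(grams)

print(ngram_diversity("the cat sat on the mat and the cat ran".split()))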
- Pragmatic Structured Generation: A Case Study on JSON Schema
  Ivan Lee, Loris D’Antoni, and Taylor Berg-Kirkpatrick
  DL4C @ NeurIPS 2025 (Workshop)
Grammar-constrained decoding—which masks invalid tokens during generation to guarantee outputs stay within a specified formal language—promises to eliminate structural errors in language model outputs. Yet when tested on JSON Schema (the most common application of grammar-constrained decoding), popular implementations achieve only 50% coverage on real-world schemas. Through experiments on 10,000 real-world JSON schemas, we find that treating validation as an external tool—using validation failures as feedback for runtime alignment—outperforms sophisticated constrained decoding methods, achieving 95% coverage with a modest latency increase (typically 1-2 additional seconds per schema). This gap stems from multiple issues: grammar-constrained decoding is theoretically limited to context-free grammars, real-world schemas often require context-sensitive validation, and even within context-free constraints, implementations struggle with token-boundary misalignment and state explosion. While our analysis focuses specifically on JSON Schema—where language models may excel due to extensive training exposure—it raises questions about whether increasingly complex decoding algorithms are the right approach. As language models improve, treating validation as a separate feedback tool in an agentic loop may prove more practical than embedding constraints into the decoding process itself.
@inproceedings{lee-etal-2025-pragmatic,
  title     = {Pragmatic Structured Generation: A Case Study on {JSON} Schema},
  author    = {Lee, Ivan and D'Antoni, Loris and Berg-Kirkpatrick, Taylor},
  booktitle = {Deep Learning For Code in the Agentic Era (DL4C) @ NeurIPS},
  year      = {2025},
  pubstate  = {inpress},
  keywords  = {workshop},
}
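The "validation as an external tool" loop described above can be sketched in a few lines. The `generate` callable is a hypothetical stand-in for any LLM call; the validator is the real `jsonschema` package, and the retry budget is an illustrative assumption.

import json
import jsonschema

def generate_valid_json(generate, schema: dict, max_retries: int = 3) -> dict:
    prompt = f"Produce a JSON object matching this schema:\n{json.dumps(schema)}"
    for _ in range(max_retries):
        candidate = generate(prompt)
        try:
            obj = json.loads(candidate)
            jsonschema.validate(obj, schema)  # raises on schema violations
            return obj
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            # Feed the failure back for runtime alignment instead of
            # masking tokens during decoding
            prompt += f"\nYour last output was invalid ({err}). Please fix it."
    raise RuntimeError("no valid output within retry budget")

Because the validator runs outside the decoder, it handles context-sensitive schema features that context-free grammar masking cannot express.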
2024
- Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
  Ivan Lee, Nan Jiang, and Taylor Berg-Kirkpatrick
  ICLR 2024
What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps toward answering this question. We evaluate thirteen model architectures capable of causal language modeling across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, state-space-model-inspired architectures, and other emerging attention alternatives. We discover that all the considered architectures can perform in-context learning under a wider range of conditions than previously documented. Additionally, we observe stark differences in statistical efficiency and consistency by varying the number of in-context examples and task difficulty. We also measure each architecture’s predisposition towards in-context learning when presented with the option to memorize rather than leverage in-context examples. Finally, and somewhat surprisingly, we find that several attention alternatives are sometimes competitive with, or even better than, transformers as in-context learners. However, no single architecture demonstrates consistency across all tasks, with performance either plateauing or declining when confronted with a significantly larger number of in-context examples than those encountered during gradient-based training.
@inproceedings{lee2024exploring,
  title     = {Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability},
  author    = {Lee, Ivan and Jiang, Nan and Berg-Kirkpatrick, Taylor},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024},
}
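For readers unfamiliar with synthetic in-context learning suites, a prompt is typically a sequence of (input, label) pairs drawn from a freshly sampled task, followed by a query the model must answer from context alone. The NumPy sketch below shows one such task, in-context linear regression; it is illustrative, not the paper's exact task suite.

import numpy as np

def make_icl_example(num_shots: int = 8, dim: int = 4, seed: int = 0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                   # a new task is sampled per prompt
    xs = rng.normal(size=(num_shots + 1, dim))
    ys = xs @ w
    context = list(zip(xs[:-1], ys[:-1]))      # in-context (input, label) pairs
    query, target = xs[-1], ys[-1]             # model must infer w from context
    return context, query, target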
2022
- Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
  Daniel Spokoyny, Ivan Lee, Zhao Jin, and Taylor Berg-Kirkpatrick
  Findings of NAACL 2022
Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful for both training new numerically informed models as well as evaluating numeracy of existing systems. To address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. We use linear probing of traditional pretrained transformer models (RoBERTa) to show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates the progress towards building more robust numerical reasoning systems in the future.
@inproceedings{spokoyny-etal-2022-masked,
  title     = {Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context},
  author    = {Spokoyny, Daniel and Lee, Ivan and Jin, Zhao and Berg-Kirkpatrick, Taylor},
  booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL Findings)},
  month     = jul,
  year      = {2022},
  address   = {Seattle, United States},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2022.findings-naacl.2},
  doi       = {10.18653/v1/2022.findings-naacl.2},
  pages     = {17--29},
}
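A toy sketch of how an MMP training instance could be constructed follows: find a quantity-unit span, mask it, and keep the gold number and unit as joint prediction targets. The unit inventory and regex here are invented for illustration and are far simpler than what the paper handles.

import re

# Hypothetical toy unit list; the paper covers a much richer inventory.
UNIT = r"(?:mm|cm|m|km|g|kg|s|ms|Hz|V|W)"
MEASUREMENT = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNIT})\b")

def make_mmp_instance(text: str):
    match = MEASUREMENT.search(text)
    if match is None:
        return None
    number, unit = float(match.group(1)), match.group(2)
    # Replace the whole measurement span with a single mask token
    masked = text[:match.start()] + "[MASK]" + text[match.end():]
    return {"masked_text": masked, "number": number, "unit": unit}

print(make_mmp_instance("The beam spans 12.5 m across the channel."))
# {'masked_text': 'The beam spans [MASK] across the channel.', 'number': 12.5, 'unit': 'm'}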
- HeLo: Learning-Free Lookahead Decoding for Conversation Infilling
  Ivan Lee and Taylor Berg-Kirkpatrick
  Findings of EMNLP 2022
We propose Heuristic Guided Lookahead Decoding (HeLo), a novel decoding strategy for conversation infilling. Conversation infilling aims to generate a seamless bridge of utterances connecting a given pair of source and target utterances. HeLo requires neither fine-tuning nor auxiliary models; it relies only on the generating model itself. Instead, HeLo leverages a greedy lookahead phase before committing to any token. The HeLo framework is simple and can augment conventional decoding strategies paired with any autoregressive language model. Smooth transitions between utterances are encouraged with an annealing schedule. Our experiments show HeLo outperforms several baselines when evaluated with both automatic and human evaluation metrics, which, we argue, are appropriate for the task.
@inproceedings{lee-berg-kirkpatrick-2022-helo,
  title     = {{H}e{L}o: Learning-Free Lookahead Decoding for Conversation Infilling},
  author    = {Lee, Ivan and Berg-Kirkpatrick, Taylor},
  editor    = {Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)},
  month     = dec,
  year      = {2022},
  address   = {Abu Dhabi, United Arab Emirates},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2022.findings-emnlp.367},
  doi       = {10.18653/v1/2022.findings-emnlp.367},
  pages     = {4996--5008},
}
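A schematic of the greedy lookahead phase, assuming a Hugging-Face-style `model(...).logits` interface; the candidate set size, rollout horizon, and cosine-similarity heuristic toward the target utterance are illustrative assumptions, not the paper's exact heuristic or annealing schedule.

import torch

@torch.no_grad()
def lookahead_step(model, input_ids, target_emb, embed_fn, top_k=5, horizon=4):
    # Score each candidate next token by greedily rolling out a short
    # continuation and measuring how close it steers toward the target.
    logits = model(input_ids).logits[0, -1]
    candidates = torch.topk(logits, top_k).indices
    best_token, best_score = None, float("-inf")
    for tok in candidates:
        seq = torch.cat([input_ids, tok.view(1, 1)], dim=1)
        for _ in range(horizon):                        # greedy rollout
            nxt = model(seq).logits[0, -1].argmax()
            seq = torch.cat([seq, nxt.view(1, 1)], dim=1)
        # Heuristic: cosine similarity between rollout and target embeddings
        score = torch.cosine_similarity(embed_fn(seq), target_emb, dim=-1)
        if score > best_score:
            best_token, best_score = tok, score
    return best_token  # commit only this one token, then repeat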