Welcome
Welcome to our tutorial on Item Response Theory for Natural Language Processing!
This tutorial will introduce the NLP community to Item Response Theory (IRT). IRT is a method from the field of psychometrics for model and dataset assessment. IRT has been used for decades to build test sets for human subjects and estimate latent characteristics of dataset examples. Recently, there has been an uptick in work applying IRT to tasks in NLP. It is our goal to introduce the wider NLP community to IRT and show its benefits for a number of NLP tasks. From this tutorial, we hope to encourage wider adoption of IRT among NLP researchers.
As NLP models improve in performance and grow in complexity, new evaluation methods are needed to appropriately assess performance improvements. Data quality also remains important: models' exploitation of annotation artifacts, annotation errors, and misalignment between model ability and dataset difficulty can all hinder an accurate assessment of model performance. As models reach and exceed human performance on certain tasks, it becomes harder to distinguish genuine improvements and innovations from changes in scores due to chance. In this three-hour introductory tutorial, we will review the current state of evaluation in NLP, then introduce IRT as a tool for NLP researchers to use when evaluating their data and models. We will also introduce and demonstrate the py-irt Python package for fitting IRT models, to encourage adoption and facilitate IRT use.
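As a quick illustration of the kind of model covered in the tutorial (this sketch uses plain Python rather than py-irt), the two-parameter logistic (2PL) IRT model gives the probability that a subject with latent ability θ answers an item with discrimination a and difficulty b correctly:

```python
import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) item response function.

    theta: latent ability of the subject (e.g., an NLP model)
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability level at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The same subject is more likely to get an easy item right than a hard one.
p_easy = irt_2pl(theta=1.0, a=1.5, b=-0.5)  # easy item (b < theta)
p_hard = irt_2pl(theta=1.0, a=1.5, b=2.0)   # hard item (b > theta)
```

Fitting an IRT model means estimating θ for each subject and (a, b) for each item from a matrix of observed responses; py-irt handles this estimation (the function above only evaluates the response curve for given parameters).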
While this methodology has been applied successfully to NLP, broader community exposure, particularly for graduate students, can provide a new methodological perspective. We aim to make the tutorial interactive, with hands-on Jupyter notebooks that give concrete, simple examples.
Stay Connected
We’re building a list of individuals interested in working with or learning more about IRT in NLP.
Please fill out this form if you’d like to be part of the community.
Date and Place
The tutorial took place at EACL 2024 on March 21st, 2024.
Speakers
- John Lalor, University of Notre Dame
- Pedro Rodriguez, Meta AI-FAIR
- Joao Sedoc, New York University
- Jose Hernandez-Orallo, Universitat Politècnica de València and the Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK
Schedule
- Evaluation in NLP
- Introduction to IRT
  - Defining IRT Models
  - IRT Model Fitting
  - Introduction to py-irt
- IRT in NLP
  - Building Test Sets
  - Model Evaluation
  - Chatbot Evaluation
  - Training Dynamics
  - Example Mining
  - Curriculum Learning
  - Model and Data Evaluation
  - Rethinking Leaderboards
  - Features Related to Difficulty
  - Building Test Sets
- Advanced Topics and Opportunities for Future Work
Material
Tutorial Recording
Presentation Materials
- Introduction: slides
- Evaluation in NLP: slides
- Introduction to IRT: slides
- IRT in NLP: slides
- Advanced Topics: slides
- Conclusion and Opportunities for Future Work: slides
- Full tutorial (single pdf file): slides
We have also put together a structured reading list (pdf) of references.
Reference
If you build on top of this tutorial and want to cite it, please use the following bib entry:
@inproceedings{irt4nlp2024,
title = "Item Response Theory for Natural Language Processing",
author = "Lalor, John P. and Rodriguez, Pedro and Sedoc, Joao and Hernandez-Orallo, Jose",
booktitle = "Proceedings of the 18th Conference of the European
Chapter of the Association for Computational
Linguistics: Tutorial Abstracts",
month = mar,
year = "2024",
address = "Malta",
publisher = "Association for Computational Linguistics",
abstract = "This tutorial will introduce the NLP community to
Item Response Theory (IRT). IRT is a method from
the field of psychometrics for model and dataset
assessment. IRT has been used for decades to build
test sets for human subjects and estimate latent
characteristics of dataset examples. Recently, there
has been an uptick in work applying IRT to tasks in
NLP. It is our goal to introduce the wider NLP
community to IRT and show its benefits for a number
of NLP tasks. From this tutorial, we hope to encourage
wider adoption of IRT among NLP researchers."
}