NLP's Clever Hans Moment Has Arrived

Year: 2020
Volume: 21
Issue: 1
Pages: 161-170
Authors: Benjamin Heinzerling

Abstract: Large, pretrained language models have led to a flurry of new state-of-the-art results being reported in many areas of natural language processing. However, recent work has also shown that such models tend to solve language tasks by relying on superficial cues found in benchmark datasets, instead of acquiring the capabilities envisioned by the task designers. In this short opinion piece, I review a report by Niven & Kao (2019) of this so-called Clever Hans effect on an argument reasoning task and discuss possible ways to prevent it.

Keywords: NLP, language models, superficial cues, Clever Hans, datasets