Assisting Authors to Convert Raw Products into Polished Prose

Year: 2020
Volume: 21
Issue: 1
Pages: 103-140
Authors: Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki, Kentaro Inui
Abstract
Being a notoriously complex problem, writing is generally decomposed into a series of subtasks: idea generation, expression, revision, etc. Given some goal, the author generates a set of ideas (brainstorming), which the author then integrates into some skeleton (outline, text plan, etc.). This yields a first draft, which is then submitted for revision, possibly leading to changes at various levels (content, structure, form). Having produced a draft, authors usually revise, edit, and proofread their documents. Here we confine ourselves to academic writing, focusing on sentence production. While there has been considerable work on this topic, most writing assistance has dealt mainly with editing and proofreading, the goal being the correction of surface-level problems such as typography, spelling, or grammatical errors. We broaden the scope by also including cases where the entire sentence needs to be rewritten in order to properly express all of the planned information. Hence, Sentence-level Revision (SentRev) becomes part of our writing assistance task. Systems that perform well on this task could be of considerable help to inexperienced authors, producing fluent, well-formed sentences based on the user's drafts. To evaluate our SentRev model, we built a new, freely available, crowdsourced evaluation dataset consisting of incomplete sentences produced by non-native writers, each paired with its final-version sentence extracted from a published academic paper. We also used this dataset to establish baseline performance on SentRev.

Keywords: natural language processing, academic writing assistance, dataset creation, deep learning