The Natural Language Processing Lab

Ellipsis and Elided Elements in Natural Language: The Hoosier Ellipsis Corpus

Created: Damir Cavar, 2023-06-07

Last change: Damir Cavar, 2023-06-08

Ellipsis and related phenomena, in which words in sentences and utterances are elided or omitted, are extremely interesting from the perspective of theoretical linguistics and of the cognitive language faculty. As a general starting point, we recommend The Oxford Handbook of Ellipsis and the numerous research articles, books, and dissertations discussed in its sections. Further highly relevant articles are mentioned in the publications below and on the websites of the various ellipsis corpus projects listed below.

There are various reasons why we are working on ellipsis and other word-omitting phenomena. Some of those are:

These are strong claims, and we will provide research reports here in the near future with quantified data to support them. Our experience has shown that the limited use of phrase structure and dependency parsers is significantly related to their failure to process Dark Matter in Language. While semantic and pragmatic approaches could certainly be tried to reconstruct omitted linguistic content, we focus on syntactic and pattern-based methods that use neural and symbolic algorithms, modeling the fast and slow processing of the human language faculty when it comes to elided linguistic content.

Our goals are ambitious:

And, no, ChatGPT is not the solution here…

Online Resources

We identified the following resources online:

If you have more links or if you want to share your data sets, please send us a note, damir at

Hoosier Ellipsis Corpus