CSniper (Corpus Sniper) is a tool that implements

  1. a web-based multi-user scenario for identifying and annotating linguistic phenomena (e.g. non-canonical grammatical constructions) in large corpora, based on linguistic search queries and
  2. evaluation of annotation quality by measuring inter-rater agreement.

This annotation-by-query approach efficiently harnesses expert knowledge to identify instances of linguistic phenomena that are hard to find with existing, purely automatic annotation tools. In addition, CSniper uses a built-in machine learning mechanism (an SVM with a tree kernel) to rank search results, which facilitates the annotation process.
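To illustrate what measuring inter-rater agreement can look like, here is a minimal sketch that computes Fleiss' kappa over per-sentence labels from several annotators. This is not CSniper's code: the choice of Fleiss' kappa, the function name `fleiss_kappa`, and the labels `correct` / `wrong` are assumptions made for this example only.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of lists; ratings[i] holds the labels that the raters
    assigned to item i (hypothetical input format, not CSniper's data model).
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall label proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Only one label was ever used; kappa is undefined in that case.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Three annotators judge whether four search hits match the phenomenon.
example = [
    ["correct", "correct", "correct"],
    ["correct", "correct", "wrong"],
    ["wrong", "wrong", "wrong"],
    ["wrong", "wrong", "correct"],
]
kappa = fleiss_kappa(example)
print(round(kappa, 3))
```

A value close to 1 indicates near-perfect agreement, while values around 0 indicate agreement no better than chance.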

[Figure: Sentence-based annotation in CSniper]

How to cite

If you use CSniper in scientific work, please cite:

Eckart de Castilho, R., Bartsch, S., and Gurevych, I. (2012). CSniper - annotation-by-query for non-canonical constructions in large corpora. In Proceedings of the ACL 2012 System Demonstrations, pages 85–90, Jeju Island, Korea. Association for Computational Linguistics.

If you are referring to the automatic annotation capabilities, please cite CSniper as:

Do Dinh, E., Eckart de Castilho, R., and Gurevych, I. (2015). In-tool Learning for Selective Manual Annotation in Large Corpora. In Proceedings of the ACL 2015 System Demonstrations, to be published, Beijing, China. Association for Computational Linguistics.


CSniper is licensed under the Apache License, Version 2.0 (ASL).

About CSniper

This project is being developed by the Ubiquitous Knowledge Processing Lab (UKP) at the Technische Universität Darmstadt, Germany, under the auspices of Prof. Iryna Gurevych.