Terrier Mod

A mod of Terrier-4.0 with some additions and modifications for running IR experiments on TREC data. These notes supplement the existing documentation and clarify a few points.

More often than not, search engines like this one are used as black boxes in experiments, and the lack of documentation describing the system internals makes it hard to interpret results or debug experiments. The notes collected here are an attempt to look under the hood and help the experimenter become a more informed user of this tool.

Download the code: terrier-mod-v1.0
GitHub repository: Terrier
Notes on Terrier: Markdown document

Terrier-4.0 References

  1. Settings for indexing TREC CD 1 & 2.
  2. Settings for indexing TREC CD 4 & 5.
  3. Settings recommended for indexing all text within the DOC tag of a TREC document. See the Javadoc comment block preceding the 'TagSet' class definition.
  4. Stop word list.
  5. Stemmer implementations available.
  6. S-Stemmer implementation.
  7. A vaguely described term frequency normalization constant mentioned in the 'Weighting Models and Parameters' section.
  8. org.terrier.matching.models; Terrier-4.0 model list.