Lucene Mod

A mod of Apache Lucene (5.4.0) for processing TREC data. A handful of classes were extended to implement TF-IDF models and to parse TREC test collections. See the notes on using Lucene for IR experiments.

Download the code: lucene-mod-v0.1 [1]
GitHub repository: Lucene
Notes on Lucene: Markdown document

[1] Hank Feild helped move it forward to v0.2, but I have not kept up. All documentation on this page adheres to v0.1.

Lucene-5.4.0 References

  1. org.apache.lucene.analysis.en: list of stemmers.
  2. org.apache.lucene.search.similarities: list of retrieval models.
  3. Lucene's scoring.
  4. NumericDocValues: the object that stores a per-document normalization factor.
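To make the last two references more concrete, here is a minimal sketch of a TF-IDF score that includes a per-document normalization factor. It is plain Java with no Lucene dependency, and it is neither the mod's code nor Lucene's exact implementation. The class name TfIdfSketch and its methods are hypothetical, and the formula shapes (square-root term frequency, logarithmic IDF, inverse square-root length norm) are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of Lucene or of this mod.
public class TfIdfSketch {
    // Term-frequency component: dampens raw counts with a square root.
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms receive larger weights.
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Per-document normalization factor: penalizes longer documents.
    public static double lengthNorm(int docLength) {
        return 1.0 / Math.sqrt(docLength);
    }

    // Score of a single query term against one tokenized document.
    public static double score(String term, List<String> doc, int docFreq, int numDocs) {
        int freq = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                freq++;
            }
        }
        if (freq == 0) {
            return 0.0; // term absent: no contribution, and no division by zero
        }
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(doc.size());
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("lucene", "scores", "lucene", "documents");
        // docFreq = 1 and numDocs = 2 are made-up collection statistics.
        System.out.println(score("lucene", doc, 1, 2));
    }
}
```

In this sketch the normalization factor is recomputed on every call. A search engine would typically compute it once at indexing time and store it per document, which is the role the fourth reference describes.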

Acknowledgment