Lucene Mod
A mod of Apache Lucene (5.4.0) for processing TREC data. A handful of classes were extended to implement TF-IDF models and to parse TREC test collections. See the notes on using Lucene for IR experiments.
Download the code: lucene-mod-v0.1 [1]
Github repository: Lucene
Notes on Lucene: Markdown document
[1] Hank Feild helped move the code forward to v0.2, but I have not kept up with those changes. All documentation on this page applies to v0.1.
Lucene-5.4.0 References
- org.apache.lucene.analysis.en: list of stemmers.
- org.apache.lucene.search.similarities: list of retrieval models.
- Lucene's scoring.
- NumericDocValues: the object that stores a per-document normalization factor.
Acknowledgment
- Ian Soboroff, whose trec-demo I forked to create this mod.
- Hank Feild, for contributing many changes, including moving the build system from Ant to Maven, and for other corrections.