Tools for Information Retrieval Experiments
A collection of tools I built and modified for use in IR experiments.
txt | trecbox | Lucene | Terrier
ap.tgz is a sample data set to test the tools with; its stats and results are shown below.
STATS
2250 Documents from the Associated Press (on TREC DISK 3).
20 Queries from TREC-4 (selected from Query IDs 201-250).
167 Relevance judgments.
RESULTS
Lucene
---------------------------------
RUN NAME                 MAP
---------------------------------
DEMO.a.s.bm25.20.D.x     0.4814
DEMO.a.s.bm25L.20.D.x    0.4335
DEMO.a.s.bm25e.20.D.x    0.4766
DEMO.a.s.tmpl.20.D.x     0.2402
DEMO.a.s.tmple.20.D.x    0.2402
---------------------------------
Terrier
---------------------------------
RUN NAME                 MAP
---------------------------------
DEMO.a.s.bm25.20.D.x     0.4728
DEMO.a.s.tf_idf.20.D.x   0.4732
DEMO.a.s.tmpl.20.D.x     0.2141
---------------------------------
TERMINOLOGY
TREC - Stands for Text REtrieval Conference, an annual meeting of the IR community to study and evaluate retrieval methodologies. TREC creates and curates its own data sets. A sample is provided below.
RUN NAME - The name of the search result file produced as output by the search tools.
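MAP - Mean Average Precision, a metric used to measure the quality of the search results. It is the mean, over all queries, of each query's average precision (the average of the precision values at the ranks where relevant documents appear).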
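To make the MAP definition concrete, here is a minimal Python sketch of how the metric can be computed from a ranked result list and a set of relevance judgments. It is an illustration only, not the code used to produce the numbers above. The function names, the in-memory dictionaries, and the document IDs (d1, d2, ...) are all hypothetical. The sketch assumes that queries without any relevant documents are skipped when averaging.

```python
from typing import Dict, List, Set


def average_precision(ranked_docs: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Precision is sampled at every rank where a relevant document appears,
    and the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision values.

    Assumption: only queries with at least one relevant document count.
    """
    query_ids = [q for q in qrels if qrels[q]]
    if not query_ids:
        return 0.0
    total = sum(average_precision(run.get(q, []), qrels[q]) for q in query_ids)
    return total / len(query_ids)


# Hypothetical toy data: two queries, made-up document IDs.
run = {
    "201": ["d1", "d2", "d3", "d4"],
    "202": ["d5", "d6", "d7"],
}
qrels = {
    "201": {"d1", "d3"},
    "202": {"d6"},
}

# Query 201: (1/1 + 2/3) / 2 = 0.8333; query 202: (1/2) / 1 = 0.5
print(round(mean_average_precision(run, qrels), 4))  # 0.6667
```

A relevant document that the system never retrieves still counts in the denominator, so missing it lowers the score.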