[Xapian-discuss] query time stemming and term weights

Fri Nov 25 01:14:41 GMT 2005

On Thu, Nov 17, 2005 at 06:43:23PM +0100, Jean-Francois Dockes wrote:
> Olly Betts writes:
>  > So my suggestion would be to do some tests and see if retrieval
>  > effectiveness is actually made better/worse or left unchanged by
>  > stemming at search vs index time.  I'd definitely be interested to hear
>  > the results of any such tests.
> 
> Ok, so I understand that the issue is not obvious, I'll keep watching for
> weird behaviour. For a more systematic approach I'd actually be at a loss
> about how to test and what to look for.

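Systematic evaluations usually look at precision (the fraction of the
retrieved documents which are relevant) and recall (the fraction of the
relevant documents which are retrieved).  There's a tension between the
two which can be visualised as a curve on a graph.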
C.J. van Rijsbergen's book "Information Retrieval" gives a good overview
in chapter 7, and you can read it online as HTML or PDF:

http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html

http://www.dcs.gla.ac.uk/Keith/pdf/Chapter7.pdf

It's easy enough to measure the precision and recall, then plot and
interpret the curves.  Counting up the documents in each category
(relevant or not, retrieved or not) to get the precision and recall
figures is likely to make it moderately time-consuming though.
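
As a rough illustration of the counting involved, here is a minimal
Python sketch.  It assumes you already have, for one query, the ranked
list of document ids returned by the search and a hand-made set of ids
judged relevant.  The function names and the sample ids are made up
for the example and aren't part of Xapian.

```python
def precision_recall(ranked, relevant, k):
    """Return (precision, recall) for the top k results of one query.

    ranked   -- document ids in the order the search returned them
    relevant -- set of document ids judged relevant by hand
    """
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranked, relevant):
    """Return one (precision, recall) point per cutoff 1..len(ranked)."""
    return [precision_recall(ranked, relevant, k)
            for k in range(1, len(ranked) + 1)]


# Hypothetical data: ids as returned for one query, and the ids a
# human judged relevant (document 8 is relevant but was never retrieved).
ranked = [3, 7, 1, 9, 4]
relevant = {3, 1, 8}

curve = pr_curve(ranked, relevant)
```

Running this once with the results from index-time stemming and once
with those from search-time stemming, over the same queries and
relevance judgements, would give two sets of points to plot against
each other.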

Cheers,
    Olly