Weighting Schemes: Evaluation results

James Aylett james-xapian at tartarus.org
Sun Jul 24 15:11:32 BST 2016


On Sun, Jul 24, 2016 at 04:47:15PM +0530, Vivek Pal wrote:

> I have evaluated new weighting schemes along with their existing
> counterparts in Xapian to compare and see which one does a better job.
> Also, I have put together all the results files for easy access here:
> https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run

We probably don't want the evaluation runs committed in git (because
we can recreate them); a gist might be more appropriate.

I can't tell, but are some of those files from FIRE? If so, they
shouldn't be committed either; access to FIRE data is governed by our
usage agreement, so it shouldn't just be public anywhere on the
internet. (Unless it's files that FIRE themselves make completely
public, but even then it's better to link to them.)

> and a README for getting started with the xapian-evaluation module. Hopefully,
> it might be of help to those who are new to evaluating weighting schemes in
> Xapian :)

In your instructions:

$ mv xapian-evaluation /path/to/xapian && cd xapian && nano bootstrap

Is there time in your schedule to get evaluation into the main xapian
repo? That would avoid the first part of this. I don't think we're
looking at lots more work to get this done, are we?

You don't need to edit bootstrap; instead you can pass a list of
modules for it to bootstrap on the command line:

$ ./bootstrap xapian-core xapian-evaluation

> Comparing the MAP scores to assess retrieval effectiveness, some interesting
> results have emerged as follows:

Can you remind me what sort of corpus you're using from FIRE for this?
I want to get an idea of what kinds of use cases it might match: when
we're recommending weighting schemes to users, ideally we'd be able to
relate the recommendation to the kind of corpus they have.
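(In case anyone reading along wants the definition: MAP here is mean
average precision. For each query you take the precision at the rank
of each relevant document retrieved, average those values to get that
query's average precision, and then take the mean of that across all
the queries in the run.)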

> 1. BM25+ : 0.100415 and BM25: 0.101771
> 
> BM25 does a slightly better job here. My guess is that BM25+ is falling
> short because maybe we lack very long documents in the
> collection.

Do you have any idea what 'very long' means in this case, in terms of
number of terms (or maybe as a multiple of the mean document length)?
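For anyone else following the thread: the only difference between the
two schemes is the lower-bound term, where BM25+ adds a small constant
delta to every matching term's contribution so that very long documents
can't have a term's weight normalised away to almost nothing. Here's a
minimal sketch of how I'd expect a side-by-side run to be set up; the
BM25PlusWeight constructor (BM25Weight's parameters plus a trailing
delta) is an assumption about your branch, and the database path and
query term are placeholders:

#include <xapian.h>

int main() {
    Xapian::Database db("/path/to/collection");
    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query("example"));

    // Stock BM25: (k1, k2, k3, b, min_normlen).
    enquire.set_weighting_scheme(
        Xapian::BM25Weight(1.2, 0, 1, 0.75, 0.5));
    Xapian::MSet bm25 = enquire.get_mset(0, 10);

    // BM25+: the same parameters plus the delta lower bound
    // (constructor signature assumed; delta = 1.0 as in the paper).
    enquire.set_weighting_scheme(
        Xapian::BM25PlusWeight(1.2, 0, 1, 0.75, 0.5, 1.0));
    Xapian::MSet bm25plus = enquire.get_mset(0, 10);

    return 0;
}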

> 2. PL2+:  0.0781953 and PL2: 0.0752646
> 
> Here, PL2+ does indeed do a better job at retrieving relevant
> documents, although by a small margin.

Great!

> 3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168
> 
> These results are for LMWeight with Dirplus and Dir smoothing,
> respectively. Interestingly, the results are identical.
> Ideally, LMWeight_Dirplus should perform better; I have similar
> thoughts about this as for the BM25+ and BM25 results.

That sounds more like the smoothing option simply having limited
impact in this run. Is this pure Dirichlet, or two-stage smoothing
using Dir+ versus Dir? What smoothing parameters were you using?
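To make the question concrete, here's roughly what I mean in terms of
the LMWeight constructor (param_log, smoothing type, then the two
smoothing parameters), reusing the enquire object from the BM25 sketch
above. The DIRICHLET_PLUS_SMOOTHING value and the idea that its delta
rides in the second smoothing parameter are assumptions about your
branch, so correct me if it's wired differently:

// Pure Dirichlet: param_smoothing1 is the prior mu
// (2000 is a common default in the literature).
enquire.set_weighting_scheme(
    Xapian::LMWeight(0.0, Xapian::Weight::DIRICHLET_SMOOTHING,
                     2000.0, 0.0));

// Dir+ as I understand your branch: the same prior plus a
// lower-bound delta, assumed to be param_smoothing2 here.
enquire.set_weighting_scheme(
    Xapian::LMWeight(0.0, Xapian::Weight::DIRICHLET_PLUS_SMOOTHING,
                     2000.0, 0.05));

// Two-stage smoothing instead mixes Jelinek-Mercer and Dirichlet:
// param_smoothing1 is lambda, param_smoothing2 is mu.
enquire.set_weighting_scheme(
    Xapian::LMWeight(0.0, Xapian::Weight::TWO_STAGE_SMOOTHING,
                     0.7, 2000.0));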

> The last addition to the weighting schemes (Piv+ normalization) is a
> work in progress. I've been sick these past few days, so things have
> moved slowly. I'll complete its implementation in the coming week,
> along with its evaluation.

Sorry you've been sick; make sure you're fully recovered before diving
back in full throttle!

Thanks for the (detailed!) update :)

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org


