[Xapian-discuss] Random ordering from Python

Olly Betts olly at survex.com
Thu Jan 22 00:50:00 GMT 2009

On Wed, Jan 21, 2009 at 11:34:54PM +0100, amix wrote:
> I have tried to implement my own random weight, but that did not work
> out. I would also like this random sorting to perform well and work on
> big result sets.

Implementing a random weighting scheme in Python should be possible,
though the overhead of the callbacks might be an issue if you're working
with a lot of data (I've never profiled, but it's a potential issue as
there's at least one per query term per matching document).

If you're happy using SVN trunk, then BoolWeight plus a PostingSource
which returns a random weight boost between 0 and some fixed value
should do the job.  That's one callback per matching document, which
is better for long queries.
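
To illustrate the effect (this is plain Python, not the Xapian API, and
the name random_top_n and its parameters are made up for illustration):
if every matching document gets the same boolean weight plus a random
boost between 0 and some fixed value, then the top N by weight are just
a uniform random sample of the matches.  A rough sketch:

```python
import heapq
import random


def random_top_n(matching_docids, n, max_boost=1.0, rng=random):
    """Model of "BoolWeight + random boost" ranking (hypothetical helper).

    Each matching document gets one random weight in [0, max_boost],
    i.e. one "callback" per matching document, and the n documents with
    the highest weights are returned, best first.
    """
    # One random draw per matching document, independent of query length.
    scored = ((rng.uniform(0.0, max_boost), docid) for docid in matching_docids)
    # nlargest keeps only n candidates in memory while scanning the matches.
    return [docid for _weight, docid in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    # Pretend documents 1..1000 matched the query; pick 10 at random.
    print(random_top_n(range(1, 1001), 10))
```

Note that the cost is still linear in the number of matching documents,
since every match needs a weight before the top N can be known.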

> Is this possible (I would really like to see some example code if it's
> possible :-)). I am using Xapian from Python (which probably makes
> things harder).

There's no existing example, and I don't have the time to write one
at present (sorry).

> I could do random selects easily if counts were exact counts and not
> estimates - so returning exact counts would also solve my problem. I
> need performance though, so setting check_at_least to 1 million is
> not a solution (unless it performs really well).

It's probably worth investigating.  A high check_at_least prevents various
early-termination optimisations, but then it seems to me that so will
anything which is picking random matches.  This would also avoid calling
back to Python code.
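
A rough, unprofiled sketch of that approach, assuming that passing the
database's document count as check_at_least makes
get_matches_estimated() exact.  The helper names are made up for
illustration, and the enquire argument is anything with
get_mset(first, maxitems, checkatleast), such as a xapian.Enquire with
the query already set:

```python
import random


def pick_random_offsets(exact_count, k, rng=random):
    """Choose up to k distinct match offsets in [0, exact_count)."""
    k = min(k, exact_count)
    return sorted(rng.sample(range(exact_count), k))


def random_matches(enquire, doccount, k, rng=random):
    """Return the docids of up to k randomly chosen matches.

    doccount is assumed to be the number of documents in the database
    (e.g. from Database.get_doccount()), used as check_at_least so the
    match count is exact rather than an estimate.
    """
    # Ask for zero items: we only want the (now exact) match count.
    probe = enquire.get_mset(0, 0, doccount)
    total = probe.get_matches_estimated()

    docids = []
    for offset in pick_random_offsets(total, k, rng):
        # Fetch the single match sitting at this offset.
        page = enquire.get_mset(offset, 1, doccount)
        docids.extend(item.docid for item in page)
    return docids
```

This reruns the match once per selected document, so it is only
sensible for small k; whether it is fast enough on a big database is
something you'd need to measure.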


