[Xapian-discuss] Future of Xapian (long)

Sam Liddicott sam@liddicott.com
Thu, 24 Jun 2004 22:51:26 +0100


----- Original Message ----- 
From: "Olly Betts" <olly@survex.com>
To: <xapian-discuss@lists.xapian.org>
Sent: Tuesday, June 22, 2004 5:11 PM
Subject: Re: [Xapian-discuss] Future of Xapian (long)


> But sort-bands split mset entries into a number (say 5) of bands using
> these percentages.  Within each band, documents are sorted by date
> (or whatever).
>
> So I get the most recent document which scored 80-100% down to the least
> recent which scores this.  Then the most recent which scored 60-80%, down
> to the oldest which scores this.  And so on.  So recent documents are
> scattered throughout the hitlist, and high scoring documents are fairly
> randomly moved around.
>
> It's so ass-backwards.  I could see an argument for sorting into day
> (or week/month/year or perhaps even hour depending on application)
> *then* by percentage score within each day.  That would make a lot of
> sense for a news site.  And a scheme for giving additional weight to
> recent documents - often a recent document is inherently more likely to
> be relevant.  And even more often, given two otherwise equally relevant
> documents, you'd prefer the more recent one.
>
> Is anyone actually currently using sort-bands?  Does it actually do what
> you want, or would you prefer some other scheme?
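
The alternative suggested in the quoted text (group by day, then sort by
percentage score within each day) can be sketched roughly as follows. This is
plain Python over hypothetical (doc_id, percent, timestamp) tuples, not a
Xapian API; the tuple layout and the function name are assumptions made for
illustration only.

```python
from datetime import datetime

# Hypothetical hits: (doc_id, percent_score, timestamp). Not Xapian objects.
hits = [
    ("a", 95, datetime(2004, 6, 20, 9, 0)),
    ("b", 62, datetime(2004, 6, 22, 14, 0)),
    ("c", 88, datetime(2004, 6, 22, 8, 0)),
    ("d", 71, datetime(2004, 6, 20, 17, 0)),
]

def sort_by_day_then_score(hits):
    # Newest day first; within one day, highest percentage first.
    # The time of day is deliberately ignored by taking only the date.
    return sorted(hits, key=lambda h: (h[2].date(), h[1]), reverse=True)

day_order = [h[0] for h in sort_by_day_then_score(hits)]
```

Swapping `.date()` for a week, month or hour bucket would give the other
granularities mentioned in the quote.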

While sort bands will always be technically unsatisfactory, they are a
shortcut to solving a user's problem, and a better one than the user could
manage unaided.

When I search news or dated documents I often want "recent relevant stuff".
This can be done with a strict cut-off (say 80%) followed by date sorting,
but what if I don't find what I'm looking for? Rather than my repeating
the search with lower and lower cut-offs, banding does this for me.
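
To make that concrete, here is a minimal sketch of banding as described in
the quoted text: split hits into (say) 5 percentage bands, then sort by date
within each band. It is plain Python over hypothetical (doc_id, percent, date)
tuples, not Xapian's actual implementation; the band boundaries (80% falls in
the top band) are my assumption.

```python
from datetime import date

# Hypothetical hits: (doc_id, percent_score, document_date). Not Xapian objects.
hits = [
    ("a", 95, date(2004, 6, 1)),
    ("b", 82, date(2004, 6, 20)),
    ("c", 79, date(2004, 6, 24)),
    ("d", 61, date(2004, 6, 10)),
    ("e", 45, date(2004, 6, 23)),
]

def sort_bands(hits, bands=5):
    def key(hit):
        _doc_id, percent, when = hit
        # With 5 bands: 80-100 -> band 4, 60-79 -> band 3, and so on.
        # 100% is clamped into the top band rather than getting its own.
        band = min(percent * bands // 100, bands - 1)
        return (band, when)
    # Best band first; within a band, most recent first.
    return sorted(hits, key=key, reverse=True)

banded_order = [h[0] for h in sort_bands(hits)]
```

The result is the same list I would get by running the 80% cut-off search
sorted by date, then the 60-80% one, and so on, and joining the results end
to end.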

Much as it's impossible for the computer to know what I want and somehow
present a 2-dimensional sort (date and relevance) as a linear result set,
banding does a lot better than I can do with a comprehensive result set
sorted by relevance alone.

I don't mind if banding is got rid of, but the user problem remains and it
would be good to be able to present some solution to it.

Perhaps as I go back in time through date-sorted results within the relevance
cut-off, I could have an option to repeat the search with lower relevance,
applying a max and min relevance to form a window.  It may be better for
the button to skip straight to this time period in a lower relevance band,
so that I can then step forwards or backwards in time.  Instead of navigating
a 1-dimensional sorted list, I would then have better control over navigating
a 2-dimensional result set than banding currently gives.
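
A rough sketch of that navigation idea, again in plain Python over
hypothetical (doc_id, percent, date) tuples rather than any Xapian API. The
function names, the 20-point step, and the half-open window
(min_pct <= score < max_pct) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical hits: (doc_id, percent_score, document_date). Not Xapian objects.
hits = [
    ("a", 95, date(2004, 6, 1)),
    ("b", 82, date(2004, 6, 20)),
    ("c", 79, date(2004, 6, 24)),
    ("d", 61, date(2004, 6, 10)),
    ("e", 65, date(2004, 6, 18)),
    ("f", 45, date(2004, 6, 23)),
]

def window(hits, min_pct, max_pct):
    # Hits with min_pct <= score < max_pct, most recent first.
    selected = [h for h in hits if min_pct <= h[1] < max_pct]
    return sorted(selected, key=lambda h: h[2], reverse=True)

def jump_to_lower_band(hits, min_pct, current_date, step=20):
    # Slide the relevance window down by `step` points, then find the first
    # hit that is no newer than the date the user was looking at.
    lower = window(hits, max(min_pct - step, 0), min_pct)
    position = next(
        (i for i, h in enumerate(lower) if h[2] <= current_date),
        len(lower),  # every hit in the lower window is newer
    )
    return lower, position

# The user is in the 80%+ window, looking at results from 20 June.
lower, position = jump_to_lower_band(hits, 80, date(2004, 6, 20))
lower_ids = [h[0] for h in lower]
```

From `position` the user can step backwards (older) or forwards (newer)
within the lower window, which is the second dimension of navigation that a
single banded list does not offer.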

Sam