[Xapian-devel] Initial patch for ExternalPostList
Olly Betts
olly at survex.com
Sat Jun 3 17:09:12 BST 2006
On Sat, Jun 03, 2006 at 12:19:39AM -0600, Rusty Conover wrote:
> On Jun 2, 2006, at 10:34 PM, Alexander Lind wrote:
> >Pardon my ignorance, but can you explain a little of what one can
> >do with your addition here? I mean what kind of functionality does
> >it add to Xapian?
>
> The patch allows you to provide a source of xapian doc ids from an
> external source, in my case I use a SQL database, and make it part of
> a query so that the documents returned will be required to be a
> member of that source. It allows you to search a subset of a xapian
> database pretty easily.
And this is an idea that's been around for a while, for example:
http://thread.gmane.org/gmane.comp.search.xapian.general/230/focus=232
I'm sure there are many uses. Here are a few which come to mind to
give you an idea:
* Restricting search results to those the user has permission to see.
This can already be done by creating filter terms in Xapian, but
updating the permissions in Xapian to reflect changes in the
underlying system can be tricky. If there's no hook to tell
you permissions have changed, all you can really do is perform a full
sweep to check periodically, and people generally prefer permission
changes to be reflected right away.
* A usenet server could allow a search to be restricted to only those
articles a user has already read (by filtering based on article
numbers from the .newsrc) - this information is very dynamic and
different for each user, so it's hard to achieve this currently.
* Sometimes sites want to be able to quickly remove pages, including
from the search (for legal reasons perhaps). This class would allow
entries to be instantly made invisible to searches without
complicating the standard update process. At a convenient point, you
can drop the entries from the database and remove them from the
external filter.
* If your documents are added in date order, you can achieve "sort by
date" very cheaply by using BoolWeight and Enquire::set_docid_order.
With a simple external map from date to docid, you could use this
class to implement a similarly cheap "filter by date" too.
* If the external source can set weight information, this new class
will actually provide a full implementation of the MatchBiasFunctor
idea which currently is only present as a proof-of-concept. This
would allow you to add an extra weight to each document - for example
in a news search you might want to give a small weight boost to newer
articles, or you could add a weight contribution based on link
analysis, click-through rates, etc.
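To make the filtering and weighting ideas above concrete, here is a rough
sketch in plain C++. It is not the patch's code and uses no Xapian API:
DocId, Match, filter_by_external_ids and add_external_weight are all
hypothetical names invented for illustration. It assumes both the matches
and the external ids arrive sorted by ascending docid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Xapian types.
using DocId = std::uint32_t;

struct Match {
    DocId docid;
    double weight;
};

// Keep only the matches whose docid also appears in the external list.
// Assumption: both inputs are sorted by ascending docid.
std::vector<Match> filter_by_external_ids(const std::vector<Match>& matches,
                                          const std::vector<DocId>& allowed) {
    assert(std::is_sorted(allowed.begin(), allowed.end()));
    assert(std::is_sorted(matches.begin(), matches.end(),
                          [](const Match& a, const Match& b) {
                              return a.docid < b.docid;
                          }));

    std::vector<Match> kept;
    auto pos = allowed.begin();
    for (const Match& m : matches) {
        // Skip forward in the external list to the first id >= this docid.
        pos = std::lower_bound(pos, allowed.end(), m.docid);
        if (pos == allowed.end()) break;  // external list exhausted
        if (*pos == m.docid) kept.push_back(m);
    }
    return kept;
}

// Add an externally supplied weight (for example a recency or link-analysis
// score) to each match that has one; other matches are left unchanged.
void add_external_weight(std::vector<Match>& matches,
                         const std::unordered_map<DocId, double>& extra) {
    for (Match& m : matches) {
        auto it = extra.find(m.docid);
        if (it != extra.end()) m.weight += it->second;
    }
}
```

In a real matcher the external source would be walked alongside the other
posting lists rather than applied to a finished result set, so treat this
only as an illustration of the intersection and the weight contribution.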
Cheers,
Olly