[Xapian-devel] Initial patch for ExternalPostList

Olly Betts olly at survex.com
Sat Jun 3 17:09:12 BST 2006


On Sat, Jun 03, 2006 at 12:19:39AM -0600, Rusty Conover wrote:
> On Jun 2, 2006, at 10:34 PM, Alexander Lind wrote:
> >Pardon my ignorance, but can you explain a little of what one can
> >do with your addition here?  I mean what kind of functionality does
> >it add to Xapian?
> 
> The patch allows you to provide a source of xapian doc ids from an
> external source, in my case I use a SQL database, and make it part of
> a query so that the documents returned will be required to be a
> member of that source.  It allows you to search a subset of a xapian
> database pretty easily.

And this is an idea that's been around for a while, for example:

http://thread.gmane.org/gmane.comp.search.xapian.general/230/focus=232

I'm sure there are many uses.  Here are a few which come to mind to
give you an idea:

* Restricting search results to those the user has permission to see.
  This can already be done by creating filter terms in Xapian, but
  updating the permissions in Xapian to reflect changes in the
  underlying system can be tricky.  If there's no hook to tell you that
  permissions have changed, all you can really do is periodically
  perform a full sweep to check, and people generally prefer permission
  changes to be reflected right away.

* A usenet server could allow a search to be restricted to only those
  articles a user has already read (by filtering based on article
  numbers from the .newsrc) - this information is very dynamic and
  different for each user, so it's hard to achieve this currently.

* Sometimes sites want to be able to quickly remove pages, including
  from the search (for legal reasons perhaps).  This class would allow
  entries to be instantly made invisible to searches without
  complicating the standard update process.  At a convenient point, you
  can drop the entries from the database and remove them from the
  external filter.

* If your documents are added in date order, you can achieve "sort by
  date" very cheaply by using BoolWeight and Enquire::set_docid_order.
  With a simple external map from date to docid, you could use this
  class to implement a similarly cheap "filter by date" too.

* If the external source can set weight information, this new class
  will actually provide a full implementation of the MatchBiasFunctor
  idea which currently is only present as a proof-of-concept.  This
  would allow you to add an extra weight to each document - for example
  in a news search you might want to give a small weight boost to newer
  articles, or you could add a weight contribution based on link
  analysis, click-through rates, etc.
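
To make the "filter by date" idea above a bit more concrete, here is a
minimal sketch in plain Python.  It deliberately doesn't touch the Xapian
API: the date table, LAST_DOCID, and both function names are made up for
illustration, and it assumes documents really were added in date order,
so that a date range corresponds to one contiguous docid range.  A real
external source would feed that range to the matcher rather than
filtering a finished list of matches as this does.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical external map: (date, first docid added on that date),
# sorted by date.  Assumes docids were assigned in date order.
FIRST_DOCID_BY_DATE = [
    (date(2006, 5, 30), 1),
    (date(2006, 5, 31), 40),
    (date(2006, 6, 2), 95),
    (date(2006, 6, 3), 120),
]
LAST_DOCID = 150  # highest docid in the database (assumed)


def docid_range(start, end):
    """Return (lo, hi) docids for dates start..end inclusive, or None."""
    dates = [d for d, _ in FIRST_DOCID_BY_DATE]
    i = bisect_left(dates, start)   # first indexed day >= start
    j = bisect_right(dates, end)    # first indexed day > end
    if i >= j:
        return None                 # no documents were added in the range
    lo = FIRST_DOCID_BY_DATE[i][1]
    # The range ends just before the first docid of the next indexed day,
    # or at the last docid if the range runs to the end of the table.
    hi = FIRST_DOCID_BY_DATE[j][1] - 1 if j < len(dates) else LAST_DOCID
    return (lo, hi)


def filter_by_date(matches, start, end):
    """Keep only the docids in matches which fall in the date range."""
    rng = docid_range(start, end)
    if rng is None:
        return []
    lo, hi = rng
    return [docid for docid in matches if lo <= docid <= hi]
```

The lookup is just two binary searches on the date table, so the cost
doesn't depend on how many documents fall in the range, which is what
makes this sort of filter cheap.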

Cheers,
    Olly


