[Xapian-devel] Re: [Xapian-discuss] Searching subset of documents

Olly Betts olly at survex.com
Fri Jun 2 02:01:13 BST 2006


On Thu, Jun 01, 2006 at 06:11:04PM -0600, Rusty Conover wrote:
> Thanks for your help so far.  I'm stuck on the problem of how to
> properly expose this new class, called ExternalSourcePostList to  
> public users of the Xapian API.   I've created matcher/ 
> externalsourcepostlist.cc and matcher/externalsourcepostlist.h.
> 
> It seems the header "postlist.h" isn't installed when Xapian is  
> installed, so there needs to be some housekeeping to allow this class to
> be used but not show its internal bits.  This I'd like a little help  
> with.

The PostList class is currently an internal detail, and I think it
should probably stay that way.  We do want to expose an interface similar
to the one PostList currently has, but we don't need all of PostList's
methods, and exposing it directly may unhelpfully limit future changes
to PostList's interface.

Having had this circulating in the recesses of my brain for a few hours, I
think the best way to fit this into the external API would probably
look something like this in use (names are the first that came to mind,
so could no doubt be improved upon):

    class MySQLFilter : public Xapian::ExternalPostingSource {
	    // any private SQL-related data
	public:
	    // ctor
	    // dtor
	    // size reporting methods
	    // next()
	    // skip_to()
	    // at_end()
	    // optionally weight/max_weight, defaulting to unweighted
    };

Then:

    Xapian::QueryParser qp;
    // configure qp
    Xapian::Query query = qp.parse_query(query_string);
    Xapian::Query sql_filter(new MySQLFilter(/* some parameters */));
    query = Xapian::Query(Xapian::Query::OP_FILTER, query, sql_filter);

And then use query as usual...
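
To make the shape of that interface a bit more concrete, here is a
standalone sketch which doesn't depend on Xapian at all.  Everything in
it is made up for illustration: PostingSource, VectorSource, and
filter_ids() are hypothetical names, docid is a stand-in for
Xapian::docid, and the loop in filter_ids() only shows the general idea
of how a filter could drive next() and skip_to(), not how the matcher
actually does it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned docid;  // stand-in for Xapian::docid

// Hypothetical interface: none of these names exist in Xapian.
class PostingSource {
  public:
    virtual ~PostingSource() {}
    // Upper bound on the number of ids this source can return.
    virtual std::size_t max_size() const = 0;
    // Advance to the next id (the first call moves onto the first id).
    virtual void next() = 0;
    // Advance to the first id >= target (no-op if already there).
    virtual void skip_to(docid target) = 0;
    virtual bool at_end() const = 0;
    // Only valid once positioned on an id and while !at_end().
    virtual docid get_docid() const = 0;
};

// Simplest possible source: ids already in memory, in ascending order.
class VectorSource : public PostingSource {
    std::vector<docid> ids;
    std::size_t pos;
    bool started;
  public:
    explicit VectorSource(const std::vector<docid>& sorted_ids)
        : ids(sorted_ids), pos(0), started(false) {}
    std::size_t max_size() const override { return ids.size(); }
    void next() override {
        if (!started) started = true;
        else if (pos < ids.size()) ++pos;
    }
    void skip_to(docid target) override {
        started = true;
        while (pos < ids.size() && ids[pos] < target) ++pos;
    }
    bool at_end() const override { return started && pos >= ids.size(); }
    docid get_docid() const override {
        assert(started && pos < ids.size());
        return ids[pos];
    }
};

// Illustration of filtering: keep only ids present in both inputs,
// using skip_to() so whichever side is behind can jump ahead.
std::vector<docid> filter_ids(PostingSource& query, PostingSource& source) {
    std::vector<docid> out;
    query.next();
    source.next();
    while (!query.at_end() && !source.at_end()) {
        docid q = query.get_docid();
        docid s = source.get_docid();
        if (q == s) {
            out.push_back(q);
            query.next();
            source.next();
        } else if (q < s) {
            query.skip_to(s);
        } else {
            source.skip_to(q);
        }
    }
    return out;
}
```

The point of insisting on skip_to() rather than just next() is that a
source which can seek cheaply gets to skip over ids the other side of
the filter can't match anyway.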

> I've attached my source files for the new class to this email.  I've  
> kept it really simple, just expecting an array of Xapian::docids to  
> be passed to the constructor, where they are copied and sorted, and  
> the rest of the iterator functions are implemented correctly as far  
> as I can tell.

Not a bad way to go for testing, but for real world use sucking
everything into an array doesn't scale so well, and sorting scales even
worse.  I'd try to arrange that the ids come out of SQL sorted, and
stream them through the "MySQLFilter" class.  The matcher may be able to
terminate early in which case you'll never need the tail end of the ids
from SQL (sorry if this is obvious - a lot of people don't seem to
appreciate this trick is even possible!)
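
As a rough illustration of the streaming idea, here's a standalone
sketch, again with made-up names throughout.  StreamingSource pulls one
id at a time from a callback, FakeCursor is a stand-in for stepping
through the rows of a query which returns ids in ascending order (no
real SQL client API is used), and take_first() plays the part of a
consumer which stops once it has enough results.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

typedef unsigned docid;  // stand-in for Xapian::docid

// Hypothetical: pulls ids lazily from a callback.  The callback stores the
// next id and returns true, or returns false once the rows are exhausted.
class StreamingSource {
    std::function<bool(docid&)> fetch;
    docid current;
    bool started, finished;
  public:
    explicit StreamingSource(std::function<bool(docid&)> fetch_row)
        : fetch(fetch_row), current(0), started(false), finished(false) {}
    void next() {
        if (finished) return;
        docid id;
        if (!fetch(id)) { finished = true; return; }
        // The whole scheme relies on the ids arriving in ascending order.
        assert(!started || id > current);
        current = id;
        started = true;
    }
    void skip_to(docid target) {
        if (!started) next();
        while (!finished && current < target) next();
    }
    bool at_end() const { return finished; }
    docid get_docid() const {
        assert(started && !finished);
        return current;
    }
};

// Stand-in for a SQL cursor; counts how many rows were actually read.
struct FakeCursor {
    std::vector<docid> rows;
    std::size_t fetched;
    explicit FakeCursor(const std::vector<docid>& r) : rows(r), fetched(0) {}
    bool operator()(docid& id) {
        if (fetched == rows.size()) return false;
        id = rows[fetched++];
        return true;
    }
};

// A consumer which stops early once it has n ids.
std::vector<docid> take_first(StreamingSource& src, std::size_t n) {
    std::vector<docid> out;
    while (out.size() < n) {
        src.next();
        if (src.at_end()) break;
        out.push_back(src.get_docid());
    }
    return out;
}
```

If you pass the cursor in via std::ref(cursor) you can check
cursor.fetched afterwards: asking for the first two ids reads exactly
two rows, and the rest are never touched.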

Cheers,
    Olly


