[Xapian-discuss] Searching subset of documents

Rusty Conover rconover at infogears.com
Thu Jun 1 19:23:25 BST 2006


On Jun 1, 2006, at 9:19 AM, Olly Betts wrote:

> On Thu, Jun 01, 2006 at 03:13:54AM -0600, Rusty Conover wrote:
>> The subset of documents to be searched can't be defined neatly with
>> boolean fields. Currently I'm running a query in an external database
>> which returns the ids of the Xapian documents that the Xapian query
>> should be matched against.
>
> So is this the result of an SQL query?

In this case yes.

>
>> I've written code so that custom decider functions can be passed to
>> get_mset() in Search::Xapian, but that doesn't appear to be able to do
>> the job, because the decider function isn't passed the document id, just
>> the document object itself.  I suppose this is because the document
>> id appears to be munged with the number of active databases currently
>> being searched, to ensure uniqueness across all databases.
>
> I think it's just an oversight that it doesn't get the docid.  If you're
> searching multiple databases, it's easy enough to map the merged docid
> back to the database and docid it came from.  But this isn't the best
> approach for you I think, unless you're rejecting very few documents.
> The MatchDecider is assumed to be expensive and so is applied to as
> few documents as possible, hence as late in the matcher's processing as
> possible.
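
For reference, a minimal sketch of the docid mapping mentioned above. It
assumes the interleaved numbering scheme Xapian uses when several databases
are searched together (merged = (local - 1) * n_dbs + db_index + 1). The
function names split_merged_docid and merge_docid are hypothetical helpers,
not part of the Xapian API.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumption: docids from n_dbs databases are interleaved, so that
//   merged = (local - 1) * n_dbs + db_index + 1
// with db_index counted from 0 and docids counted from 1.

// Hypothetical helper: recover {db_index, local docid} from a merged docid.
std::pair<std::uint32_t, std::uint32_t>
split_merged_docid(std::uint32_t merged, std::uint32_t n_dbs) {
    assert(merged > 0 && n_dbs > 0);
    std::uint32_t db_index = (merged - 1) % n_dbs;
    std::uint32_t local = (merged - 1) / n_dbs + 1;
    return {db_index, local};
}

// Hypothetical helper: the inverse mapping, useful for translating docids
// returned by an external source into the merged numbering.
std::uint32_t merge_docid(std::uint32_t local, std::uint32_t db_index,
                          std::uint32_t n_dbs) {
    assert(local > 0 && db_index < n_dbs);
    return (local - 1) * n_dbs + db_index + 1;
}
```

With three databases, merged docid 5 would map to database index 1 and
local docid 2 under this assumption, and merge_docid(2, 1, 3) gives 5 back.
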
>
>> Is there a more efficient way to go about this, where the document
>> list could be filtered before the term matcher goes to work?  Does it
>> really make a difference with regard to order?
>
> You really want to do this as early as you can (i.e. near the root of
> the query tree), to avoid having to read sections of postlist which you
> aren't going to use (assuming the external source of docids can't do
> anything useful with a "skip_to").
>
> So you want to be able to have an ExternalSourcePostList which just gets
> docids from some external source, and then you can take the query and do:
>
> enquire.set_query(Query(Query::OP_FILTER, query, external_source_postlist));
>
> It'd be handy to have something like this available, and it's not too
> hard to implement.  I'm not likely to have time to look at it for a
> while, but I can point you in the right direction if you want to look.
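
A minimal sketch of the filtering idea described above, written without any
Xapian dependency. It only illustrates how a sorted list of externally
supplied docids can be intersected with a query's matches while skipping
ahead on both sides. SortedDocidList and filter_matches are hypothetical
names; this is not how Xapian's matcher or OP_FILTER is actually
implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a postlist: sorted, de-duplicated docids plus
// a cursor that can step forward or jump ahead.
class SortedDocidList {
    std::vector<std::uint32_t> ids_;
    std::size_t pos_ = 0;

  public:
    explicit SortedDocidList(std::vector<std::uint32_t> ids)
        : ids_(std::move(ids)) {
        std::sort(ids_.begin(), ids_.end());
        ids_.erase(std::unique(ids_.begin(), ids_.end()), ids_.end());
    }
    bool at_end() const { return pos_ >= ids_.size(); }
    std::uint32_t docid() const {
        assert(!at_end());
        return ids_[pos_];
    }
    void next() { ++pos_; }
    // Move the cursor to the first docid >= target, never backwards.
    void skip_to(std::uint32_t target) {
        auto first = ids_.begin() + static_cast<std::ptrdiff_t>(pos_);
        auto it = std::lower_bound(first, ids_.end(), target);
        pos_ = static_cast<std::size_t>(it - ids_.begin());
    }
};

// Keep only the query matches that also appear in the allowed list.
// Whichever side is behind skips ahead to the other side's current docid,
// so stretches of either list that cannot match are not walked one by one.
std::vector<std::uint32_t> filter_matches(SortedDocidList query,
                                          SortedDocidList allowed) {
    std::vector<std::uint32_t> out;
    while (!query.at_end() && !allowed.at_end()) {
        std::uint32_t q = query.docid();
        std::uint32_t a = allowed.docid();
        if (q == a) {
            out.push_back(q);
            query.next();
            allowed.next();
        } else if (q < a) {
            query.skip_to(a);
        } else {
            allowed.skip_to(q);
        }
    }
    return out;
}
```

Applying the filter at this level means the query side can skip past
docids the external source has already ruled out, rather than scoring them
first and rejecting them afterwards as a MatchDecider would.
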

I'd be happy to spend some time on this. If you have any hints about
where to start, they would be most appreciated.

Thanks much,

Rusty
--
Rusty Conover
InfoGears Inc.
Web: http://www.infogears.com





