[Xapian-discuss] Quickest way to retrieve data for a large match set?

William Crawford william at sciencephoto.co.uk
Thu Jun 24 12:55:09 BST 2010


We're using the Perl binding to access Xapian in a simple search of image 
metadata (title and keywords). Due to the specification for the search engine, 
by default we have to sort the results using a function of the search rank, 
age (well, newness) and popularity (rated by sales of the image). As a result, 
we have to fetch the complete result set and then calculate a new ranking
based on the original rank, perturbed by the ratio of each image's newness
and popularity to the highest newness and popularity in the result set (i.e.
there is no way to precalculate these at indexing time, alas).
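A minimal pure-Perl sketch of that re-ranking pass follows. The post does not give the real combining function, so the multiplicative boost and the weights `$NEWNESS_WEIGHT` and `$POPULARITY_WEIGHT` are hypothetical. The hash keys `rank`, `newness` and `popularity` are also assumed names for whatever the thawed document data actually contains. The point it illustrates is that the maxima depend on the whole result set, so they can only be computed after every hit has been fetched.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical weights: the real function is not given in the post.
my $NEWNESS_WEIGHT    = 0.3;
my $POPULARITY_WEIGHT = 0.2;

# Re-rank a complete result set. Each hit is a hashref with (assumed)
# keys rank, newness and popularity; a score key is added to each hit.
sub rerank {
    my (@hits) = @_;
    return () unless @hits;

    # Highest newness and popularity in *this* result set; fall back to 1
    # so an all-zero column does not cause a division by zero.
    my $max_new = max(map { $_->{newness} } @hits)    || 1;
    my $max_pop = max(map { $_->{popularity} } @hits) || 1;

    for my $h (@hits) {
        # Perturb the original rank by the two ratios to the maxima.
        $h->{score} = $h->{rank}
            * (1 + $NEWNESS_WEIGHT    * $h->{newness}    / $max_new
                 + $POPULARITY_WEIGHT * $h->{popularity} / $max_pop);
    }

    # Highest combined score first.
    return sort { $b->{score} <=> $a->{score} } @hits;
}
```

For example, with the weights above:

```perl
my @ranked = rerank(
    { id => 'a', rank => 90, newness => 10,  popularity => 5  },
    { id => 'b', rank => 80, newness => 100, popularity => 50 },
);
print join(',', map { $_->{id} } @ranked), "\n";    # prints "b,a"
```

Hit `b` scores 80 × 1.5 = 120 and overtakes `a`, which scores 90 × 1.05 = 94.5. The pass only needs three numbers per hit, which is worth bearing in mind when deciding how much per-document data has to be fetched.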

Currently, fetching the document data for the results has become something of a
bottleneck (typical searches may generate 50 to 500 matches, but some return
more than 5000).

Code is something like:

...
    print STDERR "Query = ", $q->get_description, "\n" if $self->debug;
    my $e = $self->index->enquire ($q);
    #my $hits = $e->get_mset(0, $self->index->get_doccount, $self->index->get_doccount);
    my (@hits) = $e->matches (0, $self->index->get_doccount, $self->index->get_doccount);
    my (@results) = map +thaw($_->get_document->get_data), @hits;
    return \@results;
}

I'd like to know if there's anything I can do to improve the speed of fetching 
the results (in other words, am I doing it wrong)?
