[Xapian-discuss] Grouping results

Torsten Bronger bronger at physik.rwth-aachen.de
Fri Oct 16 08:19:53 BST 2009


Hi there!

Olly Betts writes:

> On Wed, Oct 14, 2009 at 08:20:23AM +0200, Torsten Bronger wrote:
>
>> I use Xapian to index a lot of PDF files.  My current approach is
>> to have every PDF page as a Xapian document, so that I can report
>> the page number to the user via document.get_value(0).  This
>> works well.
>
> You shouldn't really use a document value for this - values are
> intended to be used during the match process itself (for sorting,
> collapsing, value ranges, MatchDecider, etc), and are stored to
> make that work well.  If you want something for showing results,
> the document data is a better option.

Okay, then I will do so.  Document data already contains the full
text of the PDF page so that I can display context in the search
results, but I can use object serialisation to put more than one
thing in "document data" of course.
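
A minimal sketch of what I mean by that, assuming JSON as the
serialisation format (pickle or anything else would do equally well).
The helper names pack_page and unpack_page are made up for
illustration; only Document.set_data() and Document.get_data() are
Xapian's own.

```python
import json


def pack_page(page_number, text):
    """Serialise the page number and page text into one string.

    The result is meant to be stored with doc.set_data(...).
    """
    return json.dumps({"page": page_number, "text": text})


def unpack_page(data):
    """Inverse of pack_page().

    Accepts str or bytes, since doc.get_data() may return either
    depending on the bindings in use.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    record = json.loads(data)
    return record["page"], record["text"]
```

At index time this would be doc.set_data(pack_page(4, page_text)),
and at display time page, text = unpack_page(doc.get_data()).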

>> However, it's not so nice that the pages of a certain PDF file
>> are spread over the whole hits list.  I could tell Xapian to
>> report very many (maybe even all) hits to me so I could group
>> them by PDF file in the main program.  But possibly someone here
>> has a more elegant solution?
>
> See Enquire::set_collapse_key():
>
> http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#f32055d3a4da31da994d97171f45d699
>
> 1.0.x only allows you to leave a single entry with each key, but
> in 1.1.x you can collapse to leave up to a specified number for
> each key.

But then the collapsed hits are lost entirely rather than grouped.
I could run a second search for each hit that is actually displayed
to fetch all other matching pages of that PDF.  The idea is to have
a hit entry like this one:

    1. Exponential band tails in polycrystalline semiconductor films

       pages: 1, 4-6, 50, 69
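
For the grouping done in the main program, a rough sketch could look
like the following.  It assumes the hits have already been fetched
from the MSet and reduced to (title, page) pairs in rank order; the
function names are hypothetical and nothing here is Xapian API.

```python
from collections import defaultdict


def _render(first, last):
    """Render one run of consecutive pages, e.g. '50' or '4-6'."""
    return str(first) if first == last else f"{first}-{last}"


def format_pages(pages):
    """Collapse page numbers into a string such as '1, 4-6, 50, 69'."""
    parts = []
    run_start = previous = None
    for page in sorted(set(pages)):
        if previous is not None and page == previous + 1:
            # Still inside the current run of consecutive pages.
            previous = page
            continue
        if run_start is not None:
            parts.append(_render(run_start, previous))
        run_start = previous = page
    if run_start is not None:
        parts.append(_render(run_start, previous))
    return ", ".join(parts)


def group_hits(hits):
    """Group (title, page) pairs by title.

    Titles keep the order in which they first appear, so a PDF is
    ranked by its best-matching page (dicts preserve insertion order
    in Python 3.7+).
    """
    grouped = defaultdict(list)
    for title, page in hits:
        grouped[title].append(page)
    return [(title, format_pages(pages)) for title, pages in grouped.items()]
```

This still requires asking Xapian for enough hits to cover all pages
of the PDFs shown, which is exactly the part I would like to avoid.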

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
                   Jabber ID: torsten.bronger at jabber.rwth-aachen.de
                                  or http://bronger-jmp.appspot.com



