[Xapian-discuss] Grouping results
Torsten Bronger
bronger at physik.rwth-aachen.de
Fri Oct 16 08:19:53 BST 2009
Hello!
Olly Betts writes:
> On Wed, Oct 14, 2009 at 08:20:23AM +0200, Torsten Bronger wrote:
>
>> I use Xapian to index a lot of PDF files. My current approach is
>> to have every PDF page as a Xapian document, so that I can report
>> the page number to the user via document.get_value(0). This
>> works well.
>
> You shouldn't really use a document value for this - values are
> intended to be used during the match process itself (for sorting,
> collapsing, value ranges, MatchDecider, etc), and are stored to
> make that work well. If you want something for showing results,
> the document data is a better option.
Okay, then I will do so. Document data already contains the full
text of the PDF page so that I can display context in the search
results, but I can use object serialisation to put more than one
thing in "document data" of course.
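A minimal sketch of packing more than one field into the document data. It assumes JSON as the serialisation format (any format would do) and uses the hypothetical field names `page` and `text`. With the Xapian Python bindings the string would be passed to `doc.set_data()` and read back with `doc.get_data()`. The helpers here are plain Python, so they can be tried without an index.

```python
import json


def pack_page_data(page: int, text: str) -> str:
    """Serialise the page number and page text into one document-data string."""
    # Hypothetical layout: a JSON object with "page" and "text" keys.
    return json.dumps({"page": page, "text": text})


def unpack_page_data(data) -> tuple:
    """Recover (page, text) from stored document data."""
    # The bindings may hand back bytes rather than str, so accept both.
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    fields = json.loads(data)
    return fields["page"], fields["text"]


# Usage: round-trip one page's data.
stored = pack_page_data(4, "Exponential band tails ...")
page, text = unpack_page_data(stored)
```

The page number then travels with the context text, and the value slots stay free for match-time uses.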
>> However, it's not so nice that the pages of a certain PDF file
>> are spread over the whole hits list. I could tell Xapian to
>> report very many (maybe even all) hits to me so I could group
>> them by PDF file in the main program. But possibly someone here
>> has a more elegant solution?
>
> See Enquire::set_collapse_key():
>
> http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#f32055d3a4da31da994d97171f45d699
>
> 1.0.x only allows you to leave a single entry with each key, but
> in 1.1.x you can collapse to leave up to a specified number for
> each key.
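The following is a pure-Python simulation of collapse semantics, which shows why the collapsed hits are gone rather than grouped. It assumes each ranked hit carries its PDF identifier as the collapse key. In real Xapian that identifier would have to be stored in a value slot, which is the kind of match-time use values are intended for. The call would then be `enquire.set_collapse_key(slot)` in 1.0.x, or `enquire.set_collapse_key(slot, collapse_max)` in 1.1.x. The function name `collapse_hits` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Hit = Tuple[Hashable, int]  # (collapse key, page number)


def collapse_hits(hits: Sequence[Hit], collapse_max: int = 1) -> Tuple[List[Hit], Counter]:
    """Keep at most collapse_max hits per key, preserving ranked order.

    Returns the surviving hits and, per key, how many hits were dropped.
    """
    kept_per_key: Counter = Counter()
    dropped: Counter = Counter()
    survivors: List[Hit] = []
    for key, page in hits:
        if kept_per_key[key] < collapse_max:
            kept_per_key[key] += 1
            survivors.append((key, page))
        else:
            # The hit itself is discarded; only a count of it remains.
            dropped[key] += 1
    return survivors, dropped
```

With `collapse_max=1` only the best page of each PDF survives. The other page numbers would need a further query to recover.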
But then the collapsed hits are completely lost rather than grouped.
I could run an additional search for each displayed hit to find the
other matching pages of the same PDF. The idea is to have a hit entry
like this one:
1. Exponential band tails in polycrystalline semiconductor films
pages: 1, 4-6, 50, 69
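An entry like the one above could be built in the main program in three steps: request a generous number of hits, group them by PDF in the order of each PDF's best-ranked page, and compress the page numbers into ranges. This minimal sketch assumes each hit has already been reduced to a `(pdf_id, page)` pair in relevance order. The helper names `group_hits` and `format_pages`, and the file names in the example, are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def group_hits(hits: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, List[int]]:
    """Group (pdf_id, page) hits by PDF.

    Dicts keep insertion order (Python 3.7+), so PDFs come out in the
    order of their best-ranked page.
    """
    groups: Dict[Hashable, List[int]] = {}
    for pdf_id, page in hits:
        groups.setdefault(pdf_id, []).append(page)
    return groups


def format_pages(pages: Iterable[int]) -> str:
    """Sort page numbers and collapse consecutive runs, e.g. '1, 4-6, 50'."""
    ordered = sorted(set(pages))
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        # Extend the run while the next page number is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)


hits = [("tails.pdf", 5), ("other.pdf", 2), ("tails.pdf", 1),
        ("tails.pdf", 69), ("tails.pdf", 4), ("tails.pdf", 50), ("tails.pdf", 6)]
for rank, (pdf_id, pages) in enumerate(group_hits(hits).items(), start=1):
    print(f"{rank}. {pdf_id}\n   pages: {format_pages(pages)}")
```

In a real application the title would come from the document data rather than the PDF identifier. This approach also only groups the pages that fall within the requested number of hits.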
Bye,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: torsten.bronger at jabber.rwth-aachen.de
or http://bronger-jmp.appspot.com