[Xapian-discuss] [ NUMBER OF SAMPLE ]
Boris Meyer
boris.meyer at rom.fr
Wed Jul 21 17:59:26 BST 2004
Hello Eric, Hello Richard,
Eric B. Ridge wrote:
> On Jul 21, 2004, at 10:59 AM, Boris Meyer wrote:
>
> <snip>
>
>> The solution could be to retrieve the offsets of the words/phrases in
>> the document and extract text around each offset with a window (x chars
>> before/x after), combined with a document-local weighting algorithm
>> if there is more than one match in the same document.
>
> It sounds like you want some bit of context around the first hit.
Exactly. More precisely, a meaningful result listing.
> I don't know if Omega can do this (doubt it, but I've never used Omega).
> Personally, I'd like to see support for this in Xapian's API.
I'm diving into the API, looking for methods to retrieve this offset.
> Right now one must re-parse the document, joining up with the terms list
> from the result to find and highlight any/all hits, let alone context
> extraction. A fairly expensive operation if you're going to do this on
> a "summary display" of many documents.
Yes, a very expensive process, especially since the documents I would
have to parse average 3 MB each (PDF), and don't forget that there are
10 results per page ;-).
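A minimal sketch of the re-parse approach discussed above, in plain Python with no Xapian calls: it assumes the document text has already been extracted into a string and that the matching terms are available as a list. The function name first_hit_context and the width parameter are hypothetical, chosen only for illustration.

```python
import re

def first_hit_context(text, terms, width=30):
    """Return `width` characters either side of the earliest match of any term.

    Hypothetical helper: `text` is the already-extracted document text and
    `terms` are the matching query terms. Returns None when nothing matches.
    """
    if not terms:
        return None
    # One case-insensitive alternation over the escaped query terms.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None
    # Clamp the window to the bounds of the document.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return text[start:end]
```

Every call scans the full text, which is exactly the cost being complained about here when each result page needs ten documents of several megabytes.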
> I think I suggested a while back that Xapian be able to track byte
> offsets for each term. This would make grabbing hit contexts really
> simple. I know it would drastically increase the size of the index, but
> I personally would be willing to take the storage hit.
As hard disks are now cheap, and everybody today expects a Google-style
result listing with highlighted terms, I would also be willing to store
such an index. But maybe there is another way?
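To make the offset idea concrete, here is a rough sketch of what a per-term offset index could look like, again in plain Python and not using any Xapian API. It records character offsets rather than byte offsets for simplicity, and the names index_offsets and best_window are hypothetical. The "local weight" is assumed to be simply the number of hits falling within the window around a candidate hit.

```python
import re
from collections import defaultdict

def index_offsets(text):
    """Map each lowercased word to the character offsets where it starts."""
    offsets = defaultdict(list)
    for m in re.finditer(r"\w+", text):
        offsets[m.group().lower()].append(m.start())
    return offsets

def best_window(offsets, terms, width=40):
    """Return (start, end) around the hit with the most neighbouring hits.

    Only the stored offsets are consulted, so the document text does not
    need to be re-parsed to choose the excerpt. Returns None if no term hit.
    """
    hits = sorted(pos for t in terms for pos in offsets.get(t.lower(), []))
    if not hits:
        return None

    # Assumed local weight: number of hits within `width` characters.
    def weight(center):
        return sum(1 for p in hits if abs(p - center) <= width)

    center = max(hits, key=weight)  # ties resolve to the earliest hit
    return max(0, center - width), center + width
```

The returned range can then be used to slice the stored text (text[start:end]), so only the excerpt has to be read. The price is the larger index mentioned above, since every occurrence of every term gets an entry.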
> eric
--
Regards, Boris.
+---------------------------+----------------------+
| Boris Meyer | Tel : 04 93 92 88 88 |
| Administration / Internet | Fax : 04 93 92 18 93 |
| Developpement | Web : http://rom.fr |
+---------------------------+----------------------+
| 19, bd Carabacel | - - - - - x - - - - |
| 06000 Nice | - - - - - x - - - - |
+---------------------------+----------------------+
| boris.meyer at rom.fr | http://www.rom.fr |
+---------------------------+----------------------+