[Xapian-discuss] [ NUMBER OF SAMPLE ]
Eric B. Ridge
ebr at tcdi.com
Wed Jul 21 17:38:11 BST 2004
On Jul 21, 2004, at 10:59 AM, Boris Meyer wrote:
<snip>
> The solution could be the retrieving of the words/phrases offset in
> the document and the extraction from this offset with a fork (x char
> before/x after) in combination with a document local weight algorithm
> if more than one match in the same document.
It sounds like you want some bit of context around the first hit. I
don't know if Omega can do this (doubt it, but I've never used Omega).
Personally, I'd like to see support for this in Xapian's API.
Right now one must re-parse the document and match it against the terms
list from the result just to find and highlight any or all hits, never
mind extracting context. That is a fairly expensive operation if you're
going to do it for a "summary display" of many documents.
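For illustration, here is a rough Python sketch of that re-parse step. It assumes the matching terms have already been pulled out of the search result, and it uses a simple regex tokenizer as a stand-in for whatever tokenization (and stemming) the indexer actually applied. The function names `find_hits` and `snippet` are made up for this example, and the offsets are character offsets into a Python string.

```python
import re

# Stand-in tokenizer (assumption): lowercase runs of word characters.
# A real indexer may split and stem words differently.
WORD_RE = re.compile(r"\w+")

def find_hits(text, matching_terms):
    """Re-parse `text` and return (start, end) character offsets of every
    word whose lowercased form is one of `matching_terms`."""
    terms = {t.lower() for t in matching_terms}
    return [(m.start(), m.end()) for m in WORD_RE.finditer(text)
            if m.group().lower() in terms]

def snippet(text, matching_terms, width=20, mark=("[", "]")):
    """Return `width` characters of context on each side of the first hit,
    with every hit that fits inside that window wrapped in `mark`."""
    hits = find_hits(text, matching_terms)
    if not hits:
        return None
    first_start, first_end = hits[0]
    lo = max(0, first_start - width)
    hi = min(len(text), first_end + width)
    out, pos = [], lo
    for start, end in hits:
        # Hits are in document order, so once one runs past the window
        # every later one does too.
        if end > hi:
            break
        out.append(text[pos:start])
        out.append(mark[0] + text[start:end] + mark[1])
        pos = end
    out.append(text[pos:hi])
    return "".join(out)
```

For example, `snippet("The quick brown fox jumps over the lazy dog", {"fox", "dog"}, width=10)` returns `"ick brown [fox] jumps ove"`. Every call tokenizes the whole document again, which is the cost that adds up across a page of results.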
I think I suggested a while back that Xapian be able to track byte
offsets for each term. This would make grabbing hit contexts really
simple. I know it would drastically increase the size of the index,
but I personally would be willing to take the storage hit.
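A toy Python sketch of the byte-offset idea follows. It is purely hypothetical and is not an existing Xapian feature or API: the indexer records a (start, end) byte-offset pair for every term occurrence, so fetching a hit context becomes a slice of the stored text instead of a re-parse. The class name `OffsetIndex`, the ASCII-only tokenizer, and keeping the raw document bytes next to the postings are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical ASCII-only tokenizer over UTF-8 bytes (bytes patterns
# match \w against ASCII letters, digits and underscore only).
WORD_RE = re.compile(rb"\w+")

class OffsetIndex:
    """Toy inverted index that stores byte offsets for every occurrence
    of every term (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> doc_id -> list of (start, end) byte offsets
        self.postings = defaultdict(lambda: defaultdict(list))
        # The original bytes must be kept somewhere to slice context from.
        self.docs = {}

    def add_document(self, doc_id, text):
        data = text.encode("utf-8")
        self.docs[doc_id] = data
        for m in WORD_RE.finditer(data):
            term = m.group().lower().decode("ascii")
            self.postings[term][doc_id].append((m.start(), m.end()))

    def context(self, doc_id, terms, width=20):
        """Return up to `width` bytes either side of the first hit of any
        of `terms`, using only stored offsets (no re-tokenizing)."""
        offsets = sorted(off for t in terms
                         for off in self.postings.get(t.lower(), {}).get(doc_id, []))
        if not offsets:
            return None
        start, end = offsets[0]
        data = self.docs[doc_id]
        # Byte slicing can cut a multi-byte UTF-8 character at the window
        # edges; errors="ignore" drops any such partial character.
        return data[max(0, start - width):end + width].decode("utf-8", errors="ignore")
```

With "The quick brown fox jumps over the lazy dog" added as document 1, `context(1, ["dog", "FOX"], width=4)` returns `"own fox jum"`. The extra (start, end) pair stored per occurrence is where the growth in index size comes from.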
eric