[Xapian-discuss] [ NUMBER OF SAMPLE ]

Wed Jul 21 17:38:11 BST 2004

On Jul 21, 2004, at 10:59 AM, Boris Meyer wrote:

<snip>

> The solution could be the retrieving of the words/phrases offset in 
> the document and the extraction from this offset with a fork (x char 
> before/x after) in combination with a document local weight algorithm 
> if more than one match in the same document.

It sounds like you want some bit of context around the first hit.  I 
don't know if Omega can do this (doubt it, but I've never used Omega).  
Personally, I'd like to see support for this in Xapian's API.

Right now one must re-parse the document and match it against the term 
list from the result just to find and highlight any/all hits, let alone 
extract context.  That is a fairly expensive operation if you're going 
to do it on a "summary display" of many documents.
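
As a rough illustration of that re-parse step, here is a minimal Python 
sketch.  It does not use the Xapian API at all; the function name 
first_hit_context and the width parameter are made up for the example, 
and it assumes the query terms are plain lowercased words (real Xapian 
terms are usually stemmed, so a real version would have to stem the 
document words the same way before comparing).

```python
import re

def first_hit_context(text, terms, width=20):
    """Re-scan `text` for the first word matching any query term.

    Returns (start, end, snippet) in character offsets, or None when
    no term occurs.  Hypothetical helper, not part of Xapian.
    """
    wanted = {t.lower() for t in terms}
    for m in re.finditer(r"\w+", text):
        if m.group().lower() in wanted:
            # Take `width` characters on each side of the hit.
            lo = max(0, m.start() - width)
            hi = min(len(text), m.end() + width)
            return m.start(), m.end(), text[lo:hi]
    return None
```

The cost is the point: every document on the results page gets 
tokenized again from the start just to locate one hit.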

I think I suggested a while back that Xapian be able to track byte 
offsets for each term.  This would make grabbing hit contexts really 
simple.  I know it would drastically increase the size of the index, 
but I personally would be willing to take the storage hit.
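
To make the idea concrete, here is a small Python sketch of what 
tracking byte offsets per term could look like.  This is only an 
assumption about how such a feature might behave, not anything Xapian 
provides; index_offsets and context_at are hypothetical names, and the 
"index" is just an in-memory dict for a single document.

```python
import re
from collections import defaultdict

def index_offsets(text):
    """Map each lowercased word to its (start, end) UTF-8 byte spans."""
    spans = defaultdict(list)
    byte_pos = 0  # byte offset corresponding to char_pos
    char_pos = 0
    for m in re.finditer(r"\w+", text):
        # Advance over the gap before this word, counting bytes.
        byte_pos += len(text[char_pos:m.start()].encode("utf-8"))
        start = byte_pos
        byte_pos += len(m.group().encode("utf-8"))
        char_pos = m.end()
        spans[m.group().lower()].append((start, byte_pos))
    return dict(spans)

def context_at(data, start, end, width=20):
    """Slice `width` bytes either side of a stored hit from raw bytes."""
    lo = max(0, start - width)
    hi = min(len(data), end + width)
    # A cut may land inside a multi-byte character; drop the fragment.
    return data[lo:hi].decode("utf-8", errors="ignore")
```

With the spans stored at index time, pulling a hit context at display 
time is a lookup plus a slice, with no tokenizing of the document.  
The storage cost is also visible here: one pair of integers per term 
occurrence.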

eric