[Xapian-discuss] Text snippets

Rune Kock rune.kock at gmail.com
Fri Dec 25 09:16:48 GMT 2009


My point of view as a user:

Storage space is generally cheaper than processing power.  So I would
store a complete copy of the original text at indexing time, after the
initial processing, just to make it easier to get the snippets.  Or
even store multiple copies at various stages of processing, if that is
of any use.
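
To make the idea concrete, here is a minimal sketch in plain Python (not
Xapian code).  The function names, the JSON layout and the simple
lowercase matching are all my own assumptions for illustration.  The
text and the word offsets are saved once at indexing time, so that
highlighting at search time needs no reparsing:

```python
import json
import re

WORD_RE = re.compile(r"\w+")


def build_snippet_record(text):
    """Run once at indexing time: keep the text plus each word's offsets."""
    tokens = [[m.group().lower(), m.start(), m.end()]
              for m in WORD_RE.finditer(text)]
    # JSON is an assumption here; any serialisation that can be stored as
    # opaque per-document data would do.
    return json.dumps({"text": text, "tokens": tokens})


def highlight_from_record(record, query_terms, prefix="<b>", suffix="</b>"):
    """Run at search time: highlight without tokenising the text again."""
    data = json.loads(record)
    text = data["text"]
    wanted = {term.lower() for term in query_terms}
    pieces, last = [], 0
    for term, start, end in data["tokens"]:
        if term in wanted:
            pieces.append(text[last:start])  # unchanged text before the hit
            pieces.append(prefix + text[start:end] + suffix)
            last = end
    pieces.append(text[last:])  # remainder after the final hit
    return "".join(pieces)
```

In Xapian such a record could presumably live in the document data
(Document::set_data), at the cost of the extra storage mentioned above.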


Rune

On Thu, Dec 24, 2009 at 20:22, Do. <do1 at yandex.ru> wrote:
> If anybody is to implement it for Xapian, what is the best strategy?
>
> Here is my guess:
> 1. The user provides the source text, the parsed query, a highlight prefix/suffix, and the number of
>  snippets she needs.
> 2. The text is parsed again: split into words (as index_text does), stemmed, etc.
> 3. The parser must record each word's original start and end position and how it was parsed (and stemmed).
> 4. Match each parsed word against the query. (Easy for everything except phrases.)
> There is another question: what is the best algorithm for choosing snippets?
> 5. For example, we split the source text into sentences, separated by periods. If a sentence
>  has matched words, add it to the list with a weight equal to the number of matched words.
> If a sentence is weightier than one already in the list, add it; if not, add it only if there is room. The scan cannot
>  be stopped early because a weightier sentence may always lie ahead.
> 6. Then highlight the collected sentences by adding the user-specified prefix/suffix at the remembered positions.
> Wow, that became really complicated. And this is not a speed-optimal algorithm: it gets no help from the indexes, so, for
>  example, reparsing 10 files of 1M each would really load the CPU.
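
Steps 1-6 above can be sketched in a few lines of plain Python.  This is
only an illustration under stated assumptions, not Xapian code: the
names make_snippets and toy_stem are hypothetical, toy_stem is a crude
stand-in for a real stemmer, sentences are split on ".", "!" and "?",
and phrase queries are not handled.

```python
import heapq
import re

WORD_RE = re.compile(r"\w+")
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")


def toy_stem(word):
    """Stand-in for a real stemmer: lowercase and strip a trailing 's'."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def make_snippets(text, query_terms, prefix="<b>", suffix="</b>", count=2):
    """Return up to `count` highlighted sentences, best match first."""
    wanted = {toy_stem(term) for term in query_terms}
    scored = []
    for order, sent in enumerate(SENTENCE_RE.finditer(text)):
        sentence = sent.group()
        # Steps 2-4: re-tokenise, remember offsets, match stems to the query.
        hits = [(m.start(), m.end()) for m in WORD_RE.finditer(sentence)
                if toy_stem(m.group()) in wanted]
        if hits:
            # Step 5: weight is the number of matched words; -order breaks
            # ties in favour of the earlier sentence.
            scored.append((len(hits), -order, sentence, hits))
    # Every sentence has been scored before the best ones are chosen,
    # because a weightier sentence may appear anywhere in the text.
    snippets = []
    for _, _, sentence, hits in heapq.nlargest(count, scored):
        # Step 6: insert markers right to left so earlier offsets stay valid.
        for start, end in reversed(hits):
            sentence = (sentence[:start] + prefix + sentence[start:end]
                        + suffix + sentence[end:])
        snippets.append(sentence.strip())
    return snippets
```

The whole text is still tokenised on every call, so the CPU cost
described above remains; the sketch only shows that the selection and
highlighting logic itself is small.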
>
> Happy holidays.
>
> 24.12.09, 01:55, "Olly Betts" <olly at survex.com>:
>
>> On Thu, Dec 17, 2009 at 11:29:52AM +0300, Do. wrote:
>>  > Are there any advances in snippeting (besides what is mentioned in the wiki)? I
>>  > think extracting snippets is clearly an IR task. And I hope Xapian will provide
>>  > at least helpers to do that.
>>
>>  I agree that it is a feature which would fit well in Xapian, but nobody has
>>  yet implemented it.  I don't know of anybody currently working on it (and
>>  since nobody else has responded to your post, I guess nobody is).
>>
>>  There's a ticket in trac as well as the FAQ entry.  The FAQ entry had some
>>  rough edges (e.g. the sample thread it linked to wasn't about snippets at all)
>>  so I've overhauled it, and linked to the ticket as part of that:
>>
>>  http://trac.xapian.org/wiki/FAQ/Snippets
>>
>>  Cheers,
>>      Olly
>>
>>
>



-- 
Q: How many surrealists does it take to screw in a lightbulb?
A: Two. One to hold the giraffe and the other to fill the bathtub with
brightly colored machine tools.


