[Xapian-discuss] word context, numeric values, and characters

Peter Karman peter at peknet.com
Wed Dec 7 03:01:10 GMT 2005


I have a few assumptions about Xapian's features that I'd like to confirm. 
I've read the API docs, some of the example .cc files, the mailing list, and 
the wiki, and want to know whether I'm understanding them correctly.

1. contextual data

The convention for storing contextual information about words (e.g., which 
HTML tag they appear in, or which field/column in a db) is to prefix the term 
with a string, and then map that string using add_prefix() or 
add_boolean_prefix() when constructing a query.

For example, the html "<title>foo</title>" could be indexed as "Tfoo". A query 
for "title:foo" would be parsed with add_prefix("title","T") and that would 
generate a match for "Tfoo".
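
To make that concrete, here is a rough sketch in plain C++ of the mapping as I 
picture it. It makes no Xapian calls; PrefixMap and to_term() are names I made 
up for illustration, and the real query parser surely does more than this.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the field-to-prefix table that I assume
// add_prefix() builds up. This is not Xapian code.
using PrefixMap = std::map<std::string, std::string>;

// Turn "field:word" into a prefixed term, e.g. "title:foo" -> "Tfoo".
// Input without a colon, or with an unmapped field, is returned unchanged.
std::string to_term(const PrefixMap& prefixes, const std::string& query) {
    const std::string::size_type colon = query.find(':');
    if (colon == std::string::npos) return query;
    const auto it = prefixes.find(query.substr(0, colon));
    if (it == prefixes.end()) return query;
    return it->second + query.substr(colon + 1);
}
```

With a table of {"title" -> "T"}, to_term(table, "title:foo") gives "Tfoo", 
which is the same string the indexer stored for the <title> text.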

Am I understanding that process correctly?


2. add_value() and set_data() require char* arguments; there is no support for 
an int or other numeric value. How then does sorting work for numeric values?
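
The only workaround I can think of is to encode numbers as fixed-width 
strings, assuming values are compared as plain byte strings (that is my guess, 
not something I've confirmed). A sketch of what I mean, where pad_number() is 
a hypothetical helper of my own and not part of Xapian:

```cpp
#include <cstddef>
#include <string>

// Encode a non-negative integer as a zero-padded decimal string so that
// byte-wise string comparison agrees with numeric order.
// Numbers with more than `width` digits are returned unpadded, so `width`
// must be chosen to cover the largest expected value.
std::string pad_number(unsigned long n, std::size_t width = 10) {
    std::string s = std::to_string(n);
    if (s.size() < width) s.insert(0, width - s.size(), '0');
    return s;
}
```

Unpadded, "9" sorts after "10" as a string; padded, "0000000009" sorts before 
"0000000010". This does not cover negative or floating-point values.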


3. add_term() and add_posting() do not parse the passed char* string at all; it 
is indexed as-is. Any parsing (stemming, splitting into words on non-word 
characters) must happen before adding to the db. indextext.cc in the omega 
package is one example of parsing text prior to adding it to the db.
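
In other words, I assume my own code has to do something like the following 
before calling add_posting(). This is a minimal sketch, not what indextext.cc 
actually does; tokenize() is a name I made up, it only handles ASCII, and it 
does no stemming.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split text on non-alphanumeric characters and lowercase each word.
// A word's index in the returned vector can serve as its position.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A non-word character ends the current word.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Each returned word, together with its index, would then be handed to 
add_posting() as the term and its position.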

Correct?

Thanks in advance for clarifying.

pek
-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com



