[Xapian-discuss] some questions with scriptindex

Olly Betts olly at survex.com
Mon Mar 28 00:05:57 BST 2005


On Sun, Mar 27, 2005 at 07:45:21PM +0000, Sabrina Shen wrote:
> UID: field=UID boolean=XUID unique=Q 

Unless you're trying to do something clever, you want the same prefix for
boolean and unique.

> (1) Do I really need "field=" for each?

You do if you want Xapian to store them in the document data.

> Isn't "field=" just for displaying web search results (As Sam
> described in earlier messages: "Fields are used to retrieve per-record
> text for summaries and things like for Omega." ) ?

There's nothing special about web search results.  Sam meant "for Omega"
simply as an example of a program which might use them.

> Can't I get these values with "get_document().get_data()" using
> MSetIterator in my local system even without "field="? say, output
> search results into a text file?

The document data is built from the values processed with "field=".  So
if you don't have a field action, the value won't be stored in the
document data.  Sometimes that's what you want...
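
To make that concrete, here is a minimal sketch of reading fields back out
of the document data on the client side.  It assumes the data is laid out
as one name=value line per "field=" action (the layout Omega reads); the
helper name parse_document_data and the sample values are made up for
illustration, and the sample string merely stands in for what
get_document().get_data() would return.

```python
def parse_document_data(data):
    """Split name=value lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in data.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields


# Hypothetical document data: UID and JN had a "field=" action, AB did not.
sample = "UID=12345\nJN=Journal of Examples\n"
fields = parse_document_data(sample)

print(fields.get("UID"))
print(fields.get("AB"))  # None: nothing was stored for AB
```

A field with no "field=" action simply never appears in the parsed result,
which is the behaviour described above.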

> (2) How does "truncate=" work?

The "input field" from the dump file is fed through each action in turn.
The "truncate" action simply truncates the value to the given length, so
actions on the same line after the "truncate" see the truncated text.
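
A toy model of that behaviour, in case it helps.  This is not scriptindex's
code, just a sketch of "each action in turn": run_actions and the tuple
layout are invented for illustration, and truncation is modelled as a plain
slice, which may be simpler than what the real action does.

```python
def run_actions(value, actions):
    """Feed value through each action in order.

    Returns a list of (action name, text that action saw).
    """
    seen = []
    for name, arg in actions:
        if name == "truncate":
            # Shorten the value; every later action gets the shortened text.
            value = value[:arg]
        else:
            seen.append((name, value))
    return seen


# "field" comes before "truncate", "boolean" comes after it.
actions = [("field", None), ("truncate", 5), ("boolean", "Q")]
seen = run_actions("abcdefghij", actions)
print(seen)
```

Here "field" sees the full ten characters and "boolean" sees only the first
five, because it comes after the "truncate" on the line.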

> Does it work for both probabilistic field and BOOLEAN field?

For *ANY* action after it.

> Does it truncate each word while indexing, e.g. truncate a term
> if it's longer than 200 characters while indexing?

No - "index" after "truncate" means the text will be truncated before
word splitting.  But "index" will discard any word of more than 64
characters anyway.
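
Roughly, the order of operations looks like the sketch below.  It is only
an approximation: index_words is a made-up name, word splitting is done on
whitespace rather than however "index" really tokenises, and the 64 limit
is just the figure mentioned above.

```python
MAX_WORD_LEN = 64  # figure quoted above; longer words are dropped


def index_words(text, truncate_to):
    """Truncate the whole value first, then split it and drop long words."""
    text = text[:truncate_to]   # truncation applies to the text, not per word
    words = text.split()        # simplistic whitespace word splitting
    return [w for w in words if len(w) <= MAX_WORD_LEN]


text = "short " + "x" * 100 + " tail"

# With truncate=200 the text is untouched, but the 100-character word is
# still discarded by the length check.
terms = index_words(text, 200)
print(terms)

# With truncate=10 the text is cut mid-word before splitting.
short_terms = index_words(text, 10)
print(short_terms)
```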

> (3) In the indexing process, I got an error message as follows:
> "Exception: Key too long: length was 264 bytes, maximum length of a key is
> Btree::max_key_len bytes". I understand it means a single term is too
> long. But a term in which field: the primary field UID? or any field
> such as JN, CA, and AB?

It'll be in one of the boolean fields (unless you passed "index" a prefix
of 200 or so characters!)

This should be reported better.  We need to check term length explicitly
up front (at present this exception comes from a lower level which is
handling keys built from terms and document ids).
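
Something along these lines is what I mean by an up-front check, sketched
in Python for brevity.  Everything here is hypothetical: check_term and
TermTooLongError are invented names, and MAX_TERM_BYTES is a placeholder
value, not the actual limit.

```python
MAX_TERM_BYTES = 240  # placeholder for illustration, NOT the real limit


class TermTooLongError(ValueError):
    """Raised when a prefixed term would exceed the assumed limit."""


def check_term(field, prefix, value, limit=MAX_TERM_BYTES):
    """Return prefix+value, or raise an error naming the offending field."""
    term = prefix + value
    size = len(term.encode("utf-8"))  # the limit is in bytes, not characters
    if size > limit:
        raise TermTooLongError(
            "field %s: term is %d bytes, limit is %d" % (field, size, limit))
    return term


print(check_term("UID", "Q", "42"))
try:
    check_term("JN", "XJN", "a" * 300)
except TermTooLongError as e:
    print(e)
```

The point is that the message can name the field, which the lower-level
exception cannot do.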

Cheers,
    Olly


