[Xapian-discuss] What kind of data in the datafield

Richard Boulton richard at lemurconsulting.com
Thu Jan 4 11:48:32 GMT 2007


Felix Antonius Wilhelm Ostmann wrote:
> we are building the next google ... you know ;) But, what should we save 
> in the data-field?

It really depends on what you want to do with the data.  In general, you 
should save what you have a use for, and no more: the less you save, the 
smaller the database, and the faster you'll be able to access the data.

If you have the original data on disk, it's often useful just to save a 
URL/file path to the data.  But, even in this case, if the data has to 
pass through an expensive parsing step to extract text, it may be useful 
to store a sample of the parsed text for display in the result list. 
You might even want to store the whole parsed text, and generate a 
summary based on the phrases relevant to the query.
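
As a rough sketch of the "URL plus sample" approach, the snippet below
builds the string you would store in the data field and reads it back.
It deliberately makes no Xapian calls: the data field is just an opaque
string to Xapian, so the layout is up to you.  The JSON encoding, the
field names, the helper names and the 200-character limit are all
assumptions made for illustration.

```python
import json

SAMPLE_CHARS = 200  # hypothetical cap on the stored sample length


def make_document_data(url, parsed_text, title=None):
    """Build the string to store in a document's data field.

    Only what the result list needs is kept: a pointer back to the
    original, plus a short sample of the already-parsed text.
    """
    # Collapse runs of whitespace, then truncate to the sample length.
    sample = " ".join(parsed_text.split())[:SAMPLE_CHARS]
    fields = {"url": url, "sample": sample}
    if title is not None:
        fields["title"] = title
    return json.dumps(fields, sort_keys=True)


def read_document_data(data):
    """Decode the stored string back into a dict for display."""
    return json.loads(data)


if __name__ == "__main__":
    data = make_document_data(
        "http://example.com/report.pdf",
        "Text   extracted by an\nexpensive parsing step ...",
        title="Report",
    )
    print(read_document_data(data)["sample"])
```

Storing the whole parsed text instead only means dropping the
truncation; the trade-off is the larger database mentioned above.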

> And the title, the timestamp and other stuff? save in a value or at the 
> data too? I am confused :(

Save them in the data if you want to display them, or use them in some 
other way, once you've got the document results.

Note that if you're saving something like a timestamp in a value anyway 
(e.g., for sorting), you can just read the timestamp from the value when 
displaying the result list, so there's no need to duplicate this.
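
To illustrate, here is a minimal sketch of a timestamp encoding that
works for both jobs.  It assumes values are compared as byte strings
when sorting, so a fixed-width big-endian encoding sorts in numeric
order, and the same bytes can be decoded again for display.  The helper
names are made up, and no Xapian calls are shown; Xapian's own
sortable_serialise() is an alternative for numeric values.

```python
import struct
from datetime import datetime, timezone


def encode_timestamp(ts):
    """Pack a non-negative Unix timestamp into 8 big-endian bytes.

    Fixed width plus big-endian order means that comparing the encoded
    bytes gives the same ordering as comparing the numbers.
    """
    return struct.pack(">Q", ts)


def decode_timestamp(raw):
    """Recover the integer timestamp from the stored bytes."""
    return struct.unpack(">Q", raw)[0]


def format_for_display(raw):
    """Turn the stored value straight into text for the result list."""
    when = datetime.fromtimestamp(decode_timestamp(raw), tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    older = encode_timestamp(1000000000)
    newer = encode_timestamp(1167911312)
    assert older < newer  # byte order matches numeric order
    print(format_for_display(newer))
```

With this, the timestamp lives only in the value: it is used for
sorting at search time and decoded for display afterwards.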

-- 
Richard


