[Xapian-discuss] What kind of data in the datafield

James Aylett james-xapian at tartarus.org
Thu Jan 4 13:05:43 GMT 2007


On Thu, Jan 04, 2007 at 12:15:32PM +0100, Felix Antonius Wilhelm Ostmann wrote:

> we are building the next google ... you know ;) But what should we save
> in the data field?
> 
> The whole content? The first 4,096 bytes of a document? The best 400
> bytes of a document? Or nothing, and save the raw content to disk in a
> file named by the doc_id?
> 
> And the title, the timestamp and other stuff? Save them in a value or in
> the data too? I am confused :(

It depends entirely on how you want to display the data. Google (I
believe) keeps copies of everything, so ideally you want the source
document stored somewhere. I'd probably recommend that the Xapian
document data contain some summary fields plus a key to the storage on
disk (or, as you suggest, use the doc_id). That way overview search
results pages can be built without loading vast quantities of raw data
and summarising them on the spot, while you still have the opportunity
to do more detailed work (full-document search result highlighting,
for instance) when required.
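
As a rough illustration of the layout suggested above, here is a minimal
Python sketch that packs a few summary fields plus a storage key into one
string suitable for use as document data. The JSON encoding, the field
names, the 400-byte summary limit and the helper names
build_document_data and parse_document_data are all assumptions made for
the example; Xapian itself treats document data as an opaque blob and
does not prescribe any of this.

```python
import json

# Hypothetical limit, echoing the "best 400 bytes" idea from the question.
SUMMARY_BYTES = 400


def build_document_data(storage_key, title, timestamp, text,
                        summary_bytes=SUMMARY_BYTES):
    """Pack summary fields and a storage key into one string.

    storage_key is whatever locates the full document on disk
    (a path, or the doc_id); the field names are illustrative only.
    """
    # Truncate by bytes, then drop any multi-byte character that the
    # cut left incomplete at the end.
    summary = text.encode("utf-8")[:summary_bytes].decode(
        "utf-8", errors="ignore")
    return json.dumps({
        "key": storage_key,
        "title": title,
        "timestamp": timestamp,
        "summary": summary,
    })


def parse_document_data(data):
    """Recover the fields when building a results page."""
    return json.loads(data)


if __name__ == "__main__":
    data = build_document_data("docs/42.html", "Example page", 0,
                               "Some long page text. " * 100)
    fields = parse_document_data(data)
    # The results page needs only these fields; the raw document is
    # loaded via fields["key"] only when detailed work is required.
    print(fields["title"], fields["key"], len(fields["summary"]))
```

With the Xapian Python bindings, the resulting string would be handed to
Document.set_data() at index time and read back with get_data() when
rendering results; the sketch leaves those calls out so that it runs
without Xapian installed.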

Speaking of which, has anyone else noticed some sites doing search
result highlighting when driven from natural search? Not sure I'm in
favour of it - strikes me that it could be done better as a browser
extension - but still interesting.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


