[Xapian-discuss] Particular Informations about Xapian

Olly Betts olly at survex.com
Fri Apr 6 10:23:03 BST 2007


On Thu, Mar 01, 2007 at 11:16:53AM +0100, Sandri Francesco wrote:
> Greetings, we are a group of students attending a course on Information
> Retrieval at the University of Padua. We would like some information
> about Xapian:
> - the posting file data structure;

For the flint backend, see:

http://wiki.xapian.org/FlintBackend

The Btree tables used by flint are very similar to those in quartz (the
keys are different, and the filenames use "." instead of "_"):

http://www.xapian.org/docs/quartzdesign.html

> - if there is any index compression and what type;

Yes - see the above documentation.

> - what are standard ID schemes (DOI, URI, Purl, etc.);

You can use whatever you like as an ID, provided it's not overly long
(the limit is 240 bytes or so).
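
As a rough sketch of staying under that limit: one option is to use the
identifier directly as a term when it is short enough, and fall back to a
fixed-size hash when it is not.  This is plain Python with no Xapian
calls; the helper name make_id_term, the "Q" prefix and the exact 240-byte
cap are assumptions for illustration only.

```python
import hashlib

# Approximate limit mentioned above; the exact figure is an assumption.
MAX_TERM_BYTES = 240


def make_id_term(identifier: str, prefix: str = "Q") -> bytes:
    """Return a byte string usable as a unique ID term for `identifier`.

    Hypothetical helper: short identifiers are kept verbatim, long ones
    are replaced by a SHA-1 hex digest so the term has a bounded size.
    """
    term = (prefix + identifier).encode("utf-8")
    if len(term) <= MAX_TERM_BYTES:
        return term
    # Too long: keep the prefix, substitute a 40-character digest.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()
    return (prefix + digest).encode("utf-8")
```

A hashed ID can no longer be turned back into the original DOI or URI, so
you would probably also want to keep the full identifier somewhere else,
such as in the document data.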

> - if Xapian builds any authority files;

Not by itself, though you should be able to build and maintain authority
files using Xapian.

> - if the system manages term polysemy (lexical ambiguity);

Not directly.

However, relevance feedback can be used to "steer" a query towards a
particular meaning of a term with multiple meanings.  For example, a
search for "stock" can turn up investments, cookery, warehouses, and so
on.  If you mark a few documents in the results which are relevant,
Xapian can suggest more terms (so for investments, it might suggest
"shares" or "market", for cookery it might suggest "recipe", etc).
Alternatively, you can get Xapian to suggest terms based on the top N
results, and then the user can pick from those terms.
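
To make the idea concrete, here is a toy sketch in plain Python.  It does
not use Xapian's API or its actual term weighting: it simply ranks
candidate terms by how much more often they occur in the documents marked
relevant than in the collection as a whole.  The sample documents, the
suggest_terms name and the scoring formula are all made up for
illustration.

```python
from collections import Counter

# Made-up collection: document id -> text.
DOCS = {
    1: "stock market shares rise",
    2: "shares fall as stock market dips",
    3: "chicken stock recipe",
    4: "warehouse stock levels",
    5: "vegetable stock recipe soup",
}


def suggest_terms(docs, relevant_ids, query_terms, k=3):
    """Suggest up to k expansion terms from the documents marked relevant."""
    n = len(docs)
    r = len(relevant_ids)
    df = Counter()      # number of documents containing each term
    rel_df = Counter()  # same count, restricted to relevant documents
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        df.update(terms)
        if doc_id in relevant_ids:
            rel_df.update(terms)
    scores = {}
    for term, count in rel_df.items():
        if term in query_terms:
            continue  # the user already searched for this term
        # Fraction of relevant docs with the term minus fraction of all docs.
        scores[term] = count / r - df[term] / n
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With documents 1 and 2 marked relevant, the query "stock" is steered
towards investment terms; with 3 and 5 marked, towards cookery.  Passing
the IDs of the top N results as relevant_ids gives the second variant.
In Xapian itself, the marked documents go into an RSet and the suggested
terms come back from Enquire's get_eset().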

Cheers,
    Olly


