[Xapian-discuss] Omindex: what are the default numbered indexes?

James Aylett james-xapian at tartarus.org
Tue Apr 26 13:35:20 BST 2011


On 26 Apr 2011, at 13:12, <xapian at catcons.co.uk> <xapian at catcons.co.uk> wrote:

> How to make Omega CGI remove duplicate documents from its query output?

What you're looking for is called collapsing: when building an MSet (the list of matching documents), the matcher will only include one document for each distinct value in a chosen value slot.
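
To make the idea concrete, here is a minimal sketch of collapsing in plain Python. It is only an illustration of the concept, not how Xapian's matcher is implemented; the Hit type, its field names and the collapse() helper are all made up for this example.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    # Hypothetical stand-in for one MSet entry; not a Xapian type.
    docid: int
    weight: float
    collapse_key: bytes  # contents of the chosen value slot for this document


def collapse(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only the best-weighted hit for each distinct collapse key."""
    kept = []
    seen = set()
    # Walk the hits from best to worst so the first hit seen for a key wins.
    for hit in sorted(hits, key=lambda h: h.weight, reverse=True):
        if hit.collapse_key in seen:
            continue  # a better-ranked document already has this value
        seen.add(hit.collapse_key)
        kept.append(hit)
    return kept
```

With three hits where documents 1 and 2 share the same value, only document 1 (the better-weighted of the pair) and document 3 survive.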

> Apparently scriptindex can be used to add numbered indexes via the
> INDEX_SCRIPT as documented at http://xapian.org/docs/omega/scriptindex.html.

They aren't numbered indexes, they're numbered values; you want the value= or valuenumeric= action. Besides collapsing, values can be used for sorting and range searches.

> Using Omega to query an index built with omindex suggests there are some
> default numbered indexes.  Setting &COLLAPSE=<index number> in the URL
> (where <index number> was 1, 2 or 3) got a listing that seemed to have
> duplicates suppressed, the same number of documents for each of the three
> indexes.

Again, these are values rather than indexes; saying "value" avoids confusion.

> Before rolling this out to the users it would be nice to know what these
> default numbered indexes are and which, if any, can be safely used to
> suppress duplicates.

Within omega, values 0, 1 and 2 are reserved (for last modification time, 16-byte MD5 checksum and file size in bytes, respectively). Any other value number can be used. I'm not convinced this is documented anywhere useful; I've added a note to the missing documentation wiki page about this.
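
As a rough sketch of how the reserved slots relate to the COLLAPSE parameter mentioned in the question, the Python below builds an Omega query URL. The base URL, the database name and the helper names are assumptions made up for illustration; DB, P and COLLAPSE are Omega CGI parameters. Collapsing on slot 1 would presumably treat files with identical contents as duplicates, since they share an MD5 checksum, but whether that matches your notion of "duplicate" depends on your data.

```python
from urllib.parse import urlencode

# Value slots filled in by omindex, as listed above.
RESERVED_SLOTS = {
    0: "last modification time",
    1: "MD5 checksum",
    2: "file size in bytes",
}


def is_free_slot(slot):
    """True if the slot number is not one of the reserved ones."""
    return slot not in RESERVED_SLOTS


def omega_url(base, db, query, collapse_slot=None):
    """Build an Omega query URL, optionally collapsing on a value slot.

    The function itself is hypothetical; it only assembles a query string.
    """
    params = {"DB": db, "P": query}
    if collapse_slot is not None:
        params["COLLAPSE"] = str(collapse_slot)
    return base + "?" + urlencode(params)


# Example with a made-up host and database name, collapsing on the MD5 slot.
print(omega_url("http://localhost/cgi-bin/omega", "default", "annual report", 1))
```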

> Is there a way to interrogate the index database?  Are the default numbered
> indexes described in the documentation?


You can use delve to interrogate Xapian databases, for example:

$ delve -V -r <docid> <path-to-database>

which will display the values (and also the terms) for that document.

J

-- 
 James Aylett
 talktorex.co.uk - xapian.org - devfort.com



