[Xapian-discuss] Omega: behavior msize when collapsing results

Jeroen Vaes Jeroen.Vaes at vdp.com
Wed Mar 24 15:48:41 GMT 2010


Hello list,

I have a problem with the value of the result size ($msize in
OmegaScript) when collapsing results. The index contains 151452
documents. I'm using Omega 1.0.18 on FreeBSD (I tried both the version
in ports and the latest one from xapian.org). This is my index script:

uniqueid:		boolean=Q unique=Q field=uniqueid
objectid:		field=objectid boolean=XID value=0
objecttype:		field=type boolean=XTYPE
language:		field=language boolean=L
title:		field=title index
content:		index
catalog:		field=catalog boolean=XCATALOG
number:		field=number
searchnumber:	field=searchnumber boolean=XNUMBER indexnopos
productgroup:	field=productgroup boolean=XPRODUCTGROUP
property:		field=property boolean=XPROPERTY
colour:		field=colour boolean=XCOLOUR
size:			field=size boolean=XSIZE
colourandsize:	field=size boolean=XCOLOURANDSIZE
norm:			field=norm boolean=XNORM
picture:		field=picture
sort:			valuenumeric=1 field=sort
icon:			field=icon boolean=XICON
preview:		unhtml truncate=200 field=preview

My Omega command looks like this:

FMT=xml DEFAULTOP=OR HITSPERPAGE=9 MINHITS=900 SORT=1 SORTREVERSE=1
COLLAPSE=0 B=LNL B=XTYPEproduct P='(catalog:2 OR catalog:425) AND
productgroup:6'

In plain English, I am requesting all products from catalogs 2 and 425
in productgroup 6. I am collapsing the results on the field 'objectid'
(value slot 0) and sorting on the field 'sort' (value slot 1). The
expected number of results is 418 with collapsing and 441 without. To
get an exact number of matches, I set the MINHITS parameter to 900.
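
To make the expected numbers concrete, here is a toy Python model of
what collapsing should do to the match count. It is only a sketch and
not Xapian's implementation: the function name collapse() and the sample
data are made up for illustration, and the input is assumed to be
already in ranked order.

```python
def collapse(ranked_docs):
    """Keep only the first (best-ranked) document for each collapse key.

    ranked_docs is a list of (docid, collapse_key) pairs, best match first.
    """
    seen = set()
    kept = []
    for docid, key in ranked_docs:
        if key in seen:
            # A better-ranked document with this key was already kept.
            continue
        seen.add(key)
        kept.append((docid, key))
    return kept


# Hypothetical sample: five matching documents sharing three objectids.
ranked = [(10, "A"), (11, "B"), (12, "A"), (13, "C"), (14, "B")]
collapsed = collapse(ranked)

print(len(ranked), "matches before collapsing")
print(len(collapsed), "matches after collapsing")
```

Applied to the real query, the same operation should take the 441
matching documents down to 418, whichever result page is requested.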

However, when I run this query, $msize is 439. The value of $msizeexact
is "true", so the figure is apparently not an estimate. Yet when I
request the last result page, the value of $msize drops to 418.

When I set HITSPERPAGE to 1, the value of $msize is 441 (that is, the
number of documents before collapsing). Again, when I request the last
result page, the value of $msize is 418, and in both cases the value of
$msizeexact is "true". When I set HITSPERPAGE to 1000, the value of
$msize is 418.

So it would seem that $msize does not take the collapsing of documents
into account. However, I did some digging in the Omega code, and $msize
appears to be the value of Xapian::MSet::get_matches_estimated().
According to the API documentation, "This figure takes into account
collapsing of duplicates, and weighting cutoff values." I also have a
smaller index (83937 documents) which uses the same script and the same
kind of data, and there $msize is always correct.

So, what causes this behavior? Is this correct (in that case it would
seem that the API documentation is wrong), or did I encounter some weird
bug? And does anyone have a solution?

Kind regards,
Jeroen Vaes


