[Xapian-discuss] xapian performance

Fernando Nemec fernando.nemec at folha.com.br
Thu Nov 23 14:25:27 GMT 2006


Olly,

> And how are you timing?
> If this is "wall-clock" time from the "time" utility/built-in, what are
> the user and system times?

I'm using the time utility. As the user and system times were so low, I
removed them from the message I sent you. At the end of this message
I'll put the queries and all the times.
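
To show why the user and system times matter here, this is a rough Python sketch that computes what fraction of the wall-clock time was spent on the CPU. The helper names (parse_duration, cpu_fraction) are made up for illustration and are not part of Xapian or of the time utility; the sample figures are the CASE 1 and CASE 2 timings reported at the end of this message. A fraction near 1 suggests a CPU-bound query, while a small fraction suggests the process mostly waited, most likely on disk I/O.

```python
import re


def parse_duration(value: str) -> float:
    """Convert a `time`-style duration such as '1m33.191s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_fraction(real: str, user: str, sys: str) -> float:
    """Return (user + sys) / real, the share of wall-clock time spent on CPU."""
    return (parse_duration(user) + parse_duration(sys)) / parse_duration(real)


if __name__ == "__main__":
    # Timings copied from CASE 1 and CASE 2 in this message.
    cases = {
        "lula": ("0m0.429s", "0m0.396s", "0m0.036s"),
        "presidente PHRASE 2 lula": ("1m33.191s", "0m3.300s", "0m3.624s"),
    }
    for query, (real, user, sys) in cases.items():
        print(f"{query}: {cpu_fraction(real, user, sys):.1%} of wall-clock time on CPU")
```

For the single-term query nearly all of the elapsed time is CPU time, whereas for the phrase query only a small share is, which fits the idea that the phrase query is waiting on disk reads.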

> It's interesting that the first case is sped up (by 8% which is
> little high to be noise) - the patch shouldn't change non-phrase
> queries at all.

In this particular case the time difference is sometimes as high as
12%. I didn't worry about that because search times under 0.5s are
fine for my application.

> Is this SVN HEAD with and without this patch?
> http://www.oligarchy.co.uk/xapian/patches/xapian-experimental-phrase-optimisation-v2.patch

Both. The first block is without the patch and the second block is with
the patch. At the end of this message I'll put just the report made
with SVN HEAD _with_ the patch above.

> I think this must mean that we need to read so many disk blocks for
> this query that not many end up cached.  I think you said you had 1GB
> of RAM, so there might not be all that much left for caching.

Yes, that's correct.

> What does the "free" command report?

Here is the debug info for each query. It was produced with SVN HEAD
plus the xapian-experimental-phrase-optimisation-v2 patch. For each
case I've added the output of the free command.

== CASE 1
<!--Xapian::Query(lula)-->
1 blocks read from /local/xapian/newdb/record.
4369 blocks read from /local/xapian/newdb/value.
3 blocks read from /local/xapian/newdb/termlist.
1 blocks read from /local/xapian/newdb/position.
104 blocks read from /local/xapian/newdb/postlist.

real    0m0.429s
user    0m0.396s
sys     0m0.036s

             total       used       free     shared    buffers     cached
Mem:       1034764    1019508      15256          0       3556     980372
-/+ buffers/cache:      35580     999184
Swap:      2097144      13308    2083836

== CASE 2
<!--Xapian::Query((presidente PHRASE 2 lula))-->
1 blocks read from /local/xapian/newdb/record.
3023 blocks read from /local/xapian/newdb/value.
3 blocks read from /local/xapian/newdb/termlist.
153036 blocks read from /local/xapian/newdb/position.
380 blocks read from /local/xapian/newdb/postlist.

real    1m33.191s
user    0m3.300s
sys     0m3.624s

             total       used       free     shared    buffers     cached
Mem:       1034764    1021384      13380          0       3492     982248
-/+ buffers/cache:      35644     999120
Swap:      2097144      13308    2083836


== CASE 3
<!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
1 blocks read from /local/xapian/newdb/record.
1712 blocks read from /local/xapian/newdb/value.
3 blocks read from /local/xapian/newdb/termlist.
58136 blocks read from /local/xapian/newdb/position.
4141 blocks read from /local/xapian/newdb/postlist.

real    1m7.275s
user    0m1.556s
sys     0m2.484s

             total       used       free     shared    buffers     cached
Mem:       1034764    1020136      14628          0       4336     980640
-/+ buffers/cache:      35160     999604
Swap:      2097144      13308    2083836


== CASE 4
<!--Xapian::Query((presidente PHRASE 2 luiz))-->
1 blocks read from /local/xapian/newdb/record.
3628 blocks read from /local/xapian/newdb/value.
3 blocks read from /local/xapian/newdb/termlist.
143663 blocks read from /local/xapian/newdb/position.
407 blocks read from /local/xapian/newdb/postlist.

real    1m16.068s
user    0m2.820s
sys     0m3.580s

             total       used       free     shared    buffers     cached
Mem:       1034764    1019752      15012          0       4016     980608
-/+ buffers/cache:      35128     999636
Swap:      2097144      13308    2083836
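
As a rough cross-check of the caching explanation, the block counts above can be turned into an approximate data volume and compared with the total memory shown by free. The Python sketch below is only an illustration: the function name parse_report is made up, and the 8192-byte block size is an assumption (it is not stated anywhere in this thread), so adjust BLOCK_SIZE if the database was built with a different one. Note also that the counts are reads, so a block read twice is counted twice and the result is an upper bound on the distinct data touched.

```python
import re

BLOCK_SIZE = 8192  # assumption: bytes per block; not stated in the thread
TOTAL_RAM_KB = 1034764  # "Mem: total" column from the free output above

# Matches lines such as "153036 blocks read from /local/xapian/newdb/position."
REPORT_LINE = re.compile(r"(\d+) blocks read from (\S+?)\.?$")


def parse_report(text: str) -> dict:
    """Return {table name: blocks read} parsed from the debug report text."""
    counts = {}
    for line in text.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:
            table = match.group(2).rsplit("/", 1)[-1]
            counts[table] = counts.get(table, 0) + int(match.group(1))
    return counts


# Report copied from CASE 2 (presidente PHRASE 2 lula).
CASE_2 = """\
1 blocks read from /local/xapian/newdb/record.
3023 blocks read from /local/xapian/newdb/value.
3 blocks read from /local/xapian/newdb/termlist.
153036 blocks read from /local/xapian/newdb/position.
380 blocks read from /local/xapian/newdb/postlist.
"""

if __name__ == "__main__":
    counts = parse_report(CASE_2)
    for table, blocks in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{table:10s} {blocks:8d} blocks  {blocks * BLOCK_SIZE / 2**20:9.1f} MiB")
    total_mib = sum(counts.values()) * BLOCK_SIZE / 2**20
    print(f"total read: {total_mib:.1f} MiB, RAM: {TOTAL_RAM_KB / 1024:.1f} MiB")
```

Under that block-size assumption, the position table reads for this one query add up to more than the machine's total RAM, which would be consistent with a repeated query not getting faster from the cache.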


Thanks again for your help, Olly,

Nemec


Wednesday, November 22, 2006, 9:31:35 PM, you wrote:

> On Wed, Nov 22, 2006 at 06:55:21PM -0200, Fernando Nemec wrote:
>> Do you think it's better to have a larger set of queries, or will
>> this do fine?

> The effects will depend on the queries, but Arjen has already tested a
> larger set so I was mostly hoping you could confirm there was no
> regression for the two term case.

>> This was made *without* the experimental phrase optimisation patch:
>> 
>> <!--Xapian::Query(lula)-->
>> 0m0.412s
>> <!--Xapian::Query((presidente PHRASE 2 lula))-->
>> 1m5.062s
>> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
>> 1m14.193s
>> 
>> This was made *with* the phrase optimisation patch:
>> 
>> <!--Xapian::Query(lula)-->
>> 0m0.379s
>> <!--Xapian::Query((presidente PHRASE 2 lula))-->
>> 0m58.514s
>> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
>> 1m2.503s

> It's interesting that the first case is sped up (by 8% which is little
> high to be noise) - the patch shouldn't change non-phrase queries at
> all.  Is this SVN HEAD with and without this patch?

> http://www.oligarchy.co.uk/xapian/patches/xapian-experimental-phrase-optimisation-v2.patch

> Are you timing Omega?  If so, did you try removing $topterms from your
> query template?

> And how are you timing?

> If this is "wall-clock" time from the "time" utility/built-in, what are
> the user and system times?

>> I don't know if this is relevant, but maybe it is. For this query
>>
>> <!--Xapian::Query((presidente PHRASE 2 lula))-->
>>
>> the cache doesn't seem to help at all. Even if I run the exact same
>> query seconds later, the search time is still high and almost the
>> same.

> I think this must mean that we need to read so many disk blocks for
> this query that not many end up cached.  I think you said you had 1GB
> of RAM, so there might not be all that much left for caching.

> What
> does the "free" command report?

>> If there's anything else I can do to help to fix this issue, please
>> let me know.

> It would be interesting to try measuring just how many blocks we
> actually read - this will be a repeatable measure, whereas timings
> from cold disk cache are much harder to exactly repeat.  Try applying
> this patch:

> http://www.oligarchy.co.uk/xapian/patches/flint-count-read-blocks.patch

> This reports the number of blocks read from each table of each flint
> database to stderr (the report happens whenever a database is closed).

> Cheers,
>     Olly


--
[]s
Fernando Nemec
fernando.nemec at folha.com.br
http://www.folha.com.br/




