Is there a large variance in xapian searching?

morefreeze morefreeze at gmail.com
Tue Jul 3 08:15:28 BST 2018


Awesome, thanks!
I'm using Xapian 1.4.5, and congratulations on the 1.4.6 release. I'm
reading the links you gave me and will start another thread if I get stuck.

On Tue, Jul 3, 2018 at 2:21 PM Olly Betts <olly at survex.com> wrote:

> On Mon, Jul 02, 2018 at 06:08:40PM +0800, morefreeze wrote:
> > I found that the first query (using QueryParser) against this database
> > after booting the computer, and occasionally a later one, takes
> > several seconds (around 5 seconds), although it takes 0.8 seconds on
> > average. What is the reason for this?
>
> If you've just rebooted, none of the database will be cached, so
> everything has to be fetched from disk and that takes more time.
>
> The second query will be faster even if it's for entirely different
> terms, because at least the root blocks will be read from cache.
> And pretty quickly the cache ends up with all the frequently read
> blocks.
>
> This can also happen without a reboot if another process reads a lot
> of data which ends up in cache instead of the database blocks.  If
> the machine has cronjobs making backups, updating the database used by
> the "locate" tool, or doing other things which read a lot of files, you
> might want to consider carefully when they run, or run them under
> something which minimises cache effects such as "nocache".
>
> > If I want to shorten this query time, what should I do or try? BTW, I
> > think splitting into more databases and querying them in parallel is
> > not a good idea, unless Xapian ensures each query takes less than an
> > expected time (actually this 13M database is 'small', :P).
>
> I'd think searching more databases would if anything make this "cold
> cache" effect worse.
>
> You don't say what version you're using, but make sure it's a recent
> Xapian 1.4.x and that you're using the glass backend.  If you're still
> using 1.2.x, or 1.4.x with chert databases then switching to 1.4.x+glass
> is likely to help.
>
> You can warm the cache usefully just by running a few queries (if
> you make them for commonly searched terms that will be more effective).
> So if you have a cluster of search machines and want to add a new
> member to it, you can automate running a few "warm up" queries after
> spinning up the new instance but before actually adding it to the
> cluster.
>
> 1.4.x will issue prefetch hints if posix_fadvise() is available, which
> helps when the cache is cold.  These are done automatically for
> postlists, but you can call MSet::fetch() to issue prefetch hints for
> fetching document data.  This ticket is about the prefetching changes:
>
> https://trac.xapian.org/ticket/671
>
> If you want to profile what database blocks are being read, then the
> strace-analyse script may be useful:
>
> https://trac.xapian.org/browser/git/xapian-maintainer-tools/profiling/strace-analyse
>
> See the comments in the script for how to use it.
>
> Cheers,
>     Olly
>
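The "warm up" queries suggested in the quoted reply could be automated along these lines. This is only a sketch, assuming the Xapian Python bindings are installed; the helper names `make_xapian_search` and `warm_up` are made up for illustration and are not part of Xapian.

```python
import time
from typing import Callable, Iterable, List, Tuple


def make_xapian_search(db_path: str, max_hits: int = 10) -> Callable[[str], int]:
    """Return a function that runs one query string against the database.

    xapian is imported lazily so the timing helper below can be used
    (and tested) on a machine without the bindings.
    """
    import xapian  # Xapian Python bindings, assumed to be installed

    db = xapian.Database(db_path)
    parser = xapian.QueryParser()
    parser.set_database(db)

    def search(query_string: str) -> int:
        enquire = xapian.Enquire(db)
        enquire.set_query(parser.parse_query(query_string))
        # Fetching the first few matches is enough to pull the relevant
        # blocks into the OS cache.
        return enquire.get_mset(0, max_hits).size()

    return search


def warm_up(search: Callable[[str], object],
            queries: Iterable[str]) -> List[Tuple[str, float]]:
    """Run each query once and return (query, elapsed seconds) pairs."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        timings.append((query, time.perf_counter() - start))
    return timings
```

For example, `warm_up(make_xapian_search("/path/to/db"), ["common", "terms"])` (path and terms are placeholders) would be run after a new instance starts and before it is added to the cluster. The returned timings should show the first query being the slow one if the cache was cold.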

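To illustrate the kind of prefetch hint posix_fadvise() provides, here is a rough sketch that asks the kernel to start reading whole files into the cache. This is not how Xapian does it internally (per the quoted reply, Xapian issues its hints automatically for postlists, and MSet::fetch() does so for document data); it assumes a POSIX system, and the function names are hypothetical.

```python
import os
from typing import List


def hint_willneed(path: str) -> bool:
    """Ask the kernel to start reading `path` into the page cache.

    Returns False if posix_fadvise() is unavailable or the call fails.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # e.g. non-POSIX platforms
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 means "from the start to the end of file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True


def hint_directory(db_dir: str) -> List[str]:
    """Issue a WILLNEED hint for each regular file directly inside db_dir.

    Returns the names of the files for which a hint was issued.
    """
    hinted = []
    for name in sorted(os.listdir(db_dir)):
        full_path = os.path.join(db_dir, name)
        if os.path.isfile(full_path) and hint_willneed(full_path):
            hinted.append(name)
    return hinted
```

The hint is advisory only: the call returns immediately and the kernel may ignore it, so it complements warm-up queries rather than replacing them. Hinting whole files is also much cruder than hinting just the blocks a query needs.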

-- 
One of my most productive days was throwing away 1000 lines of code.