[Xapian-devel] FASTER Search

Olly Betts olly at survex.com
Sun Jan 27 01:47:00 GMT 2013


On Sun, Jan 20, 2013 at 01:19:33PM +0800, ?????? wrote:
> On Sat, Jan 19, 2013 at 7:53 AM, Olly Betts <olly at survex.com> wrote:
> 
> > On Thu, Jan 17, 2013 at 01:50:25PM +0800, ?????? wrote:
> > > I am suffering from slow search performance with Xapian.
> > >
> > > I am using Xapian to index about 150,000,000 documents.
> > > It is implemented in C++.
> >
> > Which version of Xapian are you using?
> >
> > What OS is this on?
> >
> > How big is the database on disk?
> >
> > How much RAM do you have?
> >
> 
> I am using xapian-core-1.2.12 on Debian;
> the database is about 50G;
> the computer has 48G of RAM.

OK, so you shouldn't be even slightly I/O bound.

> I ran a timing test with 500 queries; the average times are
> real  1876ms
> user 1649ms
> sys   227ms

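And indeed you aren't: user + sys accounts for all of the real time, so
the search is CPU bound rather than waiting on disk.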

> below is the actual code:
> [...]
> std::vector<uint32_t> search(const string& query1, const string& query2,
>     const string& query3, unsigned offset, unsigned pagesize, double k1,
>     double k2, double k3, double b, double min_normlen) {

What are you passing for offset and pagesize in your tests?

>         for (Xapian::MSetIterator i = mset.begin(); i != mset.end(); ++i) {
>             const string& data = i.get_document().get_data();
>             uint32_t reid = atoi(data.c_str());
>             result.push_back(reid);
>         }

One option here is to simply set Xapian's document id to this "reid".
Depending on the order in which the reids are encountered during
indexing, this may be slower to index, but it will save time when
searching - probably a lot if you're asking for a lot of results.
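
As a rough sketch of the indexing side (not your actual indexer): Xapian's
WritableDatabase::replace_document(docid, document) stores a document
under a docid you choose. The helper name index_under_reid is made up for
illustration, and it is written as a template so it works with
Xapian::WritableDatabase and Xapian::Document or anything shaped like
them. Docid 0 is not valid in Xapian, so the sketch rejects it.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: store `doc` under the application's own id.
// Db is assumed to offer replace_document(id, doc), as
// Xapian::WritableDatabase does; Doc would be Xapian::Document.
template <class Db, class Doc>
void index_under_reid(Db& db, std::uint32_t reid, const Doc& doc) {
    // Xapian docids start at 1, so a reid of 0 cannot be used directly.
    if (reid == 0) {
        throw std::invalid_argument("docid 0 is not valid in Xapian");
    }
    // Adds the document if the id is unused, replaces it otherwise.
    db.replace_document(reid, doc);
}
```

If the reids arrive in roughly ascending order, this should behave much
like plain appending; out-of-order ids are where indexing may slow down.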

You can get an idea of how much difference this would make by using *i
(which gives the Xapian docid) instead of the first two lines in this
loop and timing that.
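
Something along these lines would do for the timing experiment. It is
only a sketch: collect_ids and time_ms are names invented here, and the
loop relies on dereferencing the iterator to get the id, which is what
dereferencing a Xapian::MSetIterator gives you (the docid). It is
templated on the iterator type so it can be tried without a database.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Collect the id each iterator position dereferences to, without
// fetching any document data. With Xapian this could be called as
// collect_ids(mset.begin(), mset.end()).
template <class Iter>
std::vector<std::uint32_t> collect_ids(Iter first, Iter last) {
    std::vector<std::uint32_t> result;
    for (Iter i = first; i != last; ++i) {
        result.push_back(static_cast<std::uint32_t>(*i));
    }
    return result;
}

// Hypothetical helper: run fn() once and return the elapsed
// wall-clock time in milliseconds.
template <class Fn>
double time_ms(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing the original get_document().get_data() loop and this one with the
same offset and pagesize should show how much of the per-query cost is
document fetching.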

Cheers,
    Olly


