Xapian Benchmark results

Mon Dec 3 04:50:18 GMT 2018

On Fri, Nov 30, 2018 at 02:28:19PM -0600, Krishna Bharadwaj wrote:
> I am currently trying to benchmark a multithreaded Xapian implementation,
> written in C++, on a Chameleon bare-metal instance. My workload is a 3 GB
> Wikipedia XML dump consisting of ~286 files of different sizes. My results
> show that indexing with Xapian is an order of magnitude faster than my
> Lucene and Lucene++ implementations, which I did not expect. I just want to
> confirm with you whether my implementation (attached below) is correct. The
> search results come back correctly, but I can't get over the fact that
> Xapian is performing so much faster than my other implementations. There
> have been no optimizations in my code. Awaiting your response.

The indexer code looks plausible to me.

You can poke at the built databases with xapian-delve to check what
terms index a given document, etc.

>     Xapian::Document doc;
>     doc.set_data(line_string);

You typically wouldn't just store the entire document in the Xapian
document's data, though that might make sense for some applications.
But storing less there would only make things faster.
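
As a rough illustration of storing less, here is a minimal sketch that
trims the text before it is handed to set_data(). The helper name
make_display_payload and the byte limit are hypothetical, not part of
Xapian or of the posted indexer, and it assumes the input is UTF-8. It
uses only the standard library:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the posted indexer): build a compact payload
// to store as the document data instead of the whole input line.
// Keeps at most max_bytes bytes and, assuming UTF-8 input, backs up so that
// a multi-byte character is never cut in half.
std::string make_display_payload(const std::string& text, std::size_t max_bytes) {
    if (text.size() <= max_bytes) {
        return text;  // Already small enough; store as-is.
    }
    std::size_t cut = max_bytes;
    // A byte of the form 10xxxxxx is a UTF-8 continuation byte. If the first
    // dropped byte is one, the cut falls inside a character, so move left
    // until the cut sits on a character boundary.
    while (cut > 0 && (static_cast<unsigned char>(text[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return text.substr(0, cut);
}
```

In the indexer this might be used as
doc.set_data(make_display_payload(line_string, 200)), with the full
line_string still passed to whatever code generates the index terms, so
only the stored display data shrinks and searching is unaffected.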

Cheers,
    Olly