How to make xapian run in hadoop
Olly Betts
olly at survex.com
Fri Nov 22 05:45:23 GMT 2019
On Thu, Nov 21, 2019 at 10:20:19AM +0800, 程苏珺 wrote:
> We use Xapian as the backend of our system. The amount of data to be
> indexed keeps growing, and the local mode is hard to maintain, so we
> plan to move the index builder to Hadoop. We are trying to get Xapian
> to run in Hadoop, but have hit a problem: Xapian performs many seek
> operations when it writes the index files, but the seek() method
> in the Hadoop C API only supports reading, so we are blocked by that.
Updating a glass backend database pretty fundamentally requires a
way to "write block N". We don't actually require the ability to
seek arbitrarily, but if hadoop writes are limited to appending to
a file your approach is just not going to work for updating.
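
To make the distinction concrete, here is a rough Python sketch of what
"write block N" amounts to: a positioned write at offset N * block size,
which can overwrite an existing block in place. This is not Xapian code;
the file name, the helper names and the 8192-byte block size are all
assumptions chosen for illustration. An append-only file API has no
equivalent of the in-place overwrite step in demo().

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def write_block(fd, n, data):
    """Overwrite block n in place with a positioned write (no append)."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("a block must be exactly BLOCK_SIZE bytes")
    os.pwrite(fd, data, n * BLOCK_SIZE)


def read_block(fd, n):
    """Read block n back from its fixed offset."""
    return os.pread(fd, BLOCK_SIZE, n * BLOCK_SIZE)


def demo():
    # Hypothetical file name; any local (seekable) file would do.
    path = os.path.join(tempfile.mkdtemp(), "table.blocks")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Initial creation: blocks 0, 1, 2 written in order.
        for n in range(3):
            write_block(fd, n, bytes([n]) * BLOCK_SIZE)
        # Update: rewrite block 1 where it already lives in the file.
        write_block(fd, 1, b"\xff" * BLOCK_SIZE)
        first_bytes = [read_block(fd, n)[0] for n in range(3)]
        return first_bytes, os.fstat(fd).st_size
    finally:
        os.close(fd)
```

The file stays three blocks long after the update, because block 1 was
replaced rather than appended.
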
It might be possible to buffer up everything in RAM and then write out a
glass database in one go with such a limitation, but if you're having
scaling problems then forcing a situation where the whole database needs
to be created in RAM before it can be written is not going to help.
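
As a rough sketch of that buffering idea (again not Xapian code; the
class and method names and the block size are hypothetical), block
writes could be collected in a dictionary and only written out
sequentially at the end, which an append-only stream can accept. The
memory_used() figure shows the catch: it grows with the size of the
whole database.

```python
import io

BLOCK_SIZE = 8192  # assumed block size, for illustration only


class InMemoryBlocks:
    """Hypothetical buffer: holds every block in RAM until flushed."""

    def __init__(self):
        self.blocks = {}  # block number -> bytes

    def write_block(self, n, data):
        """'Write block n' becomes a dictionary update, so rewrites are cheap."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self.blocks[n] = data

    def memory_used(self):
        """Bytes of block data currently held in RAM."""
        return len(self.blocks) * BLOCK_SIZE

    def flush_to(self, stream):
        """Write blocks 0..max in order, using only sequential writes."""
        count = max(self.blocks) + 1 if self.blocks else 0
        for n in range(count):
            # Blocks that were never written are emitted as zero-filled.
            stream.write(self.blocks.get(n, b"\0" * BLOCK_SIZE))


def demo():
    buf = InMemoryBlocks()
    buf.write_block(0, b"a" * BLOCK_SIZE)
    buf.write_block(1, b"b" * BLOCK_SIZE)
    buf.write_block(0, b"c" * BLOCK_SIZE)  # update block 0 before flushing
    out = io.BytesIO()  # stands in for an append-only output stream
    buf.flush_to(out)
    return buf.memory_used(), out.getvalue()
```
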
> It looks like a lot of work to rewrite the Xapian database backend to
> adapt it to the Hadoop C API. Could you please give us some suggestions?
The in-development backend (honey) would probably be easier to get
to work here once finished, but currently it doesn't support
writing directly so that's no help if you want a solution now.
Perhaps you could elaborate on the problem you're actually trying
to solve here.
What does "the local mode is hard to maintain" actually mean?
Cheers,
Olly