[Xapian-discuss] Remote database search issues

Ron Kass ron at pidgintech.com
Sat Oct 27 18:03:34 BST 2007


Hi all.
First, a note about remote database connections from Perl. We actually 
found an easy way to work around the unwrapped Remote::open issue: we 
use a stub file.
You might say that open_stub is also not wrapped, which is true. 
However, looking at the code, we realized that Database::open() opts 
to use open_stub if the argument is a string pointing to a stub file 
rather than a database directory. So instead of 
Database::open('/data/ftsdirectory') you can do 
Database::open('/stubfile.dat').
It's a pretty handy trick. It's not as nice as a proper remote database 
open (since with that you can dynamically control via code which servers 
to connect to), but still, it works.
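
Since the stub file is plain text, some of that dynamic control could 
probably be regained by writing the stub file from code just before 
opening it. Below is a minimal sketch in Python (the same idea should 
carry over to Perl). The helper name write_stub, the file path and the 
second server address are made up for illustration, and the sketch 
assumes a stub file may hold several "remote host:port" lines, one per 
database.

```python
import os
import tempfile


def write_stub(path, servers):
    """Write a stub file with one 'remote host:port' line per server.

    servers is a list of (host, port) tuples. The content is written to a
    temporary file in the same directory and then renamed into place, so
    a reader never sees a half-written stub file.
    """
    lines = ["remote %s:%d\n" % (host, port) for host, port in servers]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as stub:
        stub.writelines(lines)
    os.replace(tmp_path, path)
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical server list; only 10.0.0.27:33333 appears in this post.
    servers = [("10.0.0.27", 33333), ("10.0.0.28", 33333)]
    print(write_stub("stub.dat", servers), end="")
```

The resulting file would then be passed to Database::open() exactly like 
the hand-written stub file above.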

So we then tested remote search, and we ran into several problems.

1) Only xapian-tcpsrv worked. We couldn't figure out how to use 
xapian-progsrv. The problem was the stub file format.
This works:
    remote 10.0.0.27:33333
But these don't work:
    remote ssh ftsuser@10.0.0.27 xapian-progsrv /data/fts/Database2/
or
    remote ssh 10.0.0.27 xapian-progsrv /data/fts/Database2/

The error we get is
    Error creating DB with stub file: Exception: Bad line 1 in stub 
database file `/fts/stub.dat' at ...
Can anyone shed some light on this one? ssh is configured properly: 
ftsuser is allowed to ssh without a prompt (using proper key files), and 
the database is in the right location. The error seems to come from the 
parsing of the line.
What are we doing wrong? What's the right stub file format for remote 
over xapian-progsrv?

2) We tried remote search over xapian-tcpsrv, which we cannot really use 
for anything besides testing until it supports parallel searches, 
something xapian-progsrv does support as far as we understand.
Search speed was bad. A search for a single word (like gift) takes well 
over a second for the first search, something that is fast when run 
locally. Even when done on the same machine (over localhost) it's not 
that fast.
Furthermore, fetching the documents also takes a long time, and fetching 
the matching words is even worse. Even on localhost.
TCP overhead shouldn't be that bad, should it? Maybe it's tcpsrv 
performance in general?
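
To make numbers like these easier to compare, the three phases (search, 
document fetch, matching-word fetch) could be timed separately. The 
following is only a sketch in Python. The time_phases helper is 
hypothetical, and the lambdas are placeholders that sleep instead of 
calling Xapian; real code would put the actual search and fetch calls in 
their place.

```python
import time


def time_phases(phases):
    """Run each (name, callable) pair once and return {name: seconds}."""
    timings = {}
    for name, func in phases:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Placeholder callables (hypothetical); replace with the real calls.
    phases = [
        ("search", lambda: time.sleep(0.01)),
        ("fetch docs", lambda: time.sleep(0.01)),
        ("matching terms", lambda: time.sleep(0.01)),
    ]
    for name, seconds in time_phases(phases).items():
        print("%-15s %.3f s" % (name, seconds))
```

Running the same phases once against a local database and once through 
the stub file would show which phase accounts for the difference.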

It probably doesn't help that search speed is slow in general in our 
searches (this issue is being discussed in another thread on this 
mailing list), but nonetheless, it's much slower than the regular slow 
search.


Any tips, ideas, or thoughts on these two issues? Has anyone managed to 
use multiple remote databases effectively?


Best regards,
Ron


