SQL-like JOINs on separate DBs?

Eric Wong e at 80x24.org
Fri Mar 5 21:08:50 GMT 2021


Eric Wong <e at 80x24.org> wrote:
> Olly Betts <olly at survex.com> wrote:
> > I'm not clear exactly what you want to join on (SHA of what or pathname of
> > what?), but if you can efficiently map the Xapian::docid in the main
> > database to a Xapian::docid or unique ID in the auxiliary database then
> > a custom PostingSource subclass would work here (and the auxiliary
> > database wouldn't even need to be a Xapian database, or indeed a
> > database at all.)  The `kw:seen` subquery would be a PostingSource.
> 
> Erm, the SHA would be a boolean term or stored in docdata of the
> underlying document (a git blob).  $PATHNAME would be the
> normalized pathname of the Xapian DB (or some stable integer
> mapping).
> 
> The actual set of giant, read-only Xapian DBs would be volatile
> and subject to constant change depending on which DBs a user is
> interested in.  The docids within each of these giant, read-only
> Xapian DBs are stable, however.

What I'm thinking I could do is have each user use multiple
small read-write DBs each mapped to a corresponding large
read-only DB and use a custom PostingSource via Python.
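
A rough sketch of what such a PostingSource could look like, assuming
the small read-write store can hand back the set of docids (in the
corresponding read-only DB) that carry the "seen" keyword.  The class
name SeenPostingSource and the seen_docids argument are made up for
illustration; only the overridden method names come from the
Xapian::PostingSource API.  The fallback to a plain object base is just
there so the iteration logic can be exercised without the bindings
installed, and multi-DB (sharded) searches are not handled:

```python
import bisect

try:
    import xapian
    _Base = xapian.PostingSource
except ImportError:
    # Bindings not installed: fall back so the logic can still be tried out.
    _Base = object


class SeenPostingSource(_Base):
    """Iterate over docids of one read-only DB that the user marked "seen".

    Hypothetical helper: seen_docids is assumed to come from the user's
    small read-write store and to refer to docids in the read-only DB.
    """

    def __init__(self, seen_docids):
        _Base.__init__(self)
        self.docids = sorted(set(seen_docids))
        self.pos = -1

    def init(self, db):
        # Called before each match; rewind to "before the first entry".
        self.pos = -1

    # Term frequency bounds.  Using the exact count assumes every listed
    # docid still exists in the read-only DB.
    def get_termfreq_min(self):
        return len(self.docids)

    def get_termfreq_est(self):
        return len(self.docids)

    def get_termfreq_max(self):
        return len(self.docids)

    def get_weight(self):
        # Used as a pure boolean filter, so contribute no weight.
        return 0.0

    def next(self, min_wt):
        self.pos += 1

    def skip_to(self, did, min_wt):
        # Advance to the first docid >= did; never move backwards.
        self.pos = max(self.pos, bisect.bisect_left(self.docids, did))

    def at_end(self):
        return self.pos >= len(self.docids)

    def get_docid(self):
        return self.docids[self.pos]
```

With the real bindings, the idea would be to wrap an instance in
xapian.Query(src) and combine it with the text query via
xapian.Query.OP_FILTER in place of the `kw:seen` subquery; I have not
tried that against 1.2.22.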

I'm also stuck supporting 1.2.22 for some users on CentOS 7; and
it'd be easier for users on that platform to be able to use
Xapian via RPM Python 2.x bindings w/o having to get
Search::Xapian from CPAN.

> > > I'm using Perl Search::Xapian from Debian stable (buster).
> > 
> > Unfortunately that doesn't wrap PostingSource.
> 
> Oh well.  Writing a small Python daemon using the SWIG bindings
> could be an option(*)...


