SQL-like JOINs on separate DBs?
Eric Wong
e at 80x24.org
Fri Mar 5 21:08:50 GMT 2021
Eric Wong <e at 80x24.org> wrote:
> Olly Betts <olly at survex.com> wrote:
> > I'm not clear exactly what you want to join on (SHA of what or pathname of
> > what?), but if you can efficiently map the Xapian::docid in the main
> > database to a Xapian::docid or unique ID in the auxiliary database then
> > a custom PostingSource subclass would work here (and the auxiliary
> > database wouldn't even need to be a Xapian database, or indeed a
> > database at all.) The `kw:seen` subquery would be a PostingSource.
>
> Erm, the SHA would be a boolean term or stored in docdata of the
> underlying document (a git blob). $PATHNAME would be the
> normalized pathname of the Xapian DB (or some stable integer
> mapping).
>
> The actual set of giant, read-only Xapian DBs would be volatile
> and subject to constant change depending on which DBs a user is
> interested in. The docids within each of these giant, read-only
> Xapian DBs are stable, however.

What I'm thinking I could do is have each user use multiple
small read-write DBs each mapped to a corresponding large
read-only DB and use a custom PostingSource via Python.

I'm also stuck supporting 1.2.22 for some users on CentOS 7; and
it'd be easier for users on that platform to be able to use
Xapian via RPM Python 2.x bindings w/o having to get
Search::Xapian from CPAN.

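A rough sketch of what the `kw:seen` PostingSource could look like in
Python, assuming the small per-user store can hand back the set of
docids (in the large read-only DB) that carry the keyword. The class
name SeenPostingSource and the seen_docids argument are made up for
illustration, and the fallback base class is only there so the
iteration logic can be exercised without the bindings installed. Not
tested against 1.2.22.

```python
import bisect

try:
    import xapian
    _Base = xapian.PostingSource
except ImportError:
    # Assumption: fall back to a plain object so the sketch still runs
    # where the Xapian bindings are not installed.
    _Base = object


class SeenPostingSource(_Base):
    """Hypothetical PostingSource yielding docids flagged as "seen"."""

    def __init__(self, seen_docids):
        _Base.__init__(self)
        # Xapian expects docids in ascending order.
        self._docids = sorted(set(seen_docids))
        self._pos = -1

    def init(self, db):
        # Called before matching starts: rewind to before the first entry.
        self._pos = -1

    def get_termfreq_min(self):
        # Conservative bound: some stored docids may no longer exist.
        return 0

    def get_termfreq_est(self):
        return len(self._docids)

    def get_termfreq_max(self):
        return len(self._docids)

    def next(self, min_weight):
        self._pos += 1

    def skip_to(self, did, min_weight):
        # Move to the first docid >= did, never backwards.
        target = bisect.bisect_left(self._docids, did)
        self._pos = max(self._pos, target)

    def at_end(self):
        return self._pos >= len(self._docids)

    def get_docid(self):
        return self._docids[self._pos]
```

With the real bindings, my understanding is that an instance can be
wrapped as xapian.Query(SeenPostingSource(ids)) and combined with the
rest of the query via OP_AND or OP_FILTER, with one source per large
read-only DB since the docids are only meaningful within that DB.
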
> > > I'm using Perl Search::Xapian from Debian stable (buster).
> >
> > Unfortunately that doesn't wrap PostingSource.
>
> Oh well. Writing a small Python daemon using the SWIG bindings
> could be an option(*)...