[Xapian-devel] Initial patch for ExternalPostList
Olly Betts
olly at survex.com
Sat Jun 3 03:00:37 BST 2006
On Fri, Jun 02, 2006 at 06:24:12PM -0600, Rusty Conover wrote:
> You can use it by first implementing an ExternalPostingSource, then
> creating a new Query object passing a reference an instance of your
> implementation to the constructor, see query.h. The
> ExternalPostingSource implementation is reference counted, so when
> its no longer needed it can clean itself up.
If you're going to reference count it, it would be better to use the
existing reference counting machinery (include/xapian/base.h) unless
there's a good reason not to.
> It works well for filtering results but it has a problem that once it
> filters all of the results are returned with 100% relevancy. I'm not
> quite sure how to fix that but I think its because I just used the
> default weight implementations in matcher/externalsourcepostlist.h.
> Any hints or patches to fix that so it works correctly would be
> greatly appreciated.
I think the weights look OK, but get_termfreq_*() shouldn't return
0 but rather bounds/estimates of the number of entries that the
ExternalPostingSource would return if you called next() until
at_end() returned true.
I suspect returning 0 might confuse the matcher.
> Also the use of an ExternalPostList query breaks the network method
> of transmitting queries, I haven't written code to serialize the post
> list or even decide if that's worth doing.
I think this is probably a concept which doesn't fit well with the
remote backend. You could perhaps use a registering scheme similar
to that provided for user-defined weighting schemes, but a weighting
scheme is just an algorithm plus some parameters, whereas this class
will typically be talking to some other system like an SQL database.
> I realize there has been some stylistic license taken with the
> formatting of this patch. I didn't pay much attention to following
> existing indentation and bracket rules and what not. If you're up
> for including this into the release branch, I'd be glad to take the
> time and clean it up even further.
I think it's probably suitable for inclusion with a bit more work.
Similar ideas have been discussed before, and I can see scope for
lots of neat things this enables.
I've had a quick read through - here are some comments:
* I don't think you need to call start_construction(), etc and
have the set_external_source method - just have a
Query::Internal::Internal(ExternalPostingSource *) ctor
for the internal class.
* I think OP_EXTERNAL_POST_LIST should be internal like OP_LEAF.
* There's no point trying to encode OP_EXTERNAL_POST_LIST in
the query serialisation if this isn't supported by the remote backend
(especially as I've just rewritten query serialisation but not checked
it in yet!) Better to just throw UnimplementedError if we get there.
* ExternalPostingSource should derive from RefCntBase and then you can
use the same reference counting mechanism the other API classes do.
* Don't renumber OP_ELITE_SET - the java bindings currently have these
values hard-wired (ick) so they'll break! That's why it's explicitly
numbered currently (the enum with value 9 is no longer used).
* In this case, skip_to and next should always return NULL. You only
need to return something if you want to replace the current postlist
with a different one (e.g. an OR turning into an AND because of
optimisations based on the minimum weight required) and we don't want
to do that here.
Phew! That might seem like a lot, but they're all pretty minor really.
Generally this is looking pretty plausible.
Cheers,
Olly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: external-post-list-1.diff
Type: text/patch
Size: 11870 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-devel/attachments/20060603/09b703f4/external-post-list-1-0001.bin
More information about the Xapian-devel
mailing list