[Xapian-devel] Initial patch for ExternalPostList

Olly Betts olly at survex.com
Sat Jun 3 03:00:37 BST 2006


On Fri, Jun 02, 2006 at 06:24:12PM -0600, Rusty Conover wrote:
> You can use it by first implementing an ExternalPostingSource, then
> creating a new Query object passing a reference an instance of your
> implementation to the constructor, see query.h. The
> ExternalPostingSource implementation is reference counted, so when
> its no longer needed it can clean itself up.

If you're going to reference count it, it would be better to use the
existing reference counting machinery (include/xapian/base.h) unless
there's a good reason not to.

> It works well for filtering results but it has a problem that once it
> filters all of the results are returned with 100% relevancy. I'm not
> quite sure how to fix that but I think its because I just used the
> default weight implementations in matcher/externalsourcepostlist.h. 
> Any hints or patches to fix that so it works correctly would be
> greatly appreciated.

I think the weights look OK, but get_termfreq_*() shouldn't return
0 but rather bounds/estimates of the number of entries that the
ExternalPostingSource would return if you called next() until
at_end() returned true.

I suspect returning 0 might confuse the matcher.

> Also the use of an ExternalPostList query breaks the network method
> of transmitting queries, I haven't written code to serialize the post
> list or even decide if that's worth doing.

I think this is probably a concept which doesn't fit well with the
remote backend.  You could perhaps use a registering scheme similar
to that provided for user-defined weighting schemes, but a weighting
scheme is just an algorithm plus some parameters, whereas this class
will typically be talking to some other system like an SQL database.

> I realize there has been some stylistic license taken with the
> formatting of this patch. I didn't pay much attention to following
> existing indentation and bracket rules and what not.  If you're up
> for including this into the release branch, I'd be glad to take the
> time and clean it up even further.

I think it's probably suitable for inclusion with a bit more work.
Similar ideas have been discussed before, and I can see scope for
lots of neat things this enables.

I've had a quick read through - here are some comments:

* I don't think you need to call start_construction(), etc and
  have the set_external_source method - just have a
  Query::Internal::Internal(ExternalPostingSource *) ctor
  for the internal class.

* I think OP_EXTERNAL_POST_LIST should be internal like OP_LEAF.

* There's no point trying to encode OP_EXTERNAL_POST_LIST in
  the query serialisation if this isn't supported by the remote backend
  (especially as I've just rewritten query serialisation but not checked
  it in yet!)  Better to just throw UnimplementedError if we get there.

* ExternalPostingSource should derive from RefCntBase and then you can
  use the same reference counting mechanism the other API classes do.

* Don't renumber OP_ELITE_SET - the java bindings currently have these
  values hard-wired (ick) so they'll break!  That's why it's explicitly
  numbered currently (the enum with value 9 is no longer used).

* In this case, skip_to and next should always return NULL.  You only
  need to return something if you want to replace the current postlist
  with a different one (e.g. an OR turning into an AND because of
  optimisations based on the minimum weight required) and we don't want
  to do that here.

Phew!  That might seem like a lot, but they're all pretty minor really.
Generally this is looking pretty plausible.

Cheers,
    Olly
-------------- next part --------------
A non-text attachment was scrubbed...
Name: external-post-list-1.diff
Type: text/patch
Size: 11870 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-devel/attachments/20060603/09b703f4/external-post-list-1-0001.bin


More information about the Xapian-devel mailing list