[Xapian-tickets] [Xapian] #712: Geospatial API thoughts
Xapian
nobody at xapian.org
Tue Mar 22 03:22:03 GMT 2016
#712: Geospatial API thoughts
--------------------------------+------------------
Reporter: olly | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Library API | Version:
Severity: normal | Keywords:
Blocked By: | Blocking:
Operating System: All |
--------------------------------+------------------
I've been looking at making the geospatial API not force you to specify a
metric (the only user feedback I've so far heard about it was that it's
annoying to have to do so when there's a sane default), and it feels like
it provides unnecessary flexibility which close to nobody is actually
going to want, but in doing so it forces inefficiencies upon all users.
The `LatLongMetric` class is essentially a pluggable algorithm for
"distance between two coordinates".
We don't actually seem to document any requirements for it, though
presumably it ought to fulfil the requirements of a "metric" in the
mathematical sense
(https://en.wikipedia.org/wiki/Metric_%28mathematics%29):
* d(x,y) >= 0
* d(x,y) == 0 <=> x == y
* d(x,y) = d(y,x)
* d(x,z) <= d(x,y) + d(y,z)
If we documented these as requirements, we might be able to take advantage
them to optimise - e.g. when handling distance from a polygon.
The documentation only seems to suggest you might want to use a different
metric which calculates great circle distances more exactly. The provided
metric assumes that the planet is a sphere, and uses the haversine
function, which suffers from numeric errors for points close to opposite
each other - I struggle to believe that for purposes of ranking or
limiting searches that either of these is a problem in reality.
Providing a metric which calculates "distance by road" or similar seems a
more plausible reason an API user might to want to provide their own, but
the documentation doesn't even hint that this as something you could do.
You would need to take care that the metric wasn't too expensive to
calculate of course - running a full routing algorithm for every pair of
points being considered is unlikely to scale well.
The forced inefficiency issue I noted is that metric has to calculate
"pointwise_distance" in metres. At least for sorting and filtering, we
really just need a value which is strictly monotonic in distance (and for
filtering to reverse map the threshold into the same domain). For
haversine, we could sort by h, and avoid having to calculate an arcsin and
square root each time. Not exposing metrics would easily allow that, but
redefining the interface carefully could also allow it.
None of this feasible before 1.4.0 (except perhaps dropping metrics from
the public API entirely), but I wanted to write down my thoughts while
they were still fresh.
I think for 1.4.0 we default the metric if not specified, and either just
leave the metric classes marked as experimental, or improve the
documentation to actually say what the requirements are and give a better
example of a user metric.
--
Ticket URL: <https://trac.xapian.org/ticket/712>
Xapian <//xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list