[Xapian-tickets] [Xapian] #712: Geospatial API thoughts

Xapian nobody at xapian.org
Tue Mar 22 03:22:03 GMT 2016


#712: Geospatial API thoughts
--------------------------------+------------------
        Reporter:  olly         |      Owner:  olly
            Type:  defect       |     Status:  new
        Priority:  normal       |  Milestone:
       Component:  Library API  |    Version:
        Severity:  normal       |   Keywords:
      Blocked By:               |   Blocking:
Operating System:  All          |
--------------------------------+------------------
 I've been looking at making the geospatial API not force you to specify a
 metric (the only user feedback I've so far heard about it was that it's
 annoying to have to do so when there's a sane default), and it feels like
 it provides unnecessary flexibility which close to nobody is actually
 going to want, but in doing so it forces inefficiencies upon all users.

 The `LatLongMetric` class is essentially a pluggable algorithm for
 "distance between two coordinates".

 We don't actually seem to document any requirements for it, though
 presumably it ought to fulfil the requirements of a "metric" in the
 mathematical sense
 (https://en.wikipedia.org/wiki/Metric_%28mathematics%29):

  * d(x,y) >= 0
  * d(x,y) == 0 <=> x == y
  * d(x,y) = d(y,x)
  * d(x,z) <= d(x,y) + d(y,z)

 If we documented these as requirements, we might be able to take advantage
 them to optimise - e.g. when handling distance from a polygon.

 The documentation only seems to suggest you might want to use a different
 metric which calculates great circle distances more exactly.  The provided
 metric assumes that the planet is a sphere, and uses the haversine
 function, which suffers from numeric errors for points close to opposite
 each other - I struggle to believe that for purposes of ranking or
 limiting searches that either of these is a problem in reality.

 Providing a metric which calculates "distance by road" or similar seems a
 more plausible reason an API user might to want to provide their own, but
 the documentation doesn't even hint that this as something you could do.
 You would need to take care that the metric wasn't too expensive to
 calculate of course - running a full routing algorithm for every pair of
 points being considered is unlikely to scale well.

 The forced inefficiency issue I noted is that metric has to calculate
 "pointwise_distance" in metres.  At least for sorting and filtering, we
 really just need a value which is strictly monotonic in distance (and for
 filtering to reverse map the threshold into the same domain).  For
 haversine, we could sort by h, and avoid having to calculate an arcsin and
 square root each time.  Not exposing metrics would easily allow that, but
 redefining the interface carefully could also allow it.

 None of this feasible before 1.4.0 (except perhaps dropping metrics from
 the public API entirely), but I wanted to write down my thoughts while
 they were still fresh.

 I think for 1.4.0 we default the metric if not specified, and either just
 leave the metric classes marked as experimental, or improve the
 documentation to actually say what the requirements are and give a better
 example of a user metric.

--
Ticket URL: <https://trac.xapian.org/ticket/712>
Xapian <//xapian.org/>
Xapian



More information about the Xapian-tickets mailing list