[Xapian-tickets] [Xapian] #737: Fix/improve $filters

Xapian nobody at xapian.org
Fri Oct 7 02:25:14 BST 2016


#737: Fix/improve $filters
--------------------------------+-------------------
        Reporter:  olly         |      Owner:  olly
            Type:  enhancement  |     Status:  new
        Priority:  normal       |  Milestone:  1.5.0
       Component:  Omega        |    Version:
        Severity:  normal       |   Keywords:
      Blocked By:               |   Blocking:
Operating System:  All          |
--------------------------------+-------------------
 The current encoding of $filters has at least one bug (which was also
 present in the older encoding used in 1.2.x):

  * `DOCIDORDER=A` is the default, but produces an `X` in `$filters`,
 whereas `DOCIDORDER=X` is non-default but produces nothing in
 `$filters`.  In practice `A` and `X` currently behave identically,
 because `DONT_CARE` always results in `ASCENDING` order, so this isn't
 worth fixing on its own.  But if/when we change the encoding, we should
 address this.

 And it could be more compact:

  * Every `N` term is prefixed by `!`, but only the first needs to be.
  * Every encoded string has at least `~~` after the character for
 `DEFAULTOP`, which isn't necessary.
  * The `DEFAULTOP` character could be omitted when using the default
 `DEFAULTOP`.
  * We could combine some/all of `DEFAULTOP`, `DOCIDORDER` and the existing
 `SORTREVERSE`/`SORTAFTER` characters.  These currently have 2, 3 and 2*2
 states respectively (so 2*3*4=24 combinations), though more `DEFAULTOP`
 values are possible.  About 10+26*2+19=81 characters don't need URL
 encoding, so we could support up to 6 `DEFAULTOP` values (6*3*4=72) and
 still encode all of these into one character which shouldn't need URL
 encoding.
  * We could encode value slot numbers using something like base64 and save
 bytes when slots > 9 are used (or perhaps encode all the slot numbers
 together such that they'd usually all fit in one byte).
  * Lists of `B` and `N` are sorted, so could easily be prefix-compressed -
 reducing the size when there are a lot of either, which is a case where
 keeping the size down matters most.
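
 As a rough illustration of combining `DEFAULTOP`, `DOCIDORDER` and
 `SORTREVERSE`/`SORTAFTER` into one character, here is a minimal sketch
 using mixed-radix packing.  It is not Omega's code: the function names
 and the numbering of the states are hypothetical, and the alphabet is
 assumed to be just the 66 RFC 3986 "unreserved" characters, a
 conservative subset of the roughly 81 estimated above (so it leaves room
 for 5 `DEFAULTOP` values rather than 6).

```python
import string

# Assumption: only RFC 3986 "unreserved" characters (66 of them), which never
# need URL encoding.  The ticket estimates about 81 usable characters.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "-._~"

N_DOCIDORDER = 3  # three DOCIDORDER states
N_SORT = 4        # SORTREVERSE x SORTAFTER = 2*2 states


def max_defaultop_values(alphabet_size):
    """Number of DEFAULTOP values that fit with the other states in one character."""
    return alphabet_size // (N_DOCIDORDER * N_SORT)


N_DEFAULTOP = max_defaultop_values(len(ALPHABET))


def pack(defaultop, docidorder, sortreverse, sortafter):
    """Pack the settings into one character (state numbering is hypothetical)."""
    if not 0 <= defaultop < N_DEFAULTOP:
        raise ValueError("defaultop out of range")
    if not 0 <= docidorder < N_DOCIDORDER:
        raise ValueError("docidorder out of range")
    sort = 2 * int(bool(sortreverse)) + int(bool(sortafter))
    code = (defaultop * N_DOCIDORDER + docidorder) * N_SORT + sort
    return ALPHABET[code]


def unpack(ch):
    """Inverse of pack(): recover the four settings from one character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ALPHABET.index(ch)  # raises ValueError for characters outside ALPHABET
    if code >= N_DEFAULTOP * N_DOCIDORDER * N_SORT:
        raise ValueError("character does not encode a valid state")
    code, sort = divmod(code, N_SORT)
    defaultop, docidorder = divmod(code, N_DOCIDORDER)
    return defaultop, docidorder, bool(sort >> 1), bool(sort & 1)
```

 With the full 81-character estimate, `max_defaultop_values(81)` gives the
 6 `DEFAULTOP` values mentioned above.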

 The compactness matters as the length of a URL is limited, and using `GET`
 is common for search systems.  A longer URL can also look uglier when
 pasted, etc.
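
 To give a feel for how much the prefix-compression idea for the sorted
 `B` and `N` lists could save, here is a minimal sketch.  It only shows
 the general technique (store the length of the prefix shared with the
 previous term, plus the remaining suffix); it deliberately doesn't define
 how the pairs would be serialised into `$filters`, and the function names
 are hypothetical.

```python
def prefix_compress(sorted_terms):
    """Return (shared_prefix_length, suffix) pairs for a sorted list of terms."""
    pairs = []
    prev = ""
    for term in sorted_terms:
        # Count how many leading characters this term shares with the previous one.
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        pairs.append((shared, term[shared:]))
        prev = term
    return pairs


def prefix_decompress(pairs):
    """Rebuild the original list of terms from (shared_prefix_length, suffix) pairs."""
    terms = []
    prev = ""
    for shared, suffix in pairs:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms
```

 Because the lists are already sorted, terms sharing a prefix (for
 example many filter terms with the same term prefix) sit next to each
 other, so each entry after the first typically only needs to store its
 distinguishing tail.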

--
Ticket URL: <https://trac.xapian.org/ticket/737>
Xapian <//xapian.org/>