[Xapian-tickets] [Xapian] #737: Fix/improve $filters
Xapian
nobody at xapian.org
Tue May 7 03:08:28 BST 2024
#737: Fix/improve $filters
-------------------------+-------------------------------
Reporter: Olly Betts | Owner: Olly Betts
Type: enhancement | Status: closed
Priority: highest | Milestone: 1.5.0
Component: Omega | Version:
Severity: normal | Resolution: fixed
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+-------------------------------
Changes (by Olly Betts):
* status: assigned => closed
* resolution: => fixed
Comment:
Finished off and committed to master as
a709f04794725efd8d89d14d726c714ae0c7e7b9. Not suitable for backporting to
1.4.x.
We now encode slot numbers in `$filters` output with a base-64 like
encoding. We need to handle the variable length somehow: currently each
continuation byte is flagged by preceding it with a special byte (a space,
since that encodes as a single byte (`+`) in a CGI parameter in a URL).
So e.g. 65 -> `1 1`, and this means slots 65 to 99 actually encode less
compactly than before (but 10 to 64 more compactly).
We could rejig to avoid this but it's very rare in my experience to use
such large slot numbers.
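As a rough illustration of the scheme described above, here is a minimal Python sketch. The 64-character alphabet, the most-significant-digit-first order and the names `ALPHABET`, `encode_number` and `decode_number` are assumptions made for this example; the ticket only states that the encoding is base-64 like, that a space precedes each continuation byte, and that 65 encodes as `1 1`.

```python
# Hypothetical alphabet: the ticket does not list the characters Omega uses.
# It is chosen so that the value 1 maps to "1", matching the 65 -> "1 1" example.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-_"
)
CONTINUATION = " "  # becomes "+" when sent as a CGI parameter in a URL


def encode_number(n):
    """Encode a non-negative integer, most significant digit first (assumed)."""
    if n < 0:
        raise ValueError("only non-negative numbers can be encoded")
    digits = []
    while True:
        digits.append(ALPHABET[n % 64])
        n //= 64
        if n == 0:
            break
    digits.reverse()
    # The first digit stands alone; every later digit is flagged by a space.
    return digits[0] + "".join(CONTINUATION + d for d in digits[1:])


def decode_number(s, pos=0):
    """Decode a number starting at s[pos]; return (value, next position)."""
    value = ALPHABET.index(s[pos])
    pos += 1
    while pos + 1 < len(s) and s[pos] == CONTINUATION:
        value = value * 64 + ALPHABET.index(s[pos + 1])
        pos += 2
    return value, pos
```

With this assumed alphabet, `encode_number(65)` gives `1 1` (three bytes, against two for decimal `65`), while `encode_number(10)` gives the single character `A` (one byte, against two for decimal `10`).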
Filter terms are now prefix-compressed. Also instead of escaping `~` in
the term and using `~` as a terminator we now store the length first (a
bit like Pascal rather than C strings), using the base-64 like encoding to
store the length (and the length of the prefix to reuse). Storing the
length doesn't affect the encoding length at all unless terms contain `~`
or the string to append to the reused portion is > 63 bytes long, but it's
simpler to encode as we can just copy the term data rather than having to
scan it for `~`.
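The prefix compression with length-first storage can be sketched as follows. This is only an illustration under stated assumptions: the field order (reuse length, then append length, then the appended bytes), the alphabet, and the names `encode_terms` and `decode_terms` are all hypothetical, and for brevity the sketch only handles lengths below 64, which fit in a single character.

```python
# Hypothetical 64-character alphabet; the real one used by Omega may differ.
ALPHABET = (
    b"0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"-_"
)


def _digit(n):
    """Encode a length below 64 as one byte (multi-byte lengths are omitted)."""
    if not 0 <= n < 64:
        raise ValueError("this sketch only handles lengths below 64")
    return ALPHABET[n:n + 1]


def encode_terms(terms):
    """Prefix-compress a list of byte-string terms against the previous term."""
    out = []
    prev = b""
    for term in terms:
        # Count how many leading bytes are shared with the previous term.
        reuse = 0
        limit = min(len(prev), len(term))
        while reuse < limit and prev[reuse] == term[reuse]:
            reuse += 1
        suffix = term[reuse:]
        # Lengths come first, so the suffix is copied without any escaping.
        out.append(_digit(reuse) + _digit(len(suffix)) + suffix)
        prev = term
    return b"".join(out)


def decode_terms(data):
    """Reverse encode_terms()."""
    terms = []
    prev = b""
    pos = 0
    while pos < len(data):
        reuse = ALPHABET.index(data[pos])
        length = ALPHABET.index(data[pos + 1])
        pos += 2
        term = prev[:reuse] + data[pos:pos + length]
        pos += length
        terms.append(term)
        prev = term
    return terms
```

For example, `encode_terms([b"Hfoo.example", b"Hfoo.example.org", b"Tx~y"])` yields `b"0CHfoo.exampleC4.org04Tx~y"` in this sketch: the second term reuses all 12 bytes of the first, and the `~` in the third term is copied as-is because nothing needs to scan for a terminator. Sorting the terms first makes shared prefixes more likely.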
DEFAULTOP, DOCIDORDER, SORTREVERSE and SORTAFTER are now encoded together
into a single character.
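One way several small settings can share a single character is to pack them into the 6 bits that one base-64 like digit can hold. The sketch below is purely illustrative: the bit layout, the sets of allowed values for each parameter, the alphabet, and the names `pack_options` and `unpack_options` are assumptions, not the layout used by the commit.

```python
# Hypothetical 64-character alphabet; the real one used by Omega may differ.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-_"
)

# Assumed value sets, for illustration only.
DEFAULT_OPS = ("or", "and")      # 1 bit
DOCID_ORDERS = ("A", "D", "X")   # 2 bits


def pack_options(default_op, docid_order, sort_reverse, sort_after):
    """Pack four settings into one character (assumed bit layout)."""
    value = DEFAULT_OPS.index(default_op)            # bit 0
    value |= DOCID_ORDERS.index(docid_order) << 1    # bits 1-2
    value |= int(bool(sort_reverse)) << 3            # bit 3
    value |= int(bool(sort_after)) << 4              # bit 4
    return ALPHABET[value]


def unpack_options(ch):
    """Reverse pack_options()."""
    value = ALPHABET.index(ch)
    return (
        DEFAULT_OPS[value & 1],
        DOCID_ORDERS[(value >> 1) & 3],
        bool(value & 8),
        bool(value & 16),
    )
```

Under this assumed layout only 5 of the 6 available bits are used, so the four settings always fit in one character.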
It also occurred to me we could hash the encoded filters if they're longer
than a certain length. They'd then no longer be guaranteed unique, but it
would help avoid exceeding URL length limits. However nobody has ever
reported problems with hitting such limits, and the filter encoding we'll
produce for the next release series will be more compact than currently,
so I think let's worry about that if we ever get reports of it being an
issue.
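For reference, the hashing idea floated above (which has not been implemented) might look roughly like this. The threshold, the choice of SHA-256, the `!` marker and the name `compact_filters` are all hypothetical choices made for this sketch.

```python
import hashlib

# Hypothetical threshold; the ticket does not suggest a specific value.
MAX_FILTERS_LENGTH = 64


def compact_filters(encoded):
    """Return the encoded filters unchanged, or a fixed-size hash if too long."""
    if len(encoded) <= MAX_FILTERS_LENGTH:
        return encoded
    digest = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
    # "!" marks a hashed value in this sketch. Truncating the digest bounds
    # the output length but means two different filter sets could, very
    # rarely, produce the same string.
    return "!" + digest[:32]
```

The trade-off is the one noted above: the output length is bounded, but a hashed value can no longer be decoded and is only unique with high probability.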
--
Ticket URL: <https://trac.xapian.org/ticket/737#comment:5>
Xapian <https://xapian.org/>