[Xapian-tickets] [Xapian] #737: Fix/improve $filters

Xapian nobody at xapian.org
Tue May 7 03:08:28 BST 2024


#737: Fix/improve $filters
-------------------------+-------------------------------
 Reporter:  Olly Betts   |             Owner:  Olly Betts
     Type:  enhancement  |            Status:  closed
 Priority:  highest      |         Milestone:  1.5.0
Component:  Omega        |           Version:
 Severity:  normal       |        Resolution:  fixed
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-------------------------------
Changes (by Olly Betts):

 * status:  assigned => closed
 * resolution:   => fixed

Comment:

 Finished off and committed to master as
 a709f04794725efd8d89d14d726c714ae0c7e7b9.  Not suitable for backporting to
 1.4.x.

 We now encode slot numbers in `$filters` output with a base-64-like
 encoding.  We need to handle the variable length somehow: currently each
 continuation byte is flagged by preceding it with a special byte (a space,
 since that encodes as a single byte (`+`) in a CGI parameter in a URL).
 So e.g. 65 -> `1 1`, and this means slots 65 to 99 actually encode less
 compactly than before (but 10 to 64 more compactly).  We could rejig to
 avoid this but it's very rare in my experience to use such large slot
 numbers.
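
 A minimal sketch of the slot-number scheme described above, in Python
 rather than Omega's C++.  The 64-character alphabet, the most-significant-
 digit-first order and the names `ALPHABET`, `encode_uint` and `decode_uint`
 are assumptions for illustration, chosen only so that 65 comes out as
 `1 1`; the committed code may differ in detail.

```python
from urllib.parse import quote_plus

# Hypothetical alphabet: 64 URL-safe characters, with digit value 1 == "1".
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-_"
)


def encode_uint(n):
    """Encode a non-negative integer as base-64 digits.

    Each digit after the first is a continuation and is flagged by a
    preceding space (assumed order: most significant digit first).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        digits.append(ALPHABET[n % 64])
        n //= 64
        if n == 0:
            break
    digits.reverse()
    return " ".join(digits)


def decode_uint(s, pos=0):
    """Decode one integer starting at s[pos]; return (value, next_pos)."""
    n = ALPHABET.index(s[pos])
    pos += 1
    # A space followed by another character marks a continuation digit.
    while pos + 1 < len(s) and s[pos] == " ":
        n = n * 64 + ALPHABET.index(s[pos + 1])
        pos += 2
    return n, pos


if __name__ == "__main__":
    print(encode_uint(65))              # two digits plus one flag byte
    print(quote_plus(encode_uint(65)))  # the space becomes "+" in a URL
```

 Under these assumptions a two-digit value costs three characters, which
 is why some two-digit decimal slot numbers get longer.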

 Filter terms are now prefix-compressed.  Also, instead of escaping `~` in
 the term and using `~` as a terminator, we now store the length first (a
 bit like Pascal rather than C strings), using the base-64-like encoding to
 store the length (and the length of the prefix to reuse).  Storing the
 length doesn't affect the encoding length at all unless terms contain `~`
 or the string to append to the reused portion is longer than 63 bytes,
 but it's simpler to encode as we can just copy the term data rather than
 having to scan it for `~`.
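
 The prefix compression and length-first storage can be sketched as
 follows.  This is an illustration under stated assumptions, not Omega's
 code: the field order (reuse length, then append length, then data), the
 alphabet and the function names are hypothetical, and to keep it short
 each length is limited to one character (0 to 63) instead of using
 continuation digits.

```python
# Hypothetical 64-character alphabet used for the one-character lengths.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-_"
)


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_terms(terms):
    """Encode terms as <reuse length><append length><appended data>..."""
    out = []
    prev = ""
    for term in terms:
        # Cap the reuse so it still fits in a single length character.
        reuse = min(common_prefix_len(prev, term), 63)
        suffix = term[reuse:]
        if len(suffix) > 63:
            raise ValueError("sketch only handles appended lengths 0..63")
        # No escaping or scanning of the data: it is copied verbatim.
        out.append(ALPHABET[reuse] + ALPHABET[len(suffix)] + suffix)
        prev = term
    return "".join(out)


def decode_terms(data):
    """Inverse of encode_terms."""
    terms = []
    prev = ""
    pos = 0
    while pos < len(data):
        reuse = ALPHABET.index(data[pos])
        append = ALPHABET.index(data[pos + 1])
        pos += 2
        term = prev[:reuse] + data[pos:pos + append]
        pos += append
        terms.append(term)
        prev = term
    return terms


if __name__ == "__main__":
    terms = ["Hexample.com", "Hexample.org", "Ttext/plain~v1"]
    encoded = encode_terms(terms)
    print(encoded)
    print(decode_terms(encoded) == terms)
```

 Because the length comes first, a `~` inside a term needs no special
 treatment here.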

 DEFAULTOP, DOCIDORDER, SORTREVERSE and SORTAFTER are now encoded together
 into a single character.
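
 One way four small settings can share a single character is to pack them
 into the bits of one base-64 digit.  The sketch below is only an
 illustration: the bit layout, the alphabet, the three `DOCID_ORDERS`
 values and the function names are assumptions, not what the commit does.

```python
# Hypothetical 64-character alphabet: one character carries 6 bits.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-_"
)

# Assumed set of DOCIDORDER values for this sketch.
DOCID_ORDERS = ("A", "D", "X")


def pack_options(default_op_and, docid_order, sort_reverse, sort_after):
    """Pack four settings into one character (assumed layout).

    bit 0: DEFAULTOP is AND, bits 1-2: DOCIDORDER index,
    bit 3: SORTREVERSE, bit 4: SORTAFTER.
    """
    bits = (
        int(bool(default_op_and))
        | (DOCID_ORDERS.index(docid_order) << 1)
        | (int(bool(sort_reverse)) << 3)
        | (int(bool(sort_after)) << 4)
    )
    return ALPHABET[bits]


def unpack_options(ch):
    """Inverse of pack_options."""
    bits = ALPHABET.index(ch)
    return (
        bool(bits & 1),
        DOCID_ORDERS[(bits >> 1) & 3],
        bool(bits & 8),
        bool(bits & 16),
    )


if __name__ == "__main__":
    ch = pack_options(True, "D", False, True)
    print(ch, unpack_options(ch))
```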

 It also occurred to me we could hash the encoded filters if they're longer
 than a certain length.  They'd then no longer be guaranteed unique, but it
 would help avoid exceeding URL length limits.  However nobody has ever
 reported problems with hitting such limits, and the filter encoding we'll
 produce for the next release series will be more compact than currently,
 so I think let's worry about that if we ever get reports of it being an
 issue.
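
 For reference, the hashing idea (which has not been implemented) could
 look roughly like this.  The threshold, the choice of SHA-256, the `#`
 marker and the name `compact_filters` are all assumptions made for the
 sketch.

```python
import hashlib

# Hypothetical threshold above which the encoded filters are hashed.
MAX_FILTERS_LEN = 200


def compact_filters(encoded, limit=MAX_FILTERS_LEN):
    """Return encoded unchanged if short enough, else a fixed-size digest.

    The digest is no longer guaranteed unique, but it bounds the length
    that ends up in the URL.
    """
    if len(encoded) <= limit:
        return encoded
    digest = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
    # "#" is an assumed marker to tell a digest from a plain encoding.
    return "#" + digest[:32]


if __name__ == "__main__":
    print(compact_filters("0CHexample.com93org"))
    print(compact_filters("0CHexample.com" * 50))
```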
-- 
Ticket URL: <https://trac.xapian.org/ticket/737#comment:5>
Xapian <https://xapian.org/>