[Xapian-tickets] [Xapian] #550: Omega script enhancement: $prettyurl
Xapian
nobody at xapian.org
Fri Dec 9 11:25:57 GMT 2011
#550: Omega script enhancement: $prettyurl
-------------------------+--------------------------------------------------
 Reporter:  catkin       |       Owner:  olly
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  1.3.0
Component:  Omega        |     Version:
 Severity:  normal       |    Keywords:
Blockedby:               |    Platform:  All
 Blocking:               |
-------------------------+--------------------------------------------------
Comment(by olly):
As I alluded to in the text you quote, decoding bytes >= 0x80 is
problematic as we don't know for sure what the filename encoding is, and
inserting random top-bit-set byte sequences into an HTML page labelled as
UTF-8 isn't a great plan. Modern Linux distros seem to have converged on
UTF-8, at least by default, but you can use other encodings, and other
platforms may be different. Also, if you copy a file from a system with a
different encoding, it may not match what is used locally, so checking
LC_ALL, etc. doesn't really help.
I guess we could check whether each run of bytes >= 0x80 is valid
UTF-8 and decode it if so. Generating broken UTF-8 output is really bad,
whereas risking generating the wrong characters isn't so bad when the
alternative is showing unreadable hex codes.
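
A minimal sketch of that check, not taken from Omega's source. The function
names (hex_value, is_valid_utf8, decode_utf8_escapes) are hypothetical, and
treating each maximal run of consecutive %XX escapes >= 0x80 as one unit is
an assumption made to keep the example short. A run is decoded only if its
bytes form well-formed UTF-8; otherwise it is left escaped.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if ch is not one.
static int hex_value(unsigned char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// True if s is entirely well-formed UTF-8 (no overlong forms, no
// surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        // Allowed range for the first continuation byte; narrowed below
        // for the lead bytes that would otherwise permit invalid forms.
        unsigned min_second = 0x80, max_second = 0xBF;
        if (c < 0x80) {
            len = 1;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) min_second = 0xA0;  // reject overlong
            if (c == 0xED) max_second = 0x9F;  // reject surrogates
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) min_second = 0x90;  // reject overlong
            if (c == 0xF4) max_second = 0x8F;  // reject > U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i < len) return false;  // truncated sequence
        for (size_t k = 1; k < len; ++k) {
            unsigned char t = s[i + k];
            unsigned lo = (k == 1) ? min_second : 0x80;
            unsigned hi = (k == 1) ? max_second : 0xBF;
            if (t < lo || t > hi) return false;
        }
        i += len;
    }
    return true;
}

// Decode runs of %XX escapes with values >= 0x80 only when the whole run
// is valid UTF-8. Escapes below 0x80 are left untouched by this sketch.
std::string decode_utf8_escapes(const std::string& url) {
    std::string out;
    size_t i = 0, n = url.size();
    while (i < n) {
        std::string bytes;
        size_t j = i;
        while (j + 2 < n && url[j] == '%') {
            int hi = hex_value(url[j + 1]);
            int lo = hex_value(url[j + 2]);
            // hi < 8 covers both "not a hex digit" (-1) and bytes < 0x80.
            if (hi < 8 || lo < 0) break;
            bytes += static_cast<char>(hi * 16 + lo);
            j += 3;
        }
        if (j == i) {
            out += url[i++];  // not the start of a high-byte escape run
            continue;
        }
        if (is_valid_utf8(bytes)) {
            out += bytes;
        } else {
            out.append(url, i, j - i);  // keep the run escaped
        }
        i = j;
    }
    return out;
}
```

With this sketch, "/caf%C3%A9.txt" has its escapes replaced by the two-byte
UTF-8 encoding of U+00E9, while "/caf%E9.txt" (the same character in
ISO-8859-1) is left as it is, since a lone 0xE9 byte is not valid UTF-8.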
So I think an initial version which just deals with bytes < 0x80 would be
worthwhile as it would at least address the ugly escaping to some extent
(and fully for English filenames).
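
Such an initial version might look roughly like the following sketch. It is
not Omega's implementation: the name pretty_url_ascii is hypothetical, and
restricting decoding to printable ASCII (0x20 to 0x7e) rather than every
byte < 0x80 is an assumption, made so that control characters are never
emitted. Escapes for bytes >= 0x80 are passed through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of a hex digit, or -1 if ch is not one.
static int hex_value(unsigned char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Decode %XX escapes whose value is printable ASCII; leave all other
// escapes (including malformed or truncated ones) exactly as they are.
std::string pretty_url_ascii(const std::string& url) {
    std::string out;
    out.reserve(url.size());
    for (size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '%' && i + 2 < url.size()) {
            int hi = hex_value(url[i + 1]);
            int lo = hex_value(url[i + 2]);
            if (hi >= 0 && lo >= 0) {
                int byte = hi * 16 + lo;
                if (byte >= 0x20 && byte <= 0x7e) {
                    out += static_cast<char>(byte);
                    i += 2;  // skip the two hex digits just consumed
                    continue;
                }
            }
        }
        out += url[i];
    }
    return out;
}
```

For example, "/My%20Documents/r%C3%A9sum%C3%A9.pdf" would become
"/My Documents/r%C3%A9sum%C3%A9.pdf". The result is meant for display only
and would still need HTML escaping before being written into a page.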
Incidentally, referring to an email by giving the digest number isn't very
useful - I don't subscribe to the digest version, and I don't know of any
way to look at a previous digest in mailman, or find out what messages
were in it. If the digest contains it, the message id of the original
email is much more helpful.
--
Ticket URL: <http://trac.xapian.org/ticket/550#comment:2>
Xapian <http://xapian.org/>