[Xapian-tickets] [Xapian] #550: Omega script enhancement: $prettyurl
Xapian
nobody at xapian.org
Thu Jun 12 13:06:22 BST 2014
#550: Omega script enhancement: $prettyurl
-------------------------+-----------------------------
Reporter: catkin | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.3.3
Component: Omega | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+-----------------------------
Comment (by olly):
So in older versions, we didn't really do a proper job with URL encoding.
That got fixed by doing what the latest RFC on the subject said, which is
great for the links in the result page, but people also sometimes want to
show the URL in the text, and the by-the-book encoding makes URLs much
uglier than they were before.
Such URLs really ought to work if cut and pasted, but readability is also
important - if a particular URL doesn't work in some ancient or obscure
browser, that's probably acceptable.
So to address this, we added {{{$prettyurl}}} to take a URL and undo the
percent-encoding where we're confident it isn't needed in practice. The
URL might be full or relative, and could theoretically use any scheme,
though in practice it's most likely to be {{{http:}}} or {{{https:}}}, so
handling those well is particularly important.
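
As a rough illustration of the idea (a Python sketch, not Omega's actual implementation, which is C++): undo a percent escape only when the decoded character is in a set we trust. The name pretty_url and the extra_safe parameter are hypothetical. The only characters decoded by default are the RFC 3986 "unreserved" ones, which is a deliberately conservative assumption.

```python
import re
import string

# Unreserved characters per RFC 3986; decoding these never changes meaning.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Matches one well-formed percent escape such as %2D.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def pretty_url(url, extra_safe=""):
    """Undo %XX escapes whose decoded character is considered safe.

    extra_safe is a hypothetical knob for characters beyond the
    unreserved set that we are confident work unencoded in practice.
    """
    safe = UNRESERVED | set(extra_safe)

    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Keep the escape as-is unless the character is in the safe set.
        return ch if ch in safe else match.group(0)

    return _PCT.sub(repl, url)
```

With this sketch, pretty_url("http://example.org/a%2Db%20c") gives http://example.org/a-b%20c: the hyphen is restored, but the encoded space is left alone.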
So we do have to deal with an authority section, but we only need to worry
about decoding, not encoding. None of {{{[]@}}} are valid in hostnames
IIRC, but they could be seen in a username or password. Having those in
search result links seems unlikely, but perhaps we should do some basic
parsing of the URL and limit what we decode here.
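
One possible shape for that "basic parsing", sketched in Python using the standard urllib.parse module: split the URL, leave the authority exactly as given, and decode only in the path, query and fragment. The function names and the default allowed set are assumptions for illustration; whether those characters are really safe to show unencoded in each component would still need testing.

```python
import re
from urllib.parse import urlsplit, urlunsplit

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_only(text, allowed):
    """Undo %XX escapes in text, but only for characters in allowed."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in allowed else match.group(0)
    return _PCT.sub(repl, text)

def prettify_outside_authority(url, allowed="[]@"):
    """Decode the allowed characters everywhere except the authority."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # username, password and host stay untouched
        decode_only(parts.path, allowed),
        decode_only(parts.query, allowed),
        decode_only(parts.fragment, allowed),
    ))
```

For example, http://u%40x:pw@example.org/a%5B1%5D%40b becomes http://u%40x:pw@example.org/a[1]@b, so the encoded {{{@}}} in the username survives while the path is tidied up.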
I'm aware {{{http:bad.html}}} is valid - it just doesn't mean the same as
{{{http%3Abad.html}}} (the "bad" is that it's bad to undo the percent
encoding there). And {{{http:http:bad.html}}} was a test to see if an
unencoded {{{:}}} works if there is an explicit scheme (which it seems
to).
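
To make the colon case concrete, here is a small Python sketch of the check. It relies on urllib.parse.urlsplit to spot an explicit scheme or authority. The helper name can_decode_colon is hypothetical, and treating the whole URL as all-or-nothing is a simplification: a {{{%3A}}} is only really a problem in the first path segment of a relative URL.

```python
from urllib.parse import urlsplit

def can_decode_colon(url):
    """Return True if turning %3A into ':' cannot create a bogus scheme."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        # An explicit scheme or authority comes first, so a later ':'
        # cannot be mistaken for the scheme delimiter.
        return True
    # Relative URL: a ':' in the first path segment would be parsed as
    # ending a scheme, so an escape there must stay encoded.
    first_segment = parts.path.split("/", 1)[0]
    return "%3a" not in first_segment.lower()
```

So can_decode_colon("http%3Abad.html") is False, while can_decode_colon("http:http%3Abad.html") and can_decode_colon("dir/http%3Abad.html") are both True.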
Probably the next step should actually be to try to handle top-bit-set
characters. For these, I think we just need to make sure that they're
valid for the character set the page is in, though I've not done any tests
yet.
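
A sketch of what that validity check might look like, in Python and untested against real browsers: take each run of escapes for bytes 0x80 and above, and only show it decoded if the bytes form a valid sequence in the page's character set. The function name and the charset parameter are assumptions. It also doesn't filter out things like C1 control characters, which a real implementation would probably want to.

```python
import re

# A run of one or more escapes whose byte value is 0x80 or above.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def decode_high_bytes(url, charset="utf-8"):
    """Decode runs of top-bit-set escapes only if valid in charset."""
    def repl(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            # Not valid in the page's character set: keep the escapes.
            return match.group(0)
    return _HIGH_RUN.sub(repl, url)
```

With a UTF-8 page, caf%C3%A9 would be shown as café, but caf%E9 stays encoded because a lone 0xE9 byte isn't valid UTF-8. With charset="iso-8859-1" the latter would decode to café.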
Incidentally, I also tested with the browser on my Android phone, and
results are in line with the other mainstream browsers I tried. I'm not
sure what this browser is called (the "about" dialog just shows the
useragent string, which seems to include the name of just about every web
browser I can think of).
--
Ticket URL: <http://trac.xapian.org/ticket/550#comment:10>
Xapian <http://xapian.org/>