[Xapian-tickets] [Xapian] #550: Omega script enhancement: $prettyurl

Xapian nobody at xapian.org
Thu Jun 12 13:06:22 BST 2014


#550: Omega script enhancement: $prettyurl
-------------------------+-----------------------------
 Reporter:  catkin       |             Owner:  olly
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  1.3.3
Component:  Omega        |           Version:
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-----------------------------

Comment (by olly):

 So in older versions, we didn't really do a proper job with URL encoding.
 That got fixed by doing what the latest RFC on the subject said, which is
 great for the links in the result page, but people also sometimes want to
 show the URL in the text, and the by-the-book encoding makes URLs much
 uglier than they were before.

 Such URLs really ought to work if cut and pasted, but readability is also
 important - if a particular URL doesn't work in some ancient or obscure
 browser, that's probably acceptable.

 So to address this, we added {{{$prettyurl}}} to take a URL and undo the
 percent-encoding where we're confident it isn't needed in practice.  The
 URL might be full or relative, and could theoretically use any scheme,
 though in practice it's most likely to be {{{http:}}} or {{{https:}}}, so
 handling those well is particularly important.
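
 As a rough illustration of the idea (a Python sketch, not Omega's actual
 implementation), the following undoes percent-encoding only for the RFC
 3986 "unreserved" characters, where decoding never changes what the URL
 means.  The names pretty_url_sketch and SAFE_TO_DECODE are assumptions made
 up for this example; the set of characters {{{$prettyurl}}} really decodes
 may well be larger.

```python
import re
import string

# Assumption: only RFC 3986 "unreserved" characters are decoded here.
# This set is illustrative, not the list Omega uses.
SAFE_TO_DECODE = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def pretty_url_sketch(url):
    """Undo %XX escapes, but only for characters in SAFE_TO_DECODE."""
    def replace(match):
        ch = chr(int(match.group(1), 16))
        # Keep the original escape for anything outside the safe set.
        return ch if ch in SAFE_TO_DECODE else match.group(0)
    return _ESCAPE.sub(replace, url)


print(pretty_url_sketch("http://example.org/%7Euser/a%2Db%20c%2Fd"))
# http://example.org/~user/a-b%20c%2Fd
```

 Escapes such as {{{%20}}} and {{{%2F}}} are left alone in this sketch
 because decoding them could change how the URL is parsed.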

 So we do have to deal with an authority section, but we only need to worry
 about decoding, not encoding.  None of {{{[]@}}} are valid in hostnames
 IIRC, but they could be seen in a username or password.  Having those in
 search result links seems unlikely, but perhaps we should do some basic
 parsing of the URL and limit what we decode here.
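
 One way the "basic parsing" could look, sketched in Python using the
 standard urllib.parse module: split the URL, leave the authority exactly as
 given, and decode only in the path, query and fragment.  The function names
 are hypothetical, and the set of characters decoded is again just the RFC
 3986 unreserved set, chosen for the example.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Assumption: decode only RFC 3986 unreserved characters.
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _decode_unreserved(text):
    def replace(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in _UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, text)


def decode_outside_authority(url):
    """Decode escapes in path, query and fragment; never in the authority."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # username, password, host and port stay untouched
        _decode_unreserved(parts.path),
        _decode_unreserved(parts.query),
        _decode_unreserved(parts.fragment),
    ))


print(decode_outside_authority("http://us%7Eer:p%5Bw@example.org/%7Ejoe/a%2Db"))
# http://us%7Eer:p%5Bw@example.org/~joe/a-b
```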

 I'm aware {{{http:bad.html}}} is valid - it just doesn't mean the same as
 {{{http%3Abad.html}}} (the "bad" is that it's bad to undo the percent
 encoding there).  And {{{http:http:bad.html}}} was a test to see if an
 unencoded {{{:}}} works if there is an explicit scheme (which it seems
 to).
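
 The rule those test cases point at can be sketched in Python as follows:
 decode {{{%3A}}} freely once there is an explicit scheme, but in a relative
 URL keep it encoded in the first path segment, where a bare {{{:}}} would be
 read as a scheme separator.  The function name decode_colons is made up for
 this example, and skipping the authority is an extra precaution assumed
 here rather than something tested above.

```python
import re

_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def decode_colons(url):
    """Replace %3A with ':' except where that could change the meaning."""
    m = _SCHEME.match(url)
    start = m.end() if m is not None else 0
    rest = url[start:]
    if rest.startswith("//"):
        # Authority present: skip it (it ends at the next "/", "?" or "#").
        end = re.search(r"[/?#]", rest[2:])
        skip = 2 + (end.start() if end else len(rest) - 2)
    elif m is not None:
        # Explicit scheme, no authority: a later colon can't look like a scheme.
        skip = 0
    else:
        # Relative URL: the first path segment must keep its %3A.
        end = re.search(r"[/?#]", rest)
        skip = end.start() if end else len(rest)
    head, tail = rest[:skip], rest[skip:]
    return url[:start] + head + re.sub(r"%3[Aa]", ":", tail)


print(decode_colons("http%3Abad.html"))       # http%3Abad.html
print(decode_colons("http:http%3Abad.html"))  # http:http:bad.html
print(decode_colons("dir/a%3Ab.html"))        # dir/a:b.html
```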

 Probably the next step should actually be to try to handle top-bit-set
 characters.  For these, I think we just need to make sure that they're
 valid for the character set the page is in, though I've not done any tests
 yet.
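
 A minimal Python sketch of that check, untested against real browsers:
 each run of escaped top-bit-set bytes is decoded only if the whole run is
 valid in the page's character set, and is otherwise left encoded.  The
 function name and the UTF-8 default are assumptions for the example.

```python
import re

# One or more consecutive %XX escapes whose byte value is 0x80 or above.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def decode_high_bytes(url, charset="utf-8"):
    """Decode runs of escaped top-bit-set bytes if valid in `charset`."""
    def replace(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            # All-or-nothing per run: an invalid run stays fully encoded.
            return match.group(0)
    return _HIGH_RUN.sub(replace, url)


print(decode_high_bytes("caf%C3%A9"))  # café
print(decode_high_bytes("caf%E9"))     # caf%E9 (not valid UTF-8)
```

 A single-byte charset such as ISO-8859-1 accepts every byte value, so a
 real implementation would probably also want to reject control characters.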

 Incidentally, I also tested with the browser on my Android phone, and
 results are in line with the other mainstream browsers I tried.  I'm not
 sure what this browser is called (the "about" dialog just shows the
 useragent string, which seems to include the name of just about every web
 browser I can think of).

--
Ticket URL: <http://trac.xapian.org/ticket/550#comment:10>
Xapian <http://xapian.org/>


