[Xapian-tickets] [Xapian] #292: incorrect translation of non-english HTMLS when charset entry is after title in head.
Xapian
nobody at xapian.org
Mon Sep 1 23:50:55 BST 2008
#292: incorrect translation of non-english HTMLS when charset entry is after
title in head.
--------------------+-------------------------------------------------------
 Reporter:  rssh    |        Owner:  olly
     Type:  defect  |       Status:  new
 Priority:  normal  |    Milestone:
Component:  Omega   |      Version:
 Severity:  normal  |   Resolution:
 Keywords:          |    Blockedby:
 Platform:  All     |     Blocking:
--------------------+-------------------------------------------------------
Old description:
> When meta "http-eqiv" (where charset is set) is situated after "title"
> element in htmlk document, than title entry in index is incorrect.
>
> patch to fix is attached.
New description:
When meta "http-equiv" (where charset is set) is situated after "title"
element in html document, then title entry in index is incorrect.
patch to fix is attached.
--
Comment(by olly):
The patch is incorrect. The default character set for HTML *is*
ISO-8859-1, at least in this context (since we're trying to work with
documents from a webserver's document tree). The HTTP/1.1 spec (RFC 2616)
says:
    When no explicit charset parameter is provided by the sender, media
    subtypes of the "text" type are defined to have a default charset
    value of "ISO-8859-1" when received via HTTP.
Your comment here misses the point:
    (first 256 position of UTF-8 are the same, as in "ISO-8859-1")
While the character values are the same, the byte-level encodings are only
the same for the first 128 positions, and it's the byte-level encoding
which matters here.
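To illustrate the byte-level difference, here is a minimal Python sketch (not Omega code; the helper name `encodings` is made up for this example). It encodes one code point below 128 and one above 127 in both character sets, then shows what happens when UTF-8 bytes are wrongly read as ISO-8859-1:

```python
# Compare how ISO-8859-1 and UTF-8 encode the same code points.
# `encodings` is a hypothetical helper used only for this illustration.
def encodings(ch):
    """Return (iso_8859_1_bytes, utf_8_bytes) for a single character."""
    return ch.encode("iso-8859-1"), ch.encode("utf-8")

# U+0041 is below 128; U+00E9 (e with acute accent) is above 127.
for ch in ("A", "\u00e9"):
    latin1, utf8 = encodings(ch)
    print(f"U+{ord(ch):04X}: ISO-8859-1={latin1.hex()} "
          f"UTF-8={utf8.hex()} same bytes: {latin1 == utf8}")

# Reading UTF-8 bytes as ISO-8859-1 raises no error; it silently
# yields two wrong characters in place of the one intended.
mis = "\u00e9".encode("utf-8").decode("iso-8859-1")
print(repr(mis))
```

U+0041 is the single byte 0x41 in both encodings, whereas U+00E9 is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8, so picking the wrong character set corrupts any text outside the ASCII range.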
Please attach an example document which demonstrates the problem to save
me having to try to create one from your description.
Also, what version are you using?
--
Ticket URL: <http://trac.xapian.org/ticket/292#comment:1>
Xapian <http://xapian.org/>