
Commit 6f9a41f

hsivonen authored and annevk committed
Describe the relationship between encodings and Unicode encoding schemes
1 parent 19af047 commit 6f9a41f


1 file changed (+17, -3 lines)


encoding.bs (+17, -3)
@@ -135,6 +135,16 @@ a <a>byte</a> sequence (and vice versa). Each <a for=/>encoding</a> has a
 <dfn id=name export for=encoding>name</dfn>, and one or more
 <dfn id=label export for=encoding lt=label>labels</dfn>.
 
+<p class="note no-backref">This specification defines three <a>encodings</a> with the same names as
+<i>encoding schemes</i> defined in the Unicode standard: <a>UTF-8</a>, <a>UTF-16LE</a>, and
+<a>UTF-16BE</a>. The <a>encodings</a> differ from the <i>encoding schemes</i> by byte order mark
+(also known as BOM) handling not being part of the <a>encodings</a> themselves and instead being
+part of wrapper algorithms in this specification, whereas byte order mark handling is part of the
+definition of the <i>encoding schemes</i> in the Unicode Standard. <a>UTF-8</a> used together with
+the <a>UTF-8 decode</a> algorithm matches the <i>encoding scheme</i> of the same name. This
+specification does not provide wrapper algorithms that would combine with <a>UTF-16LE</a> and
+<a>UTF-16BE</a> to match the similarly-named <i>encoding schemes</i>. [[UNICODE]]
+
 
 <h3 id=encoders-and-decoders>Encoders and decoders</h3>
 
@@ -865,9 +875,9 @@ fallback encoding <var>encoding</var>, run these steps:
 <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
 </table>
 
-<p class=note>For compatibility with deployed content, the byte order mark (also known as BOM) is
-more authoritative than anything else. In a context where HTTP is used this is in violation of the
-semantics of the `<code>Content-Type</code>` header.
+<p class=note>For compatibility with deployed content, the byte order mark is more authoritative
+than anything else. In a context where HTTP is used this is in violation of the semantics of the
+`<code>Content-Type</code>` header.
 
 <li><p>If <var>BOM seen flag</var> is unset,
 <a>prepend</a> <var>buffer</var> to <var>stream</var>.
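A minimal Python sketch of the BOM sniffing in this hunk may help: when a byte order mark is present it overrides the fallback encoding (for example one derived from a label) and is consumed rather than decoded. Only the 0xFF 0xFE row is visible in this diff context; the 0xEF 0xBB 0xBF (UTF-8) and 0xFE 0xFF (UTF-16BE) rows come from the same table in the Encoding Standard. The names `bom_sniff` and `decode` are illustrative only, and Python's built-in `utf-8`, `utf-16-le`, and `utf-16-be` codecs with `errors="replace"` are assumed stand-ins for the spec's decoders. None of these codecs strips a BOM by itself, which mirrors how the encodings leave BOM handling to wrapper algorithms.

```python
from typing import Optional, Tuple

# Assumption: Python codec names stand in for the spec's encodings.
# BOM byte sequences and the encoding each one selects.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def bom_sniff(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode(data: bytes, fallback_encoding: str) -> str:
    """Hypothetical helper: decode data, letting a BOM override the fallback."""
    bom_encoding, bom_length = bom_sniff(data)
    if bom_encoding is not None:
        # The BOM wins over anything else (label, Content-Type charset)
        # and is stripped, so it never appears in the output.
        return data[bom_length:].decode(bom_encoding, errors="replace")
    # No BOM: use the fallback encoding; invalid input becomes U+FFFD.
    return data.decode(fallback_encoding, errors="replace")
```

For example, `decode(b"\xff\xfeh\x00i\x00", "utf-8")` returns `"hi"` because the UTF-16LE BOM beats the UTF-8 fallback. The sketch works on a complete byte string rather than a stream, so it skips the buffer-and-prepend handling visible at the end of this hunk.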
@@ -1312,6 +1322,10 @@ must run these steps:
 
 <h4 id=utf-8-decoder dfn export>UTF-8 decoder</h4>
 
+<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it has been found
+to be more accurate in deployed content. Therefore it is not part of the <a>UTF-8 decoder</a>
+algorithm but rather the <a>decode</a> and <a>UTF-8 decode</a> algorithms.
+
 <p><a>UTF-8</a>'s <a for=/>decoder</a>'s has an associated
 <dfn>UTF-8 code point</dfn>, <dfn>UTF-8 bytes seen</dfn>, and
 <dfn>UTF-8 bytes needed</dfn> (all initially 0), a <dfn>UTF-8 lower boundary</dfn>
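The note in this hunk places BOM handling in wrapper algorithms rather than in the UTF-8 decoder itself. A minimal Python sketch of a wrapper modeled on UTF-8 decode: strip at most one leading UTF-8 BOM, then run the UTF-8 decoder. No label and no UTF-16 BOM can change the encoding. The name `utf8_decode` is hypothetical, and Python's `utf-8` codec with `errors="replace"` is an assumed stand-in for the spec's UTF-8 decoder in replacement mode.

```python
# Assumption: Python's "utf-8" codec stands in for the spec's UTF-8 decoder;
# like that decoder, it never strips a BOM on its own.
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_decode(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, discarding one leading UTF-8 BOM."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # A second BOM, or a UTF-16 BOM, is just ordinary input here:
    # EF BB BF decodes to U+FEFF, while FF and FE are invalid UTF-8 and
    # each becomes U+FFFD.
    return data.decode("utf-8", errors="replace")
```

Keeping the BOM check outside the decoder is what allows the decode algorithm to let a BOM override a label, while a UTF-8-only wrapper like this one never switches away from UTF-8. Python's standard `utf-8-sig` codec also skips one optional leading UTF-8 BOM when decoding, so it could replace the explicit check for a complete byte string.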
