@@ -135,6 +135,16 @@ a <a>byte</a> sequence (and vice versa). Each <a for=/>encoding</a> has a
 <dfn id=name export for=encoding>name</dfn>, and one or more
 <dfn id=label export for=encoding lt=label>labels</dfn>.

+<p class="note no-backref">This specification defines three <a>encodings</a> with the same names as
+<i>encoding schemes</i> defined in the Unicode Standard: <a>UTF-8</a>, <a>UTF-16LE</a>, and
+<a>UTF-16BE</a>. The <a>encodings</a> differ from the <i>encoding schemes</i> by byte order mark
+(also known as BOM) handling not being part of the <a>encodings</a> themselves and instead being
+part of wrapper algorithms in this specification, whereas byte order mark handling is part of the
+definition of the <i>encoding schemes</i> in the Unicode Standard. <a>UTF-8</a> used together with
+the <a>UTF-8 decode</a> algorithm matches the <i>encoding scheme</i> of the same name. This
+specification does not provide wrapper algorithms that would combine with <a>UTF-16LE</a> and
+<a>UTF-16BE</a> to match the similarly-named <i>encoding schemes</i>. [[UNICODE]]
+

 <h3 id=encoders-and-decoders>Encoders and decoders</h3>
@@ -865,9 +875,9 @@ fallback encoding <var>encoding</var>, run these steps:
 <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
 </table>

-<p class=note>For compatibility with deployed content, the byte order mark (also known as BOM) is
-more authoritative than anything else. In a context where HTTP is used this is in violation of the
-semantics of the `<code>Content-Type</code>` header.
+<p class=note>For compatibility with deployed content, the byte order mark is more authoritative
+than anything else. In a context where HTTP is used this is in violation of the semantics of the
+`<code>Content-Type</code>` header.

 <li><p>If <var>BOM seen flag</var> is unset,
 <a>prepend</a> <var>buffer</var> to <var>stream</var>.
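
The precedence described in this hunk can be sketched in Python. This is a minimal illustration under stated assumptions, not the specification's normative algorithm: `sniff_bom` and `decode` are hypothetical helper names, the two byte order mark rows that are not visible in the hunk (0xEF 0xBB 0xBF for UTF-8 and 0xFE 0xFF for UTF-16BE) are taken from the Encoding Standard's table, and Python's `utf-8`, `utf-16-be`, and `utf-16-le` codecs stand in for decoders that do no byte order mark handling of their own.

```python
from typing import Tuple

# Byte order mark table, longest signature first.
# Assumption: Python codec names stand in for the specification's encodings.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data: bytes, fallback: str) -> Tuple[str, bytes]:
    """Return (encoding, remaining bytes); a byte order mark overrides the fallback."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # BOM seen: it wins over the fallback and is consumed by the wrapper.
            return encoding, data[len(bom):]
    # No BOM seen: nothing is consumed, which mirrors prepending the buffer
    # back onto the stream.
    return fallback, data


def decode(data: bytes, fallback: str) -> str:
    """Hypothetical wrapper: sniff the BOM first, then run the plain decoder."""
    encoding, rest = sniff_bom(data, fallback)
    return rest.decode(encoding, errors="replace")
```

For instance, `decode(b"\xff\xfeh\x00i\x00", "utf-8")` follows the byte order mark and decodes as UTF-16LE, even though the caller supplied `utf-8` as the fallback. Keeping the sniffing in the wrapper rather than in the decoder is the same split the notes in this change describe.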
@@ -1312,6 +1322,10 @@ must run these steps:
 <h4 id=utf-8-decoder dfn export>UTF-8 decoder</h4>

+<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it has been found
+to be more accurate in deployed content. Therefore it is not part of the <a>UTF-8 decoder</a>
+algorithm but rather the <a>decode</a> and <a>UTF-8 decode</a> algorithms.
+
 <p><a>UTF-8</a>'s <a for=/>decoder</a> has an associated
 <dfn>UTF-8 code point</dfn>, <dfn>UTF-8 bytes seen</dfn>, and
 <dfn>UTF-8 bytes needed</dfn> (all initially 0), a <dfn>UTF-8 lower boundary</dfn>