-
Notifications
You must be signed in to change notification settings - Fork 83
Make TextEncoder and TextDecoder be transform streams #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add `readable` and `writable` attributes to TextEncoder and TextDecoder objects to permit them to be used to transform ReadableStreams via the `pipeThrough()` method. This integrates the Encoding Standard with the Streams Standard. See https://streams.spec.whatwg.org/#ts-model for the definition of a "transform stream" and https://streams.spec.whatwg.org/#rs-pipe-through for an explanation of the `pipeThrough()` method. A TextEncoder object can be used to transform a stream of strings to a stream of bytes in UTF-8 encoding. A TextDecoder object can be used to transform a stream of bytes in the encoding passed to the constructor to strings. The implementation delegates to a TransformStream object internally, which provides the glue logic that ties the `readable` and `writable` together. There is a human-readable version of these changes at http://htmlpreview.github.io/?https://github.com/ricea/encoding-streams/blob/master/patch.html There is a prollyfill and tests for the new functionality at https://github.com/GoogleChromeLabs/text-encode-transform-prollyfill
Yay! Comments in random order: Since it's a common enough request, would it be possible to include an example in the spec itself showing how to use streams to encode into an existing ArrayBuffer? Can the tests be upgraded to use In the tests, stream-properties.html should probably just be an idlharness test. (I don't know how well that plays with p(r)ollyfills, though.) What's the behavior for split surrogate pairs when encoding? We basically pretend that's not a problem in the non-streaming API, by making the input is a USVString and thus not needing a |
Sadly, bring-your-own-buffer style usage isn't supported by TransformStream yet. There's an open question about what to do when there isn't enough space in the supplied ArrayBuffer to fit the transformed input. Since this is basically an optimisation feature, I don't plan to support it in the first release. I would rather take my time to get it right.
Certainly. I'm never quite sure what version of ES to target in tests. For the Streams Standard tests I've been avoiding introducing async/await, but I'm happy to use them here.
I haven't tried it. I will give it a go.
Yes, it's broken. I've been planning to write a test for this at some point. What I expect to see is if you encode "😻" in a single chunk, it works as intended. If you split it into two chunks, you get two chunks as output each containing U+FFFD encoded as UTF-8. If there is consensus that this should be fixed, I think it can be done by converting to DOMString rather than USVString and special-casing the final code unit when it is the first half of a surrogate pair. |
I think a problem here is that if a script running in a document invokes the constructor of either See also https://www.w3.org/Bugs/Public/show_bug.cgi?id=24652 and https://w3c-test.org/WebIDL/current-realm.html. Note that if you change it to accept a DOMString you also need to do conversions from code unit to scalar value (you probably meant that; stating it here so I don't forget to verify it). (Tests can use JavaScript features that ship in the latest release of all browsers. At least, that's my rule of thumb.) |
I think we should indeed make split surrogates work. It seems like a big value proposition for having this built-in and shared, that people don't have to figure that out by themselves.
I don't think that is true. You're talking about |
Ah, great, thanks for confirming. Let's make sure to test that as part of the test I referenced. |
@inexorabletash GoogleChromeLabs/text-encode-transform-polyfill#6 updates the tests to use async / await (much more readable) and adds idlharness.html. The prollyfill fails a couple of the idlharness tests because it (intentionally) doesn't do brand checks. Apart from that, everything looks good. |
I have been writing tests for surrogate pairs split between chunks and in the process realised there's a problem. We're now explicitly taking a sequence of code units and normalising it to Unicode. Which means we're doing error handling. Which means the developer should probably be able to specify what type of error handling should be done. Which means adding a fatal option to the constructor. Which means that encode() should logically also support it. Which means that encode() also needs to be changed to take a DOMString. Which is a much more disruptive change than this one. My preference is to land this first, and then worry about providing control over error behaviour and changing encode() later, since that is a much riskier change. |
Well, but not any different than IDL always did, right? We're just changing the layering a bit. I'm not necessarily convinced String -> USVString conversion needs to be subject to fatal the same way that bytes <> scalar values needs to be. Note that the reason we have it in the latter case is primarily XML, which is the only format I know of that demands such handling. We exposed it to the API as a nicety mostly. |
That works for me too. |
I have written tests for split surrogates but I haven't written the standard language yet. I am working on tests for ensuring that objects are all created in the correct realms. This is taking a while. I probably won't have the standard language updated for split surrogates until the new year. |
@vyv03354 are you willing to review this as well? Thanks! |
Previously if the high surrogate and low surrogate were split between two chunks, they would be each replaced with the replacement character. Now the correct character will correctly appear in the output. Also mark "readable" and "writable" properties [SameObject].
I have updated the standard language for split surrogates. Sorry for the delay. I tried a few approaches and adding the "convert code unit to scalar value" algorithm seemed to work best. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly style nits (note that I've only reported the first instance).
I was actually hoping IDL's algorithm could become https://infra.spec.whatwg.org/#javascript-string-convert, but I couldn't really find a way to reuse that here so we might be stuck with some UTF-16 logic in multiple places forever.
encoding.bs
Outdated
type:dfn; text:writable stream; url:#writable-stream | ||
type:abstract-op; text:CreateTransformStream; url: #create-transform-stream | ||
type:abstract-op; text:TransformStreamDefaultControllerEnqueue; url: #transform-stream-default-controller-enqueue | ||
type:interface; text:TransformStream; url: #ts-class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's not exported yet in streams should be exported so there's no need to list it here.
encoding.bs
Outdated
text: type; url: #sec-ecmascript-data-types-and-values; type: dfn | ||
text: IsDetachedBuffer; url: #sec-isdetachedbuffer; type: abstract-op | ||
text: IsSharedArrayBuffer; url: #sec-issharedarraybuffer; type: abstract-op | ||
text: internal slot; url: #sec-object-internal-methods-and-internal-slots; type: dfn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally don't cross-reference "internal slot".
encoding.bs
Outdated
@@ -1186,10 +1246,31 @@ if <a for=TextDecoder>error mode</a> is "<code>fatal</code>", and false otherwis | |||
<p>The <dfn attribute for=TextDecoder><code>ignoreBOM</code></dfn> attribute's getter must return | |||
true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise. | |||
|
|||
<p>The <dfn attribute for=TextDecoder><code>readable</code></dfn> attribute's getter must return the | |||
contents of <a for=TextDecoder>transform</a>'s \[[readable]] <a>internal slot</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than "the contents of x" we usually say "x's value". @domenic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "return transform.[[readable]]" usually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, indeed.
encoding.bs
Outdated
slot</a>. | ||
|
||
<li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, throw a | ||
{{TypeError}} exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then throw* (xref throw?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to xref throw I think; we haven't elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"throw" is hyperlinked elsewhere in this standard. It would be nice to be consistent.
encoding.bs
Outdated
{{TextDecoder}} <var>dec</var> and a <var>chunk</var>, runs these steps: | ||
|
||
<ol> | ||
<li><p>If the <a>type</a> of of <var>chunk</var> is not Object, or <var>chunk</var> does not have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be noted as Type(chunk)?
encoding.bs
Outdated
<li><p>If the <a>type</a> of of <var>chunk</var> is not Object, or <var>chunk</var> does not have | ||
an \[[ArrayBufferData]] <a>internal slot</a>, or <a | ||
abstract-op>IsDetachedBuffer</a>(<var>chunk</var>) is true, or <a | ||
abstract-op>IsSharedArrayBuffer</a>(<var>chunk</var>) is true then return <a>a promise rejected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comma before then. Also I'm not sure I want to reference the promises guide as it really needs to be folded into IDL and has a bunch of things that confuse folks. Can we just say "a new promise rejected with a TypeError exception"?
encoding.bs
Outdated
abstract-op>IsSharedArrayBuffer</a>(<var>chunk</var>) is true then return <a>a promise rejected | ||
with</a> a {{TypeError}}. | ||
|
||
<li><p>If the <a for=TextDecoder>do not flush flag</a> for <var>dec</var> is unset, set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then set*
encoding.bs
Outdated
<li><p>Let <var>output</var> be a new <a for=/>stream</a>. | ||
|
||
<li> | ||
<p>While true, run these substeps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
steps* we try to avoid "substeps" and friends these days.
encoding.bs
Outdated
</ol> | ||
|
||
<li><p>Otherwise, return <a>a promise rejected with</a> a {{TypeError}}. | ||
</ol> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you ate a newline here.
encoding.bs
Outdated
|
||
<p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. | ||
|
||
<pre class=example id=example-textencode-writable> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<pre
also needs two spaces in front of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this should probably use <code>
too, just like the other examples.
I'd like to reiterate the concerns about interoperability hazards associated with existing ArrayBuffer that I mentioned in the issue dedicated to that request.
While JS strings need to be able to exist in invalid-UTF-16 states when a JS program constructs a valid UTF-16 string, making unpaired surrogate preservation in crossing the WebIDL boundary into an explicit feature is a novel thing for the Web Platform and one that makes me uneasy. So far, we have USVString, which explicitly replaces unpaired surrogates and various DOMString APIs that in UTF-16-based Web engines pass unpaired surrogates through, but that's mostly incidental and Web developers don't have good reasons to pass unpaired surrogates through, so not preserving unpaired surrogates appears to be feasible in terms of Web compat. In Gecko, we were able to make CSSOM effectively change from DOMString to USVString without breaking the Web. If I've read the Servo source comments correctly, Servo panics if a DOMString contains unpaired surrogates and the absence of such crash reports suggests that the Web doesn't actually depend on unpaired surrogates crossing the WebIDL boundary (to the extent Servo has enough DOM features to be exercised with real Web content). While of all the Web Platform APIs this might be the one where letting unpaired surrogates cross the WebIDL boundary might make sense as an explicit feature, I think we should consider a bit more carefully if that's necessary and if it's feasible to require textual streams not to split surrogate pairs across chunk boundaries. |
@hsivonen Consider the following transform stream: const rechunker = new TransformStream({
transform(chunk, controller) {
for (let start = 0; start < chunk.length; start += 1024) {
const end = Math.min(start + 1024, chunk.length);
controller.enqueue(chunk.substring(start, end));
}
}
}); This takes a stream of strings and splits up any chunk that is longer than 1024 code units. Here are the questions I ask myself:
|
I'm not fully familiar with Streams. Where do the string chunks in the rechunker example cross the WebIDL boundary (if anywhere)? The controller seems to be expressed in ECMA spec style and not in WebIDL style. Is there really no WebIDL boundary here and the prose sentence "Let input be the result of converting chunk to a DOMString." defines the boundary that is usually expressed in WebIDL? |
The "convert code unit to scalar value" algorithm would not handle stray lead surrogates correctly. They would erroneously eat the next code unit. Fix the algorithm to replace the lead surrogate and leave the next code unit alone. Fix up the imports now that the correct things are exported from the Streams Standard. Consistently use ", then" in if statements. Don't use "substeps". Just "steps". It's cleaner. Use the formulation "transform.[[slot]]" instead of "transform's [[slot]] internal slot". Also stop linking "internal slot" to the Javascript standard. Write "Type(chunk)" instead of "the type of chunk". Don't link to the Promises Guide as it is viewed as problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed everything from @annevk's comments. I didn't respond "done" to each one individually as it might be a bit spammy.
I considered combining the "convert code unit to scalar value" algorithm with "shared UTF-16 decoder" but it would significantly complicate the latter for little benefit.
encoding.bs
Outdated
slot</a>. | ||
|
||
<li><p>If <a abstract-op>IsReadableStreamLocked</a>(<var>readable</var>) is true, throw a | ||
{{TypeError}} exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"throw" is hyperlinked elsewhere in this standard. It would be nice to be consistent.
Sorry for not explaining clearly. The chunks don't cross the WebIDL boundary within the rechunker example. It's only if you then pipe the stream to TextEncoder or some other stream defined in a WebIDL-based standard that the chunks will cross the WebIDL boundary.
I think that is correct. My intent is for "converting chunk to a DOMString" to imply the same processing that would normally happen upon calling a WebIDL method. Is this something that needs to be made more explicit? The same applies to the first step of the "decode and enqueue a chunk" algorithm. |
@ricea if transform streams gain a "bring your own buffer" capability does that need explicit changes to this API or would we run into #69 (comment) as a side effect of that capability being added? In case it's the latter, we'll need to address @hsivonen's concerns somehow now, I think. |
@annevk It won't be an automatic upgrade. There will be a different Create operation to call, similar to CreateReadableByteStream. While this is a bit regrettable, it would be bad if a change in the Streams Standard suddenly exposed internals of Encoding and other standards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have many nits that I'm also willing to help address once you consider things done.
Reading through #72 again it seems that with @jakearchibald you agreed on mode locking, but I cannot find that in the current draft. Did I miss a comment saying why we went back on that?
It would also be nice to add at least one example.
type:interface; text:ReadableStream | ||
type:dfn; text:chunk | ||
type:dfn; text:readable stream | ||
type:dfn; text:writable stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these needed? Do you get some conflict that cannot be resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"ReadableStream" is also exported by Fetch, and so needs to be disambiguated. I have modified the Streams Standard to export "chunk", "readable stream" and "writable stream" in whatwg/streams#871, so those won't be needed once Bikeshed has picked up the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make Fetch stop doing that then? It doesn't really seem like Fetch's way of managing it handles anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yutakahirano Who is using the ReadableStream definition that is exported from Fetch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find none, but serviceworker uses related ones ("construct a ReadableStream object", "empty ReadableStream"). See also: whatwg/webidl#445
encoding.bs
Outdated
<dfn for=TextDecoder>do not flush flag</dfn> (initially unset). | ||
<dfn for=TextDecoder>error mode</dfn> (initially "<code>replacement</code>"), | ||
<dfn for=TextDecoder>do not flush flag</dfn> (initially unset), and | ||
<dfn for=TextDecoder>transform</dfn> (a {{TransformStream}} object). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems best to not say anything here between parenthesis as that's only used for initial values currently (which is inconsistent with other specifications, I realize). We could do a follow-up where we state the type for all of these and then put the initial value elsewhere.
encoding.bs
Outdated
@@ -1135,6 +1150,31 @@ control. | |||
<dt><code><var>decoder</var> . <a attribute for=TextDecoder>ignoreBOM</a></code> | |||
<dd><p>Returns true if <a for=TextDecoder>ignore BOM flag</a> is set, and false otherwise. | |||
|
|||
<dt><code><var>decoder</var> . <a attribute for=TextDecoder>readable</a></code> | |||
<dd> | |||
<p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running <a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't wrap inline/phrasing elements in any manner.
encoding.bs
Outdated
|
||
<dt><code><var>decoder</var> . <a attribute for=TextDecoder>writable</a></code> | ||
<dd> | ||
<p>Returns a <a>writable stream</a> which accepts {{BufferSource}} chunks and runs them through <a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
encoding.bs
Outdated
.pipeTo(textWritable);</code></pre> | ||
|
||
<p>If the <a for=TextDecoder>error mode</a> is "<code>fatal</code>" and | ||
<a for=TextDecoder>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>, both {{TextDecoder/readable}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exceeds 100 columns.
encoding.bs
Outdated
{{TextEncoder}} <var>enc</var>, runs these steps: | ||
|
||
<ol> | ||
<li><p>If <var>enc</var>'s <a>pending high surrogate</a> is set, run these steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<li>
with multiple children again.
@@ -1241,6 +1314,88 @@ method, when invoked, must run these steps: | |||
</ol> | |||
</ol> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having an <hr>
here would be good I think, to separate it a bit from the member definitions.
encoding.bs
Outdated
|
||
<p class=note>This is equivalent to the <a spec=webidl>convert a DOMString to a sequence of Unicode | ||
scalar values</a> algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between | ||
strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking, it would be better to reference https://infra.spec.whatwg.org/#javascript-string-convert here since eventually we'll update IDL to use that too and then this would need to be updated.
encoding.bs
Outdated
<ol> | ||
<li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>. | ||
|
||
<li><p>Unset <var>enc</var>'s <a>pending high surrogate</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be slightly clearer if we instead made pending high surrogate's type "null or a code unit" instead of the current mixture of flag and code unit. So check for non-null and set it to null when done.
@@ -1314,6 +1524,113 @@ must run these steps: | |||
</ol> | |||
</ol> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An <hr>
would be good here too.
Please add a very conspicuous note explaining that unlike in cases where the JS string to DOMString boundary passes through data that doesn't conform to Unicode as a performance optimization / legacy artifact of failure to enforce Unicode conformance, in this case letting code unit sequences that do not conform to Unicode travel across the boundary is an explicit feature with the intent of allowing the sequence of code units to become conforming once concatenated on the other side of the JS string / DOMString boundary. |
Do not wrap inside tags. Do not add lines that are longer than 100 characters.
Add <hr>s before the streaming-related algorithms to set them apart from the method algorithms. Remove some blank lines that shouldn't be there.
It was noted as being equivalent to the algorithm in WebIDL. Instead note that it is equivalent to the "convert a JavaScript string into a scalar value string" algorithm from INFRA. Also add a trailing full-stop.
TextEncoder's "pending high surrogate" was either a code unit or unset. Make it "null or a code unit" instead.
@hsivonen wrote:
I'm concerned that this may contradict the advice from the WebIDL standard, namely:
I'd prefer to have some clarity about whether there is a contradiction here and what the resolution should be before adding an editorial note. |
I've started a side-discussion about how to best implement mode-locking at whatwg/streams#873. The question is basically whether to add some complexity to the Streams Standard to reduce some complexity in the Encoding Standard. |
FYI: I've added tests that objects are created in the correct realms: GoogleChromeLabs/text-encode-transform-polyfill#11 |
A new TextDecoder variable, "decForTransform" is introduced to the constructor to achieve this.
Create a separate TextDecoder object "encForTransform" in the constructor, and use that to initialise "transform". Also clean up some wrapping problems in TextEncoder.
No point in running the encoder when the answer is always the same. Also fix markup indentation for notes.
I've attempted to implement the "Constructor version" of the API discussed #72 (comment). It doesn't seem too bad. I have some tests in https://github.com/GoogleChromeLabs/text-encode-transform-prollyfill/blob/f4de56ca18b98e55a79e7f2c9cb7658097b2aaf4/tests/non-interference.html. From the perspective of a consumer of the API, the behaviour looks pleasantly unsurprising to me. This is not a final decision to go with the "Constructor version" of the API. I could switch to the "Static method version" depending on how people think this looks. |
encoding.bs
Outdated
<p class=note>{{DOMString}} is used here so that a surrogate pair that is split between chunks can | ||
be correctly reassembled into the appropriate code point in the output. The behaviour is otherwise | ||
identical to {{USVString}}. In particular, replacement characters will be used where necessary to | ||
make the output valid UTF-8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if we stated that lone surrogates will be replaced with U+FFFD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. How is the updated version? I'm not sure whether "valid UTF-8" or "well-formed UTF-8" is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'd just say "is UTF-8", but for a note it seems fine. Although having said that, it's a little weird that it both talks about code points in the output and says the output is UTF-8 (which is bytes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of options I can think of
- Stop mentioning UTF-8 in this note altogether.
- Say "the appropriate character in the output" instead of "code point". Although talking about "characters" in relation to Unicode can be ambiguous.
Other options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
{{DOMString}} is used here so that a surrogate pair that is split between chunks can be reassembled into the appropriate scalar value. The behaviour is otherwise identical to {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD.
Closing in favour of #149. |
Add
readable
andwritable
attributes to TextEncoder and TextDecoderobjects to permit them to be used to transform ReadableStreams via the
pipeThrough()
method.This integrates the Encoding Standard with the Streams Standard. See
https://streams.spec.whatwg.org/#ts-model for the definition of a
"transform stream" and https://streams.spec.whatwg.org/#rs-pipe-through
for an explanation of the
pipeThrough()
method.A TextEncoder object can be used to transform a stream of strings to a
stream of bytes in UTF-8 encoding. A TextDecoder object can be used to
transform a stream of bytes in the encoding passed to the constructor to
strings.
The implementation delegates to a TransformStream object internally,
which provides the glue logic that ties the
readable
andwritable
together.
There is a human-readable version of these changes at
http://htmlpreview.github.io/?https://github.com/ricea/encoding-streams/blob/master/patch.html
There is a prollyfill and tests for the new functionality at
https://github.com/GoogleChromeLabs/text-encode-transform-prollyfill
Closes #72.
Preview | Diff