From 0506836b4314cb212ce49e957be7101ebacf083a Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Tue, 7 Feb 2017 10:55:28 +0900 Subject: [PATCH 01/17] Make TextEncoder and TextDecoder be transform streams Add `readable` and `writable` attributes to TextEncoder and TextDecoder objects to permit them to be used to transform ReadableStreams via the `pipeThrough()` method. This integrates the Encoding Standard with the Streams Standard. See https://streams.spec.whatwg.org/#ts-model for the definition of a "transform stream" and https://streams.spec.whatwg.org/#rs-pipe-through for an explanation of the `pipeThrough()` method. A TextEncoder object can be used to transform a stream of strings to a stream of bytes in UTF-8 encoding. A TextDecoder object can be used to transform a stream of bytes in the encoding passed to the constructor to strings. The implementation delegates to a TransformStream object internally, which provides the glue logic that ties the `readable` and `writable` together. There is a human-readable version of these changes at http://htmlpreview.github.io/?https://github.com/ricea/encoding-streams/blob/master/patch.html There is a prollyfill and tests for the new functionality at https://github.com/GoogleChromeLabs/text-encode-transform-prollyfill --- encoding.bs | 266 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 263 insertions(+), 3 deletions(-) diff --git a/encoding.bs b/encoding.bs index 8963bb8..91e4727 100644 --- a/encoding.bs +++ b/encoding.bs @@ -28,6 +28,24 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo spec:infra; type:dfn; text:code point text:ascii case-insensitive +spec:streams; type:interface; + text:ReadableStream + text:WritableStream + + +
+spec: streams; urlPrefix: https://streams.spec.whatwg.org/
+    type:dfn; text:chunk; url:#chunk
+    type:dfn; text:readable stream; url:#readable-stream
+    type:dfn; text:writable stream; url:#writable-stream
+    type:abstract-op; text:CreateTransformStream; url: #create-transform-stream
+    type:abstract-op; text:TransformStreamDefaultControllerEnqueue; url: #transform-stream-default-controller-enqueue
+    type:interface; text:TransformStream; url: #ts-class
+spec: ECMASCRIPT; urlPrefix: https://tc39.github.io/ecma262/
+    text: type; url: #sec-ecmascript-data-types-and-values; type: dfn
+    text: IsDetachedBuffer; url: #sec-isdetachedbuffer; type: abstract-op
+    text: IsSharedArrayBuffer; url: #sec-issharedarraybuffer; type: abstract-op
+    text: internal slot; url: #sec-object-internal-methods-and-internal-slots; type: dfn
 
@@ -1066,6 +1084,8 @@ interface TextDecoder { readonly attribute DOMString encoding; readonly attribute boolean fatal; readonly attribute boolean ignoreBOM; + readonly attribute ReadableStream readable; + readonly attribute WritableStream writable; USVString decode(optional BufferSource input, optional TextDecodeOptions options); }; @@ -1073,8 +1093,9 @@ interface TextDecoder { decoder, stream, ignore BOM flag (initially unset), BOM seen flag (initially unset), -error mode (initially "replacement"), and -do not flush flag (initially unset). +error mode (initially "replacement"), +do not flush flag (initially unset), and +transform (a {{TransformStream}} object).

A {{TextDecoder}} object also has an associated serialize stream algorithm, that given a @@ -1135,6 +1156,31 @@ control.

decoder . ignoreBOM

Returns true if ignore BOM flag is set, and false otherwise. +

decoder . readable +
+

Returns a readable stream whose chunks are strings resulting from running encoding's decoder on the chunks written to + {{TextDecoder/writable}}. + +

decoder . writable +
+

Returns a writable stream which accepts {{BufferSource}} chunks and runs them through encoding's decoder before making them available to + {{TextDecoder/readable}}. + +

Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a + {{ReadableStream}} source. + +

+var decoder = new TextDecoder(encoding);
+byteReadable
+  .pipeThrough(decoder)
+  .pipeTo(textWritable);
+ +

If the error mode is "fatal" and + encoding's decoder returns error, both {{TextDecoder/readable}} + and {{TextDecoder/writable}} will be errored with a {{TypeError}}. +

decoder . decode([input [, options]])

Returns the result of running encoding's decoder. The @@ -1174,6 +1220,20 @@ constructor, when invoked, must run these steps:

  • If options's ignoreBOM member is true, then set dec's ignore BOM flag. +

  • Let startAlgorithm be an algorithm that takes no arguments and returns nothing. + +

  • Let transformAlgorithm be an algorithm which takes a chunk argument + and runs the decode and enqueue a chunk algorithm with dec and chunk. + +

  • Let flushAlgorithm be an algorithm which takes no arguments and runs the flush + and enqueue algorithm with dec. + +

  • Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, + flushAlgorithm). + +

  • Set dec's transform to transform. +

  • Return dec. @@ -1186,10 +1246,31 @@ if error mode is "fatal", and false otherwis

    The ignoreBOM attribute's getter must return true if ignore BOM flag is set, and false otherwise. +

    The readable attribute's getter must return the +contents of transform's \[[readable]] internal slot. + +

    The writable attribute's getter must return the +contents of transform's \[[writable]] internal slot. +

    The decode(input, options) method, when invoked, must run these steps:

      +
    1. Let readable be transform's \[[readable]] internal + slot. + +

    2. If IsReadableStreamLocked(readable) is true, throw a + {{TypeError}} exception. + +

    3. Let writable be transform's \[[writable]] internal + slot. + +

    4. If IsWritableStreamLocked(writable) is true, throw a + {{TypeError}} exception. + +

      These steps ensure that the state of the decoder is not simultaneously modified by + the Streams API and this method. +

    5. If the do not flush flag is unset, set decoder to a new encoding's decoder, set stream to a new stream, and unset the BOM seen flag. @@ -1241,6 +1322,88 @@ method, when invoked, must run these steps:

    +

    The decode and enqueue a chunk algorithm, given a +{{TextDecoder}} dec and a chunk, runs these steps: + +

      +
    1. If the type of of chunk is not Object, or chunk does not have + an \[[ArrayBufferData]] internal slot, or IsDetachedBuffer(chunk) is true, or IsSharedArrayBuffer(chunk) is true then return a promise rejected + with a {{TypeError}}. + +

    2. If the do not flush flag for dec is unset, set + dec's decoder to a new decoder for dec's + encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. + +

    3. Set dec's do not flush flag. + +

    4. Push a copy of chunk to + dec's stream. + +

    5. Let controller be dec's transform's + \[[transformStreamController]] internal slot. + +

    6. Let output be a new stream. + +

    7. +

      While true, run these substeps: + +

        +
      1. Let token be the result of reading from dec's stream. + +

      2. +

        If token is end-of-stream, run these substeps: +

          +
        1. Let outputChunk be output, serialized. + +

        2. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). + +

        3. Return a promise resolved with undefined. +

        + +
      3. Let result be the result of processing token for + dec's decoder, dec's stream, + output, and dec's error mode. + +

      4. If result is error, return a promise rejected with a + {{TypeError}} exception. +

      +
    + +

    The flush and enqueue algorithm, which handles the end +of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec, runs these steps: + +

      +
    1. If the do not flush flag for dec is unset, set + dec's decoder to a new decoder for dec's + encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. + +

    2. Unset dec's do not flush flag. + +

    3. Let output be a new stream. + +

    4. Let result be the result of processing end-of-stream for + dec's decoder and dec's stream, + output, and dec's error mode. + +

    5. If result is finished, run these substeps: +

        +
      1. Let outputChunk be output, serialized. + +

      2. Let controller be dec's transform's + \[[transformStreamController]] internal slot. + +

      3. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). + +

      4. Return a promise resolved with undefined. +

      + +
    6. Otherwise, return a promise rejected with a {{TypeError}}. +

    Interface {{TextEncoder}}

    @@ -1249,10 +1412,13 @@ method, when invoked, must run these steps: Exposed=(Window,Worker)] interface TextEncoder { readonly attribute DOMString encoding; + readonly attribute ReadableStream readable; + readonly attribute WritableStream writable; [NewObject] Uint8Array encode(optional USVString input = ""); }; -

    A {{TextEncoder}} object has an associated encoder. +

    A {{TextEncoder}} object has an associated encoder and transform (a {{TransformStream}} object).

    A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder @@ -1267,6 +1433,23 @@ requires buffering of scalar values.

    encoder . encoding

    Returns "utf-8". +

    encoder . readable +
    +

    Returns a readable stream whose chunks are {{Uint8Array}}s resulting from running + UTF-8's encoder on the chunks written to {{TextEncoder/writable}}. + +

    encoder . writable +
    +

    Returns a writable stream which accepts string chunks and runs them through + UTF-8's encoder before making them available to {{TextEncoder/readable}}. + +

    Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. + +

    +textReadable
    +  .pipeThrough(new TextEncoder())
    +  .pipeTo(byteWritable);
    +
    encoder . encode([input = ""])

    Returns the result of running UTF-8's encoder. @@ -1279,16 +1462,51 @@ constructor, when invoked, must run these steps:

  • Set enc's encoder to UTF-8's encoder. +

  • Let startAlgorithm be an algorithm that takes no arguments and returns nothing. + +

  • Let transformAlgorithm be an algorithm which takes a chunk argument + and runs the encode and enqueue a chunk algorithm with enc and chunk. + +

  • Let flushAlgorithm be an algorithm which returns a promise resolved with + undefined. + +

  • Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, + flushAlgorithm). + +

  • Set enc's transform to transform. +

  • Return enc.

    The encoding attribute's getter must return "utf-8". +

    The readable attribute's getter must return the +contents of transform's \[[readable]] internal slot. + +

    The writable attribute's getter must return the +contents of transform's \[[writable]] internal slot. +

    The encode(input) method, when invoked, must run these steps:

      +
    1. Let readable be transform's \[[readable]] internal + slot. + +

    2. If IsReadableStreamLocked(readable) is true, throw a + {{TypeError}} exception. + +

    3. Let writable be transform's \[[writable]] internal + slot. + +

    4. If IsWritableStreamLocked(writable) is true, throw a + {{TypeError}} exception. + +

      These steps are for consistent behaviour with the {{TextDecoder}} decode(input) method. +

    5. Convert input to a stream.

    6. Let output be a new stream. @@ -1314,6 +1532,48 @@ must run these steps:

    +

    The encode and enqueue a chunk algorithm, given a +{{TextEncoder}} dec and chunk, runs these steps: + +

      +
    1. Let input be the result of converting + chunk to a {{USVString}}. If this throws an exception, return a promise rejected with + that exception. + +

    2. Convert input to a stream. + +

    3. Let output be a new stream. + +

    4. Let encoder be UTF-8's encoder. + +

    5. Let controller be dec's transform's + \[[transformStreamController]] internal slot. + +

    6. While true, run these substeps: + + + +

        +
      1. Let token be the result of reading from input. + +

      2. Let result be the result of processing token for + encoder, input, output. + +

      3. If result is finished, run these substeps: +

          +
        1. Convert output into a byte sequence. + +

        2. Call TransformStreamDefaultControllerEnqueue(controller, output). + +

        3. Return a promise resolved with undefined. +

        +
      +
    +

    The encoding
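
Reviewer note on PATCH 01/17: the decode and enqueue a chunk and flush and enqueue algorithms added above keep decoder state between chunks, much as the existing `decode()` method does when `stream` is true. The sketch below is only an approximation built on that existing method, not the algorithm text from the patch; the helper name `decodeChunks` is hypothetical.

```javascript
// Sketch only: approximates "decode and enqueue a chunk" followed by
// "flush and enqueue" using the existing TextDecoder.decode() method.
// decodeChunks is a hypothetical helper, not part of the patch.
function decodeChunks(chunks, label = 'utf-8') {
  const decoder = new TextDecoder(label);
  const output = [];
  for (const chunk of chunks) {
    // stream: true plays the role of setting the "do not flush" flag,
    // so an incomplete byte sequence is held until the next chunk.
    output.push(decoder.decode(chunk, { stream: true }));
  }
  // A final call without input flushes whatever is still buffered.
  output.push(decoder.decode());
  return output;
}

// U+20AC EURO SIGN is E2 82 AC in UTF-8; split it across two chunks.
const pieces = decodeChunks([
  new Uint8Array([0x61, 0xe2, 0x82]),
  new Uint8Array([0xac, 0x62]),
]);
// pieces is ['a', '\u20acb', '']
```

Each element of `pieces` corresponds to one chunk that the transform would enqueue on `readable`, which is why an empty string can appear as the final chunk.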

    From 534f2e30db4c7d4abfc93be2999a1dbd6aaa1a01 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Mon, 22 Jan 2018 23:44:34 +0900 Subject: [PATCH 02/17] Enable streamed encoding of split surrogate pairs Previously, if a high surrogate and its low surrogate were split between two chunks, each was replaced with the replacement character. Now the correct character appears in the output. Also mark "readable" and "writable" properties [SameObject]. --- encoding.bs | 107 +++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 85 insertions(+), 22 deletions(-) diff --git a/encoding.bs b/encoding.bs index 91e4727..cb8570a 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1084,8 +1084,8 @@ interface TextDecoder { readonly attribute DOMString encoding; readonly attribute boolean fatal; readonly attribute boolean ignoreBOM; - readonly attribute ReadableStream readable; - readonly attribute WritableStream writable; + [SameObject] readonly attribute ReadableStream readable; + [SameObject] readonly attribute WritableStream writable; USVString decode(optional BufferSource input, optional TextDecodeOptions options); }; @@ -1412,13 +1412,14 @@ of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec -

    A {{TextEncoder}} object has an associated encoder and transform (a {{TransformStream}} object). +

    A {{TextEncoder}} object has an associated encoder, transform (a {{TransformStream}} object) and pending high +surrogate (initially unset).

    A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder @@ -1467,8 +1468,8 @@ constructor, when invoked, must run these steps:

  • Let transformAlgorithm be an algorithm which takes a chunk argument and runs the encode and enqueue a chunk algorithm with enc and chunk. -

  • Let flushAlgorithm be an algorithm which returns a promise resolved with - undefined. +

  • Let flushAlgorithm be an algorithm which runs the encode and flush + algorithm with enc.

  • Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, @@ -1532,12 +1533,12 @@ must run these steps: -

    The encode and enqueue a chunk algorithm, given a -{{TextEncoder}} dec and chunk, runs these steps: +

    The encode and enqueue a chunk algorithm, given a +{{TextEncoder}} enc and chunk, runs these steps:

    1. Let input be the result of converting - chunk to a {{USVString}}. If this throws an exception, return a promise rejected with + chunk to a {{DOMString}}. If this throws an exception, return a promise rejected with that exception.

    2. Convert input to a stream. @@ -1546,35 +1547,97 @@ must run these steps:

    3. Let encoder be UTF-8's encoder. -

    4. Let controller be dec's transform's +

    5. Let controller be enc's transform's \[[transformStreamController]] internal slot.

    6. While true, run these substeps: - -

      1. Let token be the result of reading from input. -

      2. Let result be the result of processing token for - encoder, input, output. +

      3. If token is end-of-stream then run these substeps: -

      4. If result is finished, run these substeps:

        1. Convert output into a byte sequence. -

        2. Call TransformStreamDefaultControllerEnqueue(controller, output). +

        3. Call TransformStreamDefaultControllerEnqueue(controller, + output).

        4. Return a promise resolved with undefined.

        + +
      5. Let result by the result of executing the convert code unit to scalar value + algorithm with enc and token. + +

      6. If result is not continue, process result for + encoder, input, output. +

    +

    The convert code unit to scalar value +algorithm, given a {{TextEncoder}} enc and token, runs these steps: + +

      +
    1. If enc's pending high surrogate is set, run these substeps: + +

        +
      1. Let high surrogate be enc's pending high surrogate. + +

      2. Unset enc's pending high surrogate. + +

      3. If token is in the range U+DC00 to U+DFFF, inclusive, return a code point whose + value is 0x10000 + ((high surrogate − 0xD800) << 10) + (token + − 0xDC00). + +

      4. Return U+FFFD. +

      + +
    2. If token is in the range U+D800 to U+DBFF, inclusive, set pending high + surrogate to token and return continue. + +

    3. If token is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD. + +

    4. Return token. +

    + +

    This is equivalent to the convert a DOMString to a sequence of Unicode +scalar values algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between +strings. + + +

    The encode and flush algorithm, given a +{{TextEncoder}} enc, runs these steps: + +

      +
    1. If enc's pending high surrogate is set, run these substeps: + +

        +
      1. Unset enc's pending high surrogate + +

      2. Let input be a new stream. + +

      3. Let output be a new stream. + +

      4. Let encoder be UTF-8's encoder. + +

      5. Let controller be enc's transform's + \[[transformStreamController]] internal slot. + +

      6. Let token be U+FFFD. + +

      7. Process token for encoder, input, output. + +

      8. Convert output into a byte sequence. + +

      9. Call TransformStreamDefaultControllerEnqueue(controller, + output). +

      + +
    2. Return a promise resolved with undefined. +

    +

    The encoding
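
Reviewer note on PATCH 02/17: the problem this patch addresses can be reproduced with the existing `encode()` method, which converts its input to a USVString and therefore replaces any lone surrogate with U+FFFD. The snippet below is an illustration only; the `hex` helper is hypothetical and exists just to print the bytes.

```javascript
// Illustration only: encoding the two halves of a surrogate pair
// separately loses the character, which is what PATCH 02 avoids
// for chunks written to the stream.
const encoder = new TextEncoder();

// Hypothetical helper for displaying bytes as lowercase hex.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

const whole = hex(encoder.encode('\uD83D\uDE00')); // U+1F600 in one string
const high = hex(encoder.encode('\uD83D'));        // lone high surrogate
const low = hex(encoder.encode('\uDE00'));         // lone low surrogate

// whole is 'f0 9f 98 80'
// high and low are both 'ef bf bd' (U+FFFD)
```

With the pending high surrogate introduced here, writing `'\uD83D'` and then `'\uDE00'` as separate chunks is intended to yield the same four bytes as `whole`.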

    From 1487182b28f53d66128556fa0cd04dc5b361215f Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Tue, 23 Jan 2018 22:33:07 +0900 Subject: [PATCH 03/17] Fix convert to scalar values algorithm and fixes The "convert code unit to scalar value" algorithm would not handle stray lead surrogates correctly. They would erroneously eat the next code unit. Fix the algorithm to replace the lead surrogate and leave the next code unit alone. Fix up the imports now that the correct things are exported from the Streams Standard. Consistently use ", then" in if statements. Don't use "substeps". Just "steps". It's cleaner. Use the formulation "transform.[[slot]]" instead of "transform's [[slot]] internal slot". Also stop linking "internal slot" to the Javascript standard. Write "Type(chunk)" instead of "the type of chunk". Don't link to the Promises Guide as it is viewed as problematic. --- encoding.bs | 126 +++++++++++++++++++++++++--------------------------- 1 file changed, 60 insertions(+), 66 deletions(-) diff --git a/encoding.bs b/encoding.bs index cb8570a..18e9cb6 100644 --- a/encoding.bs +++ b/encoding.bs @@ -28,24 +28,18 @@ Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeo spec:infra; type:dfn; text:code point text:ascii case-insensitive -spec:streams; type:interface; - text:ReadableStream - text:WritableStream +spec:streams; + type:interface; text:ReadableStream + type:dfn; text:chunk + type:dfn; text:readable stream + type:dfn; text:writable stream
    -spec: streams; urlPrefix: https://streams.spec.whatwg.org/
    -    type:dfn; text:chunk; url:#chunk
    -    type:dfn; text:readable stream; url:#readable-stream
    -    type:dfn; text:writable stream; url:#writable-stream
    -    type:abstract-op; text:CreateTransformStream; url: #create-transform-stream
    -    type:abstract-op; text:TransformStreamDefaultControllerEnqueue; url: #transform-stream-default-controller-enqueue
    -    type:interface; text:TransformStream; url: #ts-class
     spec: ECMASCRIPT; urlPrefix: https://tc39.github.io/ecma262/
    -    text: type; url: #sec-ecmascript-data-types-and-values; type: dfn
    +    text: Type; url: #sec-ecmascript-data-types-and-values; type: dfn
         text: IsDetachedBuffer; url: #sec-isdetachedbuffer; type: abstract-op
         text: IsSharedArrayBuffer; url: #sec-issharedarraybuffer; type: abstract-op
    -    text: internal slot; url: #sec-object-internal-methods-and-internal-slots; type: dfn
     
    @@ -1171,11 +1165,11 @@ control.

    Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. -

    +  
    
     var decoder = new TextDecoder(encoding);
     byteReadable
       .pipeThrough(decoder)
    -  .pipeTo(textWritable);
    + .pipeTo(textWritable);

    If the error mode is "fatal" and encoding's decoder returns error, both {{TextDecoder/readable}} @@ -1246,26 +1240,24 @@ if error mode is "fatal", and false otherwis

    The ignoreBOM attribute's getter must return true if ignore BOM flag is set, and false otherwise. -

    The readable attribute's getter must return the -contents of transform's \[[readable]] internal slot. +

    The readable attribute's getter must return transform.\[[readable]]. -

    The writable attribute's getter must return the -contents of transform's \[[writable]] internal slot. +

    The writable attribute's getter must return transform.\[[writable]].

    The decode(input, options) method, when invoked, must run these steps:

      -
    1. Let readable be transform's \[[readable]] internal - slot. +

    2. Let readable be transform.\[[readable]]. -

    3. If IsReadableStreamLocked(readable) is true, throw a +

    4. If IsReadableStreamLocked(readable) is true, then throw a {{TypeError}} exception. -

    5. Let writable be transform's \[[writable]] internal - slot. +

    6. Let writable be transform.\[[writable]]. -

    7. If IsWritableStreamLocked(writable) is true, throw a +

    8. If IsWritableStreamLocked(writable) is true, then throw a {{TypeError}} exception.

      These steps ensure that the state of the decoder is not simultaneously modified by @@ -1275,7 +1267,7 @@ method, when invoked, must run these steps: to a new encoding's decoder, set stream to a new stream, and unset the BOM seen flag. -

    9. If options's stream is true, set the +

    10. If options's stream is true, then set the do not flush flag, and unset the do not flush flag otherwise. @@ -1326,11 +1318,10 @@ method, when invoked, must run these steps: {{TextDecoder}} dec and a chunk, runs these steps:

        -
      1. If the type of of chunk is not Object, or chunk does not have - an \[[ArrayBufferData]] internal slot, or IsDetachedBuffer(chunk) is true, or IsSharedArrayBuffer(chunk) is true then return a promise rejected - with a {{TypeError}}. +

      2. If Type(chunk) is not Object, or chunk does not have an + \[[ArrayBufferData]] internal slot, or IsDetachedBuffer(chunk) is + true, or IsSharedArrayBuffer(chunk) is true, then return a new + promise rejected with a {{TypeError}} exception.

      3. If the do not flush flag for dec is unset, set dec's decoder to a new decoder for dec's @@ -1342,33 +1333,33 @@ method, when invoked, must run these steps:

      4. Push a copy of chunk to dec's stream. -

      5. Let controller be dec's transform's - \[[transformStreamController]] internal slot. +

      6. Let controller be dec's + transform.\[[transformStreamController]].

      7. Let output be a new stream.

      8. -

        While true, run these substeps: +

        While true, run these steps:

        1. Let token be the result of reading from dec's stream.

        2. -

          If token is end-of-stream, run these substeps: +

          If token is end-of-stream, run these steps:

          1. Let outputChunk be output, serialized.

          2. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). -

          3. Return a promise resolved with undefined. +

          4. Return a new promise resolved with undefined.

        3. Let result be the result of processing token for dec's decoder, dec's stream, output, and dec's error mode. -

        4. If result is error, return a promise rejected with a +

        5. If result is error, return a new promise rejected with a {{TypeError}} exception.

      @@ -1390,21 +1381,22 @@ of data from the input {{ReadableStream}}, given a {{TextDecoder}} decdec's decoder and dec's stream, output, and dec's error mode. -
    11. If result is finished, run these substeps: +

    12. If result is finished, run these steps:

      1. Let outputChunk be output, serialized. -

      2. Let controller be dec's transform's - \[[transformStreamController]] internal slot. +

      3. Let controller be dec's + transform.\[[transformStreamController]].

      4. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). -

      5. Return a promise resolved with undefined. +

      6. Return a new promise resolved with undefined.

      -
    13. Otherwise, return a promise rejected with a {{TypeError}}. +

    14. Otherwise, return a new promise rejected with a {{TypeError}} exception.

    +

    Interface {{TextEncoder}}

    @@ -1446,10 +1438,11 @@ requires buffering of scalar values.
     
       

    Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. -

    +
    +  
    
     textReadable
       .pipeThrough(new TextEncoder())
    -  .pipeTo(byteWritable);
    + .pipeTo(byteWritable);
    encoder . encode([input = ""])

    Returns the result of running UTF-8's encoder. @@ -1483,26 +1476,24 @@ constructor, when invoked, must run these steps:

    The encoding attribute's getter must return "utf-8". -

    The readable attribute's getter must return the -contents of transform's \[[readable]] internal slot. +

    The readable attribute's getter must return transform.\[[readable]]. -

    The writable attribute's getter must return the -contents of transform's \[[writable]] internal slot. +

    The writable attribute's getter must return transform.\[[writable]].

    The encode(input) method, when invoked, must run these steps:

      -
    1. Let readable be transform's \[[readable]] internal - slot. +

    2. Let readable be transform.\[[readable]]. -

    3. If IsReadableStreamLocked(readable) is true, throw a +

    4. If IsReadableStreamLocked(readable) is true, then throw a {{TypeError}} exception. -

    5. Let writable be transform's \[[writable]] internal - slot. +

    6. Let writable be transform.\[[writable]]. -

    7. If IsWritableStreamLocked(writable) is true, throw a +

    8. If IsWritableStreamLocked(writable) is true, then throw a {{TypeError}} exception.

      These steps are for consistent behaviour with the {{TextDecoder}}

      Let encoder be UTF-8's encoder. -

    9. Let controller be enc's transform's - \[[transformStreamController]] internal slot. +

    10. Let controller be enc's transform.\[[transformStreamController]]. -

    11. While true, run these substeps: +

    12. While true, run these steps:

      1. Let token be the result of reading from input. -

      2. If token is end-of-stream then run these substeps: +

      3. If token is end-of-stream, then run these steps:

        1. Convert output into a byte sequence. @@ -1563,11 +1554,11 @@ must run these steps:

        2. Call TransformStreamDefaultControllerEnqueue(controller, output). -

        3. Return a promise resolved with undefined. +

        4. Return a new promise resolved with undefined.

        -
      4. Let result by the result of executing the convert code unit to scalar value - algorithm with enc and token. +

      5. Let result be the result of executing the convert code unit to scalar + value algorithm with enc, token and input.

      6. If result is not continue, process result for encoder, input, output. @@ -1577,10 +1568,11 @@ must run these steps:

        The convert code unit to scalar value -algorithm, given a {{TextEncoder}} enc and token, runs these steps: +algorithm, given a {{TextEncoder}} enc, token and input stream, +runs these steps:

          -
        1. If enc's pending high surrogate is set, run these substeps: +

        2. If enc's pending high surrogate is set, run these steps:

          1. Let high surrogate be enc's pending high surrogate. @@ -1591,6 +1583,8 @@ algorithm, given a {{TextEncoder}} enc and token, runs the value is 0x10000 + ((high surrogate − 0xD800) << 10) + (token − 0xDC00). +

          2. Prepend token to input. +

          3. Return U+FFFD.

          @@ -1611,7 +1605,7 @@ strings. {{TextEncoder}} enc, runs these steps:
            -
          1. If enc's pending high surrogate is set, run these substeps: +

          2. If enc's pending high surrogate is set, run these steps:

            1. Unset enc's pending high surrogate @@ -1622,8 +1616,8 @@ strings.

            2. Let encoder be UTF-8's encoder. -

            3. Let controller be enc's transform's - \[[transformStreamController]] internal slot. +

            4. Let controller be enc's transform.\[[transformStreamController]].

            5. Let token be U+FFFD. @@ -1635,7 +1629,7 @@ strings. output).

            -
          3. Return a promise resolved with undefined. +

          4. Return a new promise resolved with undefined.
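
Reviewer note on PATCH 03/17: the revised convert code unit to scalar value algorithm, together with encode and flush, can be sketched as a small synchronous class. This is a minimal sketch under stated assumptions, not the specification text: the class name `StreamingUtf8Encoder` and its methods are hypothetical, bytes are returned rather than enqueued on a controller, and `flush()` returns an empty array where the patch enqueues nothing.

```javascript
// Sketch only; names are hypothetical and not part of the patch.
class StreamingUtf8Encoder {
  constructor() {
    this.pendingHighSurrogate = null; // "pending high surrogate", initially unset
    this.utf8 = new TextEncoder();
  }

  // Rough counterpart of "encode and enqueue a chunk".
  encodeChunk(chunk) {
    const input = `${chunk}`; // ToString conversion, as for a DOMString
    let scalars = '';
    let i = 0;
    while (i < input.length) {
      const token = input.charCodeAt(i);
      i += 1;
      if (this.pendingHighSurrogate !== null) {
        const high = this.pendingHighSurrogate;
        this.pendingHighSurrogate = null;
        if (token >= 0xdc00 && token <= 0xdfff) {
          scalars += String.fromCodePoint(
            0x10000 + ((high - 0xd800) << 10) + (token - 0xdc00));
          continue;
        }
        // Stray lead surrogate: emit U+FFFD and look at token again,
        // mirroring "prepend token to input" from this patch.
        scalars += '\uFFFD';
        i -= 1;
        continue;
      }
      if (token >= 0xd800 && token <= 0xdbff) {
        this.pendingHighSurrogate = token; // the "continue" result
      } else if (token >= 0xdc00 && token <= 0xdfff) {
        scalars += '\uFFFD'; // unpaired low surrogate
      } else {
        scalars += String.fromCharCode(token);
      }
    }
    return this.utf8.encode(scalars);
  }

  // Rough counterpart of "encode and flush".
  flush() {
    if (this.pendingHighSurrogate === null) return new Uint8Array(0);
    this.pendingHighSurrogate = null;
    return this.utf8.encode('\uFFFD');
  }
}

const enc = new StreamingUtf8Encoder();
const first = Array.from(enc.encodeChunk('\uD83D'));  // held back: []
const second = Array.from(enc.encodeChunk('\uDE00')); // [0xf0, 0x9f, 0x98, 0x80]
const stray = new StreamingUtf8Encoder();
stray.encodeChunk('\uD83D');
const afterStray = Array.from(stray.encodeChunk('a')); // [0xef, 0xbf, 0xbd, 0x61]
```

The `afterStray` case is the behaviour this patch fixes: the code unit following a stray lead surrogate is processed normally rather than being consumed.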

          From 497545fae435fbbaa778a070357ff3a9c82b50ec Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Thu, 25 Jan 2018 23:59:59 +0900 Subject: [PATCH 04/17] Fix wrapping Do not wrap inside tags. Do not add lines that are longer than 100 characters. --- encoding.bs | 85 ++++++++++++++++++++++++++++------------------------- 1 file changed, 45 insertions(+), 40 deletions(-) diff --git a/encoding.bs b/encoding.bs index 18e9cb6..e0195c2 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1152,14 +1152,14 @@ control.
          decoder . readable
          -

          Returns a readable stream whose chunks are strings resulting from running encoding's decoder on the chunks written to +

          Returns a readable stream whose chunks are strings resulting from running + encoding's decoder on the chunks written to {{TextDecoder/writable}}.

          decoder . writable
          -

          Returns a writable stream which accepts {{BufferSource}} chunks and runs them through encoding's decoder before making them available to +

          Returns a writable stream which accepts {{BufferSource}} chunks and runs them through + encoding's decoder before making them available to {{TextDecoder/readable}}.

          Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a @@ -1172,8 +1172,8 @@ byteReadable .pipeTo(textWritable);

    If the error mode is "fatal" and - encoding's decoder returns error, both {{TextDecoder/readable}} - and {{TextDecoder/writable}} will be errored with a {{TypeError}}. + encoding's decoder returns error, both + {{TextDecoder/readable}} and {{TextDecoder/writable}} will be errored with a {{TypeError}}.

    decoder . decode([input [, options]])
    @@ -1222,8 +1222,8 @@ constructor, when invoked, must run these steps:
  • Let flushAlgorithm be an algorithm which takes no arguments and runs the flush and enqueue algorithm with dec. -

  • Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, +

  • Let transform be the result of calling + CreateTransformStream(startAlgorithm, transformAlgorithm, flushAlgorithm).

  • Set dec's transform to transform. @@ -1240,11 +1240,11 @@ if error mode is "fatal", and false otherwis

    The ignoreBOM attribute's getter must return true if ignore BOM flag is set, and false otherwise. -

    The readable attribute's getter must return transform.\[[readable]]. +

    The readable attribute's getter must return +transform.\[[readable]]. -

    The writable attribute's getter must return transform.\[[writable]]. +

    The writable attribute's getter must return +transform.\[[writable]].

    The decode(input, options) method, when invoked, must run these steps: @@ -1260,8 +1260,8 @@ method, when invoked, must run these steps:

  • If IsWritableStreamLocked(writable) is true, then throw a {{TypeError}} exception. -

    These steps ensure that the state of the decoder is not simultaneously modified by - the Streams API and this method. +

    These steps ensure that the state of the decoder is not simultaneously + modified by the Streams API and this method.

  • If the do not flush flag is unset, set decoder to a new encoding's decoder, set stream @@ -1325,8 +1325,8 @@ method, when invoked, must run these steps:

  • If the do not flush flag for dec is unset, set dec's decoder to a new decoder for dec's - encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. + encoding, set dec's stream to a new + stream, and unset dec's BOM seen flag.

  • Set dec's do not flush flag. @@ -1342,15 +1342,17 @@ method, when invoked, must run these steps:

    While true, run these steps:

      -
    1. Let token be the result of reading from dec's stream. +

    2. Let token be the result of reading from dec's + stream.

    3. If token is end-of-stream, run these steps:

        -
      1. Let outputChunk be output, serialized. +

      2. Let outputChunk be output, + serialized. -

      3. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). +

      4. Call TransformStreamDefaultControllerEnqueue(controller, + outputChunk).

      5. Return a new promise resolved with undefined.

      @@ -1370,8 +1372,8 @@ of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec
    4. If the do not flush flag for dec is unset, set dec's decoder to a new decoder for dec's - encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. + encoding, set dec's stream to a new + stream, and unset dec's BOM seen flag.

    5. Unset dec's do not flush flag. @@ -1388,7 +1390,8 @@ of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec

      Let controller be dec's transform.\[[transformStreamController]]. -

    6. Call TransformStreamDefaultControllerEnqueue(controller, outputChunk). +

    7. Call TransformStreamDefaultControllerEnqueue(controller, + outputChunk).

    8. Return a new promise resolved with undefined.

    @@ -1409,9 +1412,9 @@ interface TextEncoder { [NewObject] Uint8Array encode(optional USVString input = ""); }; -

    A {{TextEncoder}} object has an associated encoder, transform (a {{TransformStream}} object) and pending high -surrogate (initially unset). +

    A {{TextEncoder}} object has an associated encoder, +transform (a {{TransformStream}} object) and +pending high surrogate (initially unset).

    A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder @@ -1436,7 +1439,8 @@ requires buffering of scalar values.

    Returns a writable stream which accepts string chunks and runs them through UTF-8's encoder before making them available to {{TextEncoder/readable}}. -

    Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a {{ReadableStream}} source. +

    Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a + {{ReadableStream}} source.

    
    @@ -1464,8 +1468,8 @@ constructor, when invoked, must run these steps:
      
  • Let flushAlgorithm be an algorithm which runs the encode and flush algorithm with enc. -

  • Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, +

  • Let transform be the result of calling + CreateTransformStream(startAlgorithm, transformAlgorithm, flushAlgorithm).

  • Set enc's transform to transform @@ -1476,11 +1480,11 @@ constructor, when invoked, must run these steps:

    The encoding attribute's getter must return "utf-8". -

    The readable attribute's getter must return transform.\[[readable]]. +

    The readable attribute's getter must return +transform.\[[readable]]. -

    The writable attribute's getter must return transform.\[[writable]]. +

    The writable attribute's getter must return +transform.\[[writable]].

    The encode(input) method, when invoked, must run these steps: @@ -1496,8 +1500,8 @@ must run these steps:

  • If IsWritableStreamLocked(writable) is true, then throw a {{TypeError}} exception. -

    These steps are for consistent behaviour with the {{TextDecoder}} decode(input) method. +

    These steps are for consistent behaviour with the {{TextDecoder}} + decode(input) method.

  • Convert input to a stream. @@ -1538,8 +1542,8 @@ must run these steps:

  • Let encoder be UTF-8's encoder. -

  • Let controller be enc's transform.\[[transformStreamController]]. +

  • Let controller be enc's + transform.\[[transformStreamController]].

  • While true, run these steps: @@ -1616,12 +1620,13 @@ strings.

  • Let encoder be UTF-8's encoder. -

  • Let controller be enc's transform.\[[transformStreamController]]. +

  • Let controller be enc's + transform.\[[transformStreamController]].

  • Let token be U+FFFD. -

  • Process token for encoder, input, output. +

  • Process token for encoder, input, + output.

  • Convert output into a byte sequence. From 9c96008fb664c67421e343bc16149f68f91b54b2 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 00:16:24 +0900 Subject: [PATCH 05/17] Remove unnecessary blank lines and add <hr>s Add <hr>s before the streaming-related algorithms to set them apart from the method algorithms. Remove some blank lines that shouldn't be there. --- encoding.bs | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/encoding.bs b/encoding.bs index e0195c2..c615a58 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1314,6 +1314,8 @@ method, when invoked, must run these steps: +
    +

    The decode and enqueue a chunk algorithm, given a {{TextDecoder}} dec and a chunk, runs these steps: @@ -1528,6 +1530,8 @@ must run these steps: +


    +

    The encode and enqueue a chunk algorithm, given a {{TextEncoder}} enc and chunk, runs these steps: @@ -1570,7 +1574,6 @@ must run these steps: -

    The convert code unit to scalar value algorithm, given a {{TextEncoder}} enc, token and input stream, runs these steps: @@ -1604,7 +1607,6 @@ runs these steps: scalar values algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between strings. -

    The encode and flush algorithm, given a {{TextEncoder}} enc, runs these steps: From 1aaa4279cc6f02196f8900535c92e4b4a9073950 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 19:03:56 +0900 Subject: [PATCH 06/17] Add missing "then" and fix spelling of behavior --- encoding.bs | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/encoding.bs b/encoding.bs index c615a58..37bb44e 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1325,7 +1325,7 @@ method, when invoked, must run these steps: true, or IsSharedArrayBuffer(chunk) is true, then return a new promise rejected with a {{TypeError}} exception. -

  • If the do not flush flag for dec is unset, set +

  • If the do not flush flag for dec is unset, then set dec's decoder to a new decoder for dec's encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. @@ -1363,7 +1363,7 @@ method, when invoked, must run these steps: dec's decoder, dec's stream, output, and dec's error mode. -

  • If result is error, return a new promise rejected with a +

  • If result is error, then return a new promise rejected with a {{TypeError}} exception. @@ -1372,7 +1372,7 @@ method, when invoked, must run these steps: of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec, runs these steps:

      -
    1. If the do not flush flag for dec is unset, set +

    2. If the do not flush flag for dec is unset, then set dec's decoder to a new decoder for dec's encoding, set dec's stream to a new stream, and unset dec's BOM seen flag. @@ -1502,7 +1502,7 @@ must run these steps:

    3. If IsWritableStreamLocked(writable) is true, then throw a {{TypeError}} exception. -

      These steps are for consistent behaviour with the {{TextDecoder}} +

      These steps are for consistent behavior with the {{TextDecoder}} decode(input) method.

    4. Convert input to a stream. @@ -1537,8 +1537,8 @@ must run these steps:

      1. Let input be the result of converting - chunk to a {{DOMString}}. If this throws an exception, return a promise rejected with - that exception. + chunk to a {{DOMString}}. If this throws an exception, then return a promise rejected + with that exception.

      2. Convert input to a stream. @@ -1568,7 +1568,7 @@ must run these steps:

      3. Let result be the result of executing the convert code unit to scalar value algorithm with enc, token and input. -

      4. If result is not continue, process result for +

      5. If result is not continue, then process result for encoder, input, output.

      @@ -1586,19 +1586,19 @@ runs these steps:
    5. Unset enc's pending high surrogate. -

    6. If token is in the range U+DC00 to U+DFFF, inclusive, return a code point whose - value is 0x10000 + ((high surrogate − 0xD800) << 10) + (token - − 0xDC00). +

    7. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point + whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + + (token − 0xDC00).

    8. Prepend token to input.

    9. Return U+FFFD.

    -
  • If token is in the range U+D800 to U+DBFF, inclusive, set pending high +

  • If token is in the range U+D800 to U+DBFF, inclusive, then set pending high surrogate to token and return continue. -

  • If token is in the range U+DC00 to U+DFFF, inclusive, return U+FFFD. +

  • If token is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.

  • Return token. From 981d360b430dcc25416579fc106c77154bb4217f Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 19:13:28 +0900 Subject: [PATCH 07/17] Fix indentation of list items with multiple children --- encoding.bs | 84 ++++++++++++++++++++++++++++------------------------- 1 file changed, 44 insertions(+), 40 deletions(-) diff --git a/encoding.bs b/encoding.bs index 37bb44e..6b26339 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1549,29 +1549,31 @@ must run these steps:

  • Let controller be enc's transform.\[[transformStreamController]]. -

  • While true, run these steps: +

  • +

    While true, run these steps: -

      -
    1. Let token be the result of reading from input. +

        +
      1. Let token be the result of reading from input. -

      2. If token is end-of-stream, then run these steps: +

      3. +

        If token is end-of-stream, then run these steps: -

          -
        1. Convert output into a byte sequence. +

            +
          1. Convert output into a byte sequence. -

          2. Call TransformStreamDefaultControllerEnqueue(controller, - output). +

          3. Call TransformStreamDefaultControllerEnqueue(controller, + output). -

          4. Return a new promise resolved with undefined. -

          +
        2. Return a new promise resolved with undefined. +

        -
      4. Let result be the result of executing the convert code unit to scalar - value algorithm with enc, token and input. +

      5. Let result be the result of executing the convert code unit to scalar + value algorithm with enc, token and input. -

      6. If result is not continue, then process result for - encoder, input, output. +

      7. If result is not continue, then process result for + encoder, input, output. -

      +

    The convert code unit to scalar value @@ -1579,21 +1581,22 @@ algorithm, given a {{TextEncoder}} enc, token and inp runs these steps:

      -
    1. If enc's pending high surrogate is set, run these steps: +

    2. +

      If enc's pending high surrogate is set, run these steps: -

        -
      1. Let high surrogate be enc's pending high surrogate. +

          +
        1. Let high surrogate be enc's pending high surrogate. -

        2. Unset enc's pending high surrogate. +

        3. Unset enc's pending high surrogate. -

        4. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point - whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + - (token − 0xDC00). +

        5. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point + whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + + (token − 0xDC00). -

        6. Prepend token to input. +

        7. Prepend token to input. -

        8. Return U+FFFD. -

        +
      2. Return U+FFFD. +

    3. If token is in the range U+D800 to U+DBFF, inclusive, then set pending high surrogate to token and return continue. @@ -1611,30 +1614,31 @@ strings. {{TextEncoder}} enc, runs these steps:

        -
      1. If enc's pending high surrogate is set, run these steps: +

      2. +

        If enc's pending high surrogate is set, run these steps: -

          -
        1. Unset enc's pending high surrogate +

            +
          1. Unset enc's pending high surrogate -

          2. Let input be a new stream. +

          3. Let input be a new stream. -

          4. Let output be a new stream. +

          5. Let output be a new stream. -

          6. Let encoder be UTF-8's encoder. +

          7. Let encoder be UTF-8's encoder. -

          8. Let controller be enc's - transform.\[[transformStreamController]]. +

          9. Let controller be enc's + transform.\[[transformStreamController]]. -

          10. Let token be U+FFFD. +

          11. Let token be U+FFFD. -

          12. Process token for encoder, input, - output. +

          13. Process token for encoder, input, + output. -

          14. Convert output into a byte sequence. +

          15. Convert output into a byte sequence. -

          16. Call TransformStreamDefaultControllerEnqueue(controller, - output). -

          +
        2. Call TransformStreamDefaultControllerEnqueue(controller, + output). +

      3. Return a new promise resolved with undefined.

      From 99d98370701ea28416ca8648a3d0d2b292876abb Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 19:19:56 +0900 Subject: [PATCH 08/17] Add "then" to "If ..., run these steps" lines --- encoding.bs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/encoding.bs b/encoding.bs index 6b26339..22214f4 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1348,7 +1348,7 @@ method, when invoked, must run these steps: stream.
    4. -

      If token is end-of-stream, run these steps: +

      If token is end-of-stream, then run these steps:

      1. Let outputChunk be output, serialized. @@ -1385,7 +1385,7 @@ of data from the input {{ReadableStream}}, given a {{TextDecoder}} decdec's decoder and dec's stream, output, and dec's error mode. -

      2. If result is finished, run these steps: +

      3. If result is finished, then run these steps:

        1. Let outputChunk be output, serialized. @@ -1582,7 +1582,7 @@ runs these steps:

          1. -

            If enc's pending high surrogate is set, run these steps: +

            If enc's pending high surrogate is set, then run these steps:

            1. Let high surrogate be enc's pending high surrogate. @@ -1615,7 +1615,7 @@ strings.

              1. -

                If enc's pending high surrogate is set, run these steps: +

                If enc's pending high surrogate is set, then run these steps:

                1. Unset enc's pending high surrogate From b498ac08f1c3a8d608a2a858e8529f1cf8b90278 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 19:28:37 +0900 Subject: [PATCH 09/17] Remove parenthesis about type of "transform" slots --- encoding.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/encoding.bs b/encoding.bs index 22214f4..36be75c 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1089,7 +1089,7 @@ interface TextDecoder { BOM seen flag (initially unset), error mode (initially "replacement"), do not flush flag (initially unset), and -transform (a {{TransformStream}} object). +transform.

                  A {{TextDecoder}} object also has an associated serialize stream algorithm, that given a @@ -1415,8 +1415,8 @@ interface TextEncoder { };

  • A {{TextEncoder}} object has an associated encoder, -transform (a {{TransformStream}} object) and -pending high surrogate (initially unset). +transform and pending high surrogate +(initially unset).

    A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder From f8e1249d8618d0045c529ddddd048d7a59664af9 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 19:56:49 +0900 Subject: [PATCH 10/17] Change what the "convert code unit to sv" algo is equivalent to It was noted as being equivalent to the algorithm in WebIDL. Instead note that it is equivalent to the "convert a JavaScript string into a scalar value string" algorithm from INFRA. Also add a trailing full-stop. --- encoding.bs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/encoding.bs b/encoding.bs index 36be75c..d31c2ae 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1474,7 +1474,7 @@ constructor, when invoked, must run these steps: CreateTransformStream(startAlgorithm, transformAlgorithm, flushAlgorithm). -

  • Set enc's transform to transform +

  • Set enc's transform to transform.

  • Return enc. @@ -1606,9 +1606,9 @@ runs these steps:

  • Return token. -

    This is equivalent to the convert a DOMString to a sequence of Unicode -scalar values algorithm from [[WEBIDL]], but allows for surrogate pairs that are split between -strings. +

    This is equivalent to the "convert a JavaScript string into a scalar +value string" algorithm from the Infra Standard, but allows for surrogate pairs that are split +between strings.
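
Reviewer note (illustrative only, not part of the patch): the "convert code unit to scalar value" steps, together with the pending high surrogate, can be sketched in TypeScript roughly as follows. The class name `ScalarValueConverter` and its methods are hypothetical; the sketch models the input stream as an index into the chunk string and "prepend token to input" as stepping that index back by one.

```typescript
// Hypothetical helper, not defined by the patch. It mirrors the
// "convert code unit to scalar value" steps, carrying a pending
// high surrogate from one string chunk to the next.
class ScalarValueConverter {
  // null, or a code unit in the range 0xD800 to 0xDBFF left over from earlier input.
  private pendingHighSurrogate: number | null = null;

  // Converts one chunk of UTF-16 code units into scalar values.
  convert(chunk: string): number[] {
    const out: number[] = [];
    let i = 0;
    while (i < chunk.length) {
      const token = chunk.charCodeAt(i);
      i += 1;
      if (this.pendingHighSurrogate !== null) {
        const high = this.pendingHighSurrogate;
        this.pendingHighSurrogate = null;
        if (token >= 0xdc00 && token <= 0xdfff) {
          // Valid pair: combine into a supplementary code point.
          out.push(0x10000 + ((high - 0xd800) << 10) + (token - 0xdc00));
          continue;
        }
        // Unpaired high surrogate: emit U+FFFD and re-read token
        // (the "prepend token to input" step).
        out.push(0xfffd);
        i -= 1;
        continue;
      }
      if (token >= 0xd800 && token <= 0xdbff) {
        // Remember the high surrogate; this corresponds to returning "continue".
        this.pendingHighSurrogate = token;
      } else if (token >= 0xdc00 && token <= 0xdfff) {
        // Lone low surrogate.
        out.push(0xfffd);
      } else {
        out.push(token);
      }
    }
    return out;
  }

  // Mirrors "encode and flush": a dangling high surrogate becomes U+FFFD.
  flush(): number[] {
    if (this.pendingHighSurrogate !== null) {
      this.pendingHighSurrogate = null;
      return [0xfffd];
    }
    return [];
  }
}
```

With this sketch, feeding `"a\uD83D"` and then `"\uDE00b"` yields U+0061, then U+1F600 and U+0062, which is the split-pair case the note above describes.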

    The encode and flush algorithm, given a {{TextEncoder}} enc, runs these steps: From 57d5e1c514e4eb4edb7d9c742f1c1778b33a57f5 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Fri, 26 Jan 2018 20:02:49 +0900 Subject: [PATCH 11/17] Change the type of "pending high surrogate" TextEncoder's "pending high surrogate" was either a code unit or unset. Make it "null or a code unit" instead. --- encoding.bs | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/encoding.bs b/encoding.bs index d31c2ae..058b3e2 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1416,7 +1416,7 @@ interface TextEncoder {

    A {{TextEncoder}} object has an associated encoder, transform and pending high surrogate -(initially unset). +(initially null).

    A {{TextEncoder}} object offers no label argument as it only supports UTF-8. It also offers no stream option as no encoder @@ -1582,12 +1582,12 @@ runs these steps:

    1. -

      If enc's pending high surrogate is set, then run these steps: +

      If enc's pending high surrogate is non-null, then run these steps:

      1. Let high surrogate be enc's pending high surrogate. -

      2. Unset enc's pending high surrogate. +

      3. Set enc's pending high surrogate to null.

      4. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point whose value is 0x10000 + ((high surrogate − 0xD800) << 10) + @@ -1615,10 +1615,10 @@ between strings.

        1. -

          If enc's pending high surrogate is set, then run these steps: +

          If enc's pending high surrogate is non-null, then run these steps:

            -
          1. Unset enc's pending high surrogate +

          2. Set enc's pending high surrogate to null.

          3. Let input be a new stream. From 02fc31dbd9779083212e932ec9f61f8e9f16a3a9 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Thu, 22 Mar 2018 23:07:36 +0900 Subject: [PATCH 12/17] Change TextDecoder to use separate state for streams A new TextDecoder variable, "decForTransform" is introduced to the constructor to achieve this. --- encoding.bs | 76 ++++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 39 deletions(-) diff --git a/encoding.bs b/encoding.bs index 058b3e2..94989dc 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1214,13 +1214,35 @@ constructor, when invoked, must run these steps:

          4. If options's ignoreBOM member is true, then set dec's ignore BOM flag. +

          5. Let decForTransform be a new {{TextDecoder}} object. + +

          6. Set decForTransform's encoding to encoding. + +

          7. Set decForTransform's error mode to dec's + error mode. + +

          8. Set decForTransform's ignore BOM flag to dec's + ignore BOM flag. + +

          9. Set decForTransform's decoder to a new decoder for decForTransform's encoding, and set + decForTransform's stream to a new stream. + +

            For simplicity, dec and decForTransform have redundant + members. However, the BOM seen flag, do not flush + flag, and transform are unused in decForTransform, and encoding, ignore BOM flag and error + mode are identical to dec. It is not necessary for implementations to duplicate + these member fields.

            +
          10. Let startAlgorithm be an algorithm that takes no arguments and returns nothing.

          11. Let transformAlgorithm be an algorithm which takes a chunk argument - and runs the decode and enqueue a chunk algorithm with dec and chunk. + and runs the decode and enqueue a chunk algorithm with decForTransform and + chunk.

          12. Let flushAlgorithm be an algorithm which takes no arguments and runs the flush - and enqueue algorithm with dec. + and enqueue algorithm with decForTransform.

          13. Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, @@ -1250,19 +1272,6 @@ true if ignore BOM flag is set, and false otherwise. method, when invoked, must run these steps:

              -
            1. Let readable be transform.\[[readable]]. - -

            2. If IsReadableStreamLocked(readable) is true, then throw a - {{TypeError}} exception. - -

            3. Let writable be transform.\[[writable]]. - -

            4. If IsWritableStreamLocked(writable) is true, then throw a - {{TypeError}} exception. - -

              These steps ensure that the state of the decoder is not simultaneously - modified by the Streams API and this method. -

            5. If the do not flush flag is unset, set decoder to a new encoding's decoder, set stream to a new stream, and unset the BOM seen flag. @@ -1317,7 +1326,7 @@ method, when invoked, must run these steps:


              The decode and enqueue a chunk algorithm, given a -{{TextDecoder}} dec and a chunk, runs these steps: +{{TextDecoder}} decForTransform and a chunk, runs these steps:

              1. If Type(chunk) is not Object, or chunk does not have an @@ -1325,17 +1334,10 @@ method, when invoked, must run these steps: true, or IsSharedArrayBuffer(chunk) is true, then return a new promise rejected with a {{TypeError}} exception. -

              2. If the do not flush flag for dec is unset, then set - dec's decoder to a new decoder for dec's - encoding, set dec's stream to a new - stream, and unset dec's BOM seen flag. - -

              3. Set dec's do not flush flag. -

              4. Push a copy of chunk to - dec's stream. + decForTransform's stream. -

              5. Let controller be dec's +

              6. Let controller be decForTransform's transform.\[[transformStreamController]].

              7. Let output be a new stream. @@ -1344,7 +1346,7 @@ method, when invoked, must run these steps:

                While true, run these steps:

                  -
                1. Let token be the result of reading from dec's +

                2. Let token be the result of reading from decForTransform's stream.

                3. @@ -1360,8 +1362,9 @@ method, when invoked, must run these steps:
              8. Let result be the result of processing token for - dec's decoder, dec's stream, - output, and dec's error mode. + decForTransform's decoder, decForTransform's stream, output, and decForTransform's error mode.

              9. If result is error, then return a new promise rejected with a {{TypeError}} exception. @@ -1369,27 +1372,22 @@ method, when invoked, must run these steps:

              The flush and enqueue algorithm, which handles the end -of data from the input {{ReadableStream}}, given a {{TextDecoder}} dec, runs these steps: +of data from the input {{ReadableStream}}, given a {{TextDecoder}} decForTransform, runs +these steps:

                -
              1. If the do not flush flag for dec is unset, then set - dec's decoder to a new decoder for dec's - encoding, set dec's stream to a new - stream, and unset dec's BOM seen flag. - -

              2. Unset dec's do not flush flag. -

              3. Let output be a new stream.

              4. Let result be the result of processing end-of-stream for - dec's decoder and dec's stream, - output, and dec's error mode. + decForTransform's decoder and decForTransform's stream, output, and decForTransform's error mode.

              5. If result is finished, then run these steps:

                1. Let outputChunk be output, serialized. -

                2. Let controller be dec's +

                3. Let controller be decForTransform's transform.\[[transformStreamController]].

                4. Call TransformStreamDefaultControllerEnqueue(controller, From fc6f74510a32ca12efae50aaa531ebc8477663a0 Mon Sep 17 00:00:00 2001 From: Adam Rice Date: Thu, 22 Mar 2018 23:57:12 +0900 Subject: [PATCH 13/17] Change TextEncoder to use independent state for streams Create a separate TextEncoder object "encForTransform" in the constructor, and use that to initialise "transform". Also clean up some wrapping problems in TextEncoder. --- encoding.bs | 103 ++++++++++++++++++++++++++-------------------------- 1 file changed, 51 insertions(+), 52 deletions(-) diff --git a/encoding.bs b/encoding.bs index 94989dc..e8f144e 100644 --- a/encoding.bs +++ b/encoding.bs @@ -1224,16 +1224,16 @@ constructor, when invoked, must run these steps:

                5. Set decForTransform's ignore BOM flag to dec's ignore BOM flag. -

                6. Set decForTransform's decoder to a new decoder for decForTransform's encoding, and set +

                7. Set decForTransform's decoder to a new + decoder for decForTransform's encoding, and set decForTransform's stream to a new stream.

                  For simplicity, dec and decForTransform have redundant members. However, the BOM seen flag, do not flush - flag, and transform are unused in decForTransform, and encoding, ignore BOM flag and error - mode are identical to dec. It is not necessary for implementations to duplicate - these member fields.

                  + flag, and transform are unused in decForTransform, and + encoding, ignore BOM flag and + error mode are identical to dec. It is not necessary for + implementations to duplicate these member fields.

                8. Let startAlgorithm be an algorithm that takes no arguments and returns nothing. @@ -1362,9 +1362,9 @@ method, when invoked, must run these steps:

              6. Let result be the result of processing token for - decForTransform's decoder, decForTransform's stream, output, and decForTransform's error mode. + decForTransform's decoder, decForTransform's + stream, output, and decForTransform's + error mode.

              7. If result is error, then return a new promise rejected with a {{TypeError}} exception. @@ -1379,9 +1379,9 @@ these steps:

              8. Let output be a new stream.

              9. Let result be the result of processing end-of-stream for - decForTransform's decoder and decForTransform's stream, output, and decForTransform's error mode. + decForTransform's decoder and decForTransform's + stream, output, and decForTransform's + error mode.

              10. If result is finished, then run these steps:

                  @@ -1460,13 +1460,24 @@ constructor, when invoked, must run these steps:
                1. Set enc's encoder to UTF-8's encoder. +

                2. Let encForTransform be a new {{TextEncoder}} object. + +

                3. Set encForTransform's encoder to UTF-8's + encoder. + +

                  For simplicity, enc and encForTransform have the same + members. However, transform is not used by encForTransform + and pending high surrogate is not used by enc. It is not + necessary for implementations to store these unused member fields.

                  +
                4. Let startAlgorithm be an algorithm that takes no arguments and returns nothing.

                5. Let transformAlgorithm be an algorithm which takes a chunk argument - and runs the encode and enqueue a chunk algorithm with enc and chunk. + and runs the encode and enqueue a chunk algorithm with encForTransform and + chunk.

                6. Let flushAlgorithm be an algorithm which runs the encode and flush - algorithm with enc. + algorithm with encForTransform.

                7. Let transform be the result of calling CreateTransformStream(startAlgorithm, transformAlgorithm, @@ -1490,19 +1501,6 @@ constructor, when invoked, must run these steps: must run these steps:

                    -
                  1. Let readable be transform.\[[readable]]. - -

                  2. If IsReadableStreamLocked(readable) is true, then throw a - {{TypeError}} exception. - -

                  3. Let writable be transform.\[[writable]]. - -

                  4. If IsWritableStreamLocked(writable) is true, then throw a - {{TypeError}} exception. - -

                    These steps are for consistent behavior with the {{TextDecoder}} - decode(input) method. -

                  5. Convert input to a stream.

                  6. Let output be a new stream. @@ -1531,7 +1529,7 @@ must run these steps:


                    The encode and enqueue a chunk algorithm, given a -{{TextEncoder}} enc and chunk, runs these steps: +{{TextEncoder}} encForTransform and chunk, runs these steps:

                    1. Let input be the result of converting @@ -1542,9 +1540,7 @@ must run these steps:

                    2. Let output be a new stream. -

                    3. Let encoder be UTF-8's encoder. - -

                    4. Let controller be enc's +

                    5. Let controller be encForTransform's transform.\[[transformStreamController]].

                    6. @@ -1559,33 +1555,38 @@ must run these steps:
                      1. Convert output into a byte sequence. +

                      2. Let chunk be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing + output. +

                      3. Call TransformStreamDefaultControllerEnqueue(controller, - output). + chunk).

                      4. Return a new promise resolved with undefined.

                    7. Let result be the result of executing the convert code unit to scalar - value algorithm with enc, token and input. + value algorithm with encForTransform, token and input.

                    8. If result is not continue, then process result for - encoder, input, output. + encoder, input, output.
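
Reviewer note (illustrative only, not part of the patch): the step that converts output into a byte sequence and wraps it in a {{Uint8Array}} before enqueuing can be sketched as below. The function name `utf8Chunk` is hypothetical, the input is assumed to already be a list of scalar values (no surrogates), and the enqueue call itself is omitted because it depends on the Streams controller.

```typescript
// Hypothetical helper, not defined by the patch: encodes scalar values
// as UTF-8 and wraps the bytes in a Uint8Array, as the chunk handed to
// TransformStreamDefaultControllerEnqueue would be.
// Assumes every input value is a Unicode scalar value.
function utf8Chunk(scalarValues: number[]): Uint8Array {
  const bytes: number[] = [];
  for (const cp of scalarValues) {
    if (cp < 0x80) {
      // One byte: ASCII.
      bytes.push(cp);
    } else if (cp < 0x800) {
      // Two bytes.
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      // Three bytes (includes U+FFFD).
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    } else {
      // Four bytes: supplementary planes.
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return new Uint8Array(bytes);
}
```

Under these assumptions U+FFFD always encodes to the same three bytes, 0xEF 0xBF 0xBD, so a flush step that only ever emits the replacement character could use a constant chunk instead of running the encoder.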

                  The convert code unit to scalar value -algorithm, given a {{TextEncoder}} enc, token and input stream, -runs these steps: +algorithm, given a {{TextEncoder}} encForTransform, token and input +stream, runs these steps:

                  1. -

                    If enc's pending high surrogate is non-null, then run these steps: +

                    If encForTransform's pending high surrogate is non-null, then run these + steps:

                      -
                    1. Let high surrogate be enc's pending high surrogate. +

                    2. Let high surrogate be encForTransform's pending high + surrogate. -

                    3. Set enc's pending high surrogate to null. +

                    4. Set encForTransform's pending high surrogate to null.

5. If token is in the range U+DC00 to U+DFFF, inclusive, then return a code point whose value is 0x10000 + ((high surrogate − 0xD800) << 10) +
@@ -1609,33 +1610,31 @@ value string" algorithm from the Infra Standard, but allows for surrogate pa
 between strings.
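The surrogate handling described by the convert code unit to scalar value steps can be sketched as follows. This is an illustration only, not text from the patch: `SurrogateState` and `codeUnitsToScalarValues` are hypothetical names, the input is modelled as a plain array of UTF-16 code units, and the branches that the hunk above cuts off are filled in with ordinary UTF-16 decoding rules (the low surrogate contributes `token − 0xDC00`, and unpaired surrogates become U+FFFD).

```typescript
// Hypothetical state holder standing in for the "pending high surrogate" slot.
interface SurrogateState {
  pendingHighSurrogate: number | null;
}

// Converts UTF-16 code units to scalar values. A high surrogate at the end of
// one call is held in `state` and paired with the first unit of the next call.
function codeUnitsToScalarValues(state: SurrogateState, units: number[]): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < units.length) {
    const unit = units[i];
    if (state.pendingHighSurrogate !== null) {
      const high = state.pendingHighSurrogate;
      state.pendingHighSurrogate = null;
      if (unit >= 0xdc00 && unit <= 0xdfff) {
        // Valid pair: combine high and low surrogate into one scalar value.
        out.push(0x10000 + ((high - 0xd800) << 10) + (unit - 0xdc00));
        i++;
      } else {
        // Unpaired high surrogate: emit U+FFFD and re-examine `unit`
        // on the next iteration (i is deliberately not advanced).
        out.push(0xfffd);
      }
      continue;
    }
    if (unit >= 0xd800 && unit <= 0xdbff) {
      state.pendingHighSurrogate = unit; // wait for the next unit
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      out.push(0xfffd); // lone low surrogate
    } else {
      out.push(unit); // ordinary BMP scalar value
    }
    i++;
  }
  return out;
}
```

Calling the function once with `[0x41, 0xD83D]` and then with `[0xDE00]` on the same state object yields `[0x41]` followed by `[0x1F600]`, which is the chunk-boundary case the pending high surrogate exists to handle.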

 The encode and flush algorithm, given a
-{{TextEncoder}} enc, runs these steps:
+{{TextEncoder}} encForTransform, runs these steps:

                      1. -

                        If enc's pending high surrogate is non-null, then run these steps: +

                        If encForTransform's pending high surrogate is non-null, then run these + steps:

                          -
                        1. Set enc's pending high surrogate to null. - -

                        2. Let input be a new stream. - -

                        3. Let output be a new stream. - -

                        4. Let encoder be UTF-8's encoder. - -

                        5. Let controller be enc's +

                        6. Let controller be encForTransform's transform.\[[transformStreamController]].

                        7. Let token be U+FFFD. -

                        8. Process token for encoder, input, +

                        9. Let input and output be new streams. + +

                        10. Process token for encoder, input, output.

                        11. Convert output into a byte sequence. +

                        12. Let chunk be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing + output. +

13. Call TransformStreamDefaultControllerEnqueue(controller,
- output).
+ chunk).

2. Return a new promise resolved with undefined.

From a734143fe256923e4b2f1cb4fe1d432fa3ff5b50 Mon Sep 17 00:00:00 2001
From: Adam Rice
Date: Fri, 23 Mar 2018 00:09:48 +0900
Subject: [PATCH 14/17] Pre-encode the replacement character in encode and flush

No point in running the encoder when the answer is always the same. Also fix markup indentation for notes.
---
 encoding.bs | 42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index e8f144e..28e2ce2 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1224,16 +1224,17 @@ constructor, when invoked, must run these steps:

                      3. Set decForTransform's ignore BOM flag to dec's ignore BOM flag. -

                      4. Set decForTransform's decoder to a new - decoder for decForTransform's encoding, and set - decForTransform's stream to a new stream. +

                      5. +

                        Set decForTransform's decoder to a new decoder + for decForTransform's encoding, and set + decForTransform's stream to a new stream. -

                        For simplicity, dec and decForTransform have redundant - members. However, the BOM seen flag, do not flush - flag, and transform are unused in decForTransform, and - encoding, ignore BOM flag and - error mode are identical to dec. It is not necessary for - implementations to duplicate these member fields.

                        +

                        For simplicity, dec and decForTransform have redundant + members. However, the BOM seen flag, do not flush + flag, and transform are unused in decForTransform, and + encoding, ignore BOM flag and + error mode are identical to dec. It is not necessary for + implementations to duplicate these member fields.

                      6. Let startAlgorithm be an algorithm that takes no arguments and returns nothing. @@ -1462,13 +1463,14 @@ constructor, when invoked, must run these steps:

                      7. Let encForTransform be a new {{TextEncoder}} object. -

                      8. Set encForTransform's encoder to UTF-8's - encoder. +

                      9. +

                        Set encForTransform's encoder to UTF-8's + encoder. -

                        For simplicity, enc and encForTransform have the same - members. However, transform is not used by encForTransform - and pending high surrogate is not used by enc. It is not - necessary for implementations to store these unused member fields.

                        +

                        For simplicity, enc and encForTransform have the same + members. However, transform is not used by encForTransform and + pending high surrogate is not used by enc. It is not necessary + for implementations to store these unused member fields.

                      10. Let startAlgorithm be an algorithm that takes no arguments and returns nothing. @@ -1621,14 +1623,10 @@ between strings.

                      11. Let controller be encForTransform's transform.\[[transformStreamController]]. -

                      12. Let token be U+FFFD. - -

                      13. Let input and output be new streams. - -

                      14. Process token for encoder, input, - output. +

                      15. +

                        Let output be the byte sequence 0xEF 0xBF 0xBD. -

                      16. Convert output into a byte sequence. +

                        This is the replacement character U+FFFD encoded as UTF-8.

17. Let chunk be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing output.

From 64dbb75eb7851da0596c75852020029bf2ca4350 Mon Sep 17 00:00:00 2001
From: Adam Rice
Date: Mon, 26 Mar 2018 16:17:44 +0900
Subject: [PATCH 15/17] Add note about why DOMString is used for streaming encoding

---
 encoding.bs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/encoding.bs b/encoding.bs
index 28e2ce2..baf48ac 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1538,6 +1538,11 @@ must run these steps:
 chunk to a {{DOMString}}. If this throws an exception, then return a promise rejected
 with that exception.
+

+ {{DOMString}} is used here so that a surrogate pair that is split between chunks can
+ be correctly reassembled into the appropriate code point in the output. The behaviour is otherwise
+ identical to {{USVString}}. In particular, replacement characters will be used where necessary to
+ make the output valid UTF-8.
+

                      18. Convert input to a stream.

19. Let output be a new stream.

From 16ba8556ac615830aa902c96422b4c49ba1ae591 Mon Sep 17 00:00:00 2001
From: Adam Rice
Date: Mon, 26 Mar 2018 18:53:08 +0900
Subject: [PATCH 16/17] Rephrase behaviour of transform on ill-formed UTF-16 in note

---
 encoding.bs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index baf48ac..f3bcde1 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1540,8 +1540,8 @@ must run these steps:

 {{DOMString}} is used here so that a surrogate pair that is split between chunks can
 be correctly reassembled into the appropriate code point in the output. The behaviour is otherwise
- identical to {{USVString}}. In particular, replacement characters will be used where necessary to
- make the output valid UTF-8.
+ identical to {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD so that the
+ output is always well-formed UTF-8.

20. Convert input to a stream.

From c39d7a1a7c5cac27f78982888dd5abdb43191e38 Mon Sep 17 00:00:00 2001
From: Adam Rice
Date: Fri, 30 Mar 2018 19:27:10 +0900
Subject: [PATCH 17/17] Update explanation about use of DOMString

---
 encoding.bs | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index f3bcde1..fb8e391 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1539,9 +1539,8 @@ must run these steps:
 with that exception.

 {{DOMString}} is used here so that a surrogate pair that is split between chunks can
- be correctly reassembled into the appropriate code point in the output. The behaviour is otherwise
- identical to {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD so that the
- output is always well-formed UTF-8.
+ be reassembled into the appropriate scalar value. The behavior is otherwise
+ identical to {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD.

                      21. Convert input to a stream.
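To make the behaviour of the last few patches concrete, here is a minimal sketch of chunked string-to-UTF-8 encoding with a flush step. It is not the spec algorithm or the prollyfill: `EncoderState`, `encodeChunk`, `flushEncoder` and `utf8Bytes` are hypothetical names, chunks are plain strings read one code unit at a time (as a {{DOMString}} would be), and output is a plain number array rather than a {{Uint8Array}}. The flush step returns the pre-encoded bytes 0xEF 0xBF 0xBD instead of running the encoder, as PATCH 14/17 describes.

```typescript
// U+FFFD encoded as UTF-8; constant, so it never needs to be re-encoded.
const REPLACEMENT_UTF8: number[] = [0xef, 0xbf, 0xbd];

// Hypothetical state: a high surrogate left over from the previous chunk.
interface EncoderState {
  pendingHighSurrogate: number | null;
}

// Encodes one scalar value (not a surrogate) as UTF-8 bytes.
function utf8Bytes(cp: number): number[] {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Encodes one chunk. A trailing high surrogate is held back, not replaced,
// so that it can pair with a low surrogate at the start of the next chunk.
function encodeChunk(state: EncoderState, chunk: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < chunk.length; i++) {
    const unit = chunk.charCodeAt(i);
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (state.pendingHighSurrogate !== null) {
      const high = state.pendingHighSurrogate;
      state.pendingHighSurrogate = null;
      if (isLow) {
        const cp = 0x10000 + ((high - 0xd800) << 10) + (unit - 0xdc00);
        Array.prototype.push.apply(out, utf8Bytes(cp));
        continue;
      }
      // Unpaired high surrogate: replace it, then handle `unit` normally.
      Array.prototype.push.apply(out, REPLACEMENT_UTF8);
    }
    if (unit >= 0xd800 && unit <= 0xdbff) {
      state.pendingHighSurrogate = unit;
    } else if (isLow) {
      Array.prototype.push.apply(out, REPLACEMENT_UTF8); // lone low surrogate
    } else {
      Array.prototype.push.apply(out, utf8Bytes(unit));
    }
  }
  return out;
}

// Called once when the input ends: a still-pending high surrogate can never
// be completed, so it is emitted as the pre-encoded replacement character.
function flushEncoder(state: EncoderState): number[] {
  if (state.pendingHighSurrogate === null) return [];
  state.pendingHighSurrogate = null;
  return REPLACEMENT_UTF8.slice();
}
```

Feeding `"\uD83D"` and then `"\uDE00"` as separate chunks produces no bytes for the first chunk and the four bytes 0xF0 0x9F 0x98 0x80 for the second. Converting each chunk to a {{USVString}} up front would instead replace both halves with U+FFFD, which is the difference the note about {{DOMString}} is pointing at.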