feat(stdlib): Add Buffer.setChar and Buffer.getChar #2262
Base: main, changes from 1 commit
@@ -24,7 +24,7 @@ from "char" include Char
 from "runtime/numbers" include Numbers
 use Numbers.{ coerceNumberToWasmI32 }
 from "runtime/utf8" include Utf8
-use Utf8.{ usvEncodeLength }
+use Utf8.{ usvEncodeLength, utf8ByteCount, exception MalformedUnicode }
 from "runtime/unsafe/offsets" include Offsets
 use Offsets.{ _BYTES_LEN_OFFSET, _BYTES_DATA_OFFSET }

@@ -376,6 +376,68 @@ provide let addString = (string, buffer) => {
   buffer.len += bytelen
 }

+/**
+ * Gets the UTF-8 encoded character at the given byte index.
+ *
+ * @param index: The byte index to access
+ * @param buffer: The buffer to access
+ * @returns The character starting at the given byte index
+ *
+ * @throws IndexOutOfBounds: When `index` is negative
+ * @throws IndexOutOfBounds: When `index + 1` is greater than the buffer size
+ * @throws MalformedUnicode: When the requested character is not a valid UTF-8 sequence or extends past the end of the buffer
+ *
+ * @example
+ * let buf = Buffer.make(32)
+ * Buffer.addString("Hello World 🌾", buf)
+ * assert Buffer.getChar(12, buf) == '🌾'
(Review thread on the line above; spotandjake marked this conversation as resolved.)
+ *
+ * @since v0.7.0
+ */
+@unsafe
+provide let getChar = (index, buffer) => {
+  use WasmI32.{ (+), (>) }
+  checkIsIndexInBounds(index, 1, buffer)
Review thread on the line above:
Comment: Why is this 1 if it's UTF-8?
Reply: Characters can be between … This is why …
+  // Note: We do a raw check, as the char's byte length is only known after reading its lead byte
+  let bytes = buffer.data
+  let ptr = WasmI32.fromGrain(bytes)
+  let offset = coerceNumberToWasmI32(index)
+  let byte = WasmI32.load8U(ptr + offset, _BYTES_DATA_OFFSET)
+  let charSize = utf8ByteCount(byte)
+  if (offset + charSize > coerceNumberToWasmI32(buffer.len)) {
+    throw MalformedUnicode
+  }
+  ignore(bytes)
+  Bytes.getChar(index, bytes)
+}
+
+/**
+ * UTF-8 encodes a character starting at the given byte index.
+ *
+ * @param index: The byte index to update
+ * @param char: The value to set
+ * @param buffer: The buffer to mutate
+ *
+ * @throws IndexOutOfBounds: When `index` is negative
+ * @throws IndexOutOfBounds: When `index` is greater than or equal to the buffer size
+ * @throws IndexOutOfBounds: When `index + charSize` is greater than the buffer size, where `charSize` is the number of bytes in the character (1 to 4)
+ *
+ * @example
+ * let buf = Buffer.make(32)
+ * Buffer.addString("Hello World.", buf)
+ * Buffer.setChar(11, '!', buf)
+ * assert Buffer.toString(buf) == "Hello World!"
+ *
+ * @since v0.7.0
+ */
+@unsafe
+provide let setChar = (index, char, buffer) => {
+  let usv = untagChar(char)
+  let byteCount = tagSimpleNumber(usvEncodeLength(usv))
+  checkIsIndexInBounds(index, byteCount, buffer)
+  Bytes.setChar(index, char, buffer.data)
+}
+
 /**
  * Appends the bytes of a character to a buffer.
  *
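The bounds handling in this diff can be modeled outside Grain. getChar first checks only a single byte, because a character's length is not known until its lead byte is read, and then does a raw check of index + charSize against the buffer length. setChar knows the encoded length up front (via usvEncodeLength), so one check suffices. The Python sketch below illustrates this under stated assumptions and is not the Grain implementation. Buffer, check_index_in_bounds, utf8_byte_count, get_char, set_char and the MalformedUnicode class are hypothetical stand-ins, and IndexError plays the role of IndexOutOfBounds. Rejecting continuation bytes as lead bytes is this sketch's own choice, not documented behavior of runtime/utf8.

```python
class MalformedUnicode(Exception):
    """Hypothetical stand-in for Grain's MalformedUnicode exception."""


class Buffer:
    """Hypothetical model of a Grain buffer: `data` is the backing store, `length` the used prefix."""

    def __init__(self, capacity: int) -> None:
        self.data = bytearray(capacity)
        self.length = 0

    def add_string(self, s: str) -> None:
        encoded = s.encode("utf-8")
        end = self.length + len(encoded)
        if end > len(self.data):
            # Grow the backing store when the new bytes do not fit
            self.data.extend(bytes(end - len(self.data)))
        self.data[self.length:end] = encoded
        self.length = end

    def to_string(self) -> str:
        return bytes(self.data[:self.length]).decode("utf-8")


def check_index_in_bounds(index: int, byte_count: int, buf: Buffer) -> None:
    # IndexError stands in for IndexOutOfBounds: negative index or a range past the used length
    if index < 0 or index + byte_count > buf.length:
        raise IndexError("IndexOutOfBounds")


def utf8_byte_count(lead: int) -> int:
    # Length of a UTF-8 sequence, decided from its lead byte alone
    if lead < 0x80:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    # Assumption of this sketch: continuation bytes (0b10xxxxxx) and 0xF8-0xFF cannot start a character
    raise MalformedUnicode()


def get_char(index: int, buf: Buffer) -> str:
    # Step 1: only the lead byte is known to be needed, so check a single byte
    check_index_in_bounds(index, 1, buf)
    size = utf8_byte_count(buf.data[index])
    # Step 2: raw check that the whole sequence lies within the used bytes
    if index + size > buf.length:
        raise MalformedUnicode()
    try:
        return bytes(buf.data[index:index + size]).decode("utf-8")
    except UnicodeDecodeError:
        raise MalformedUnicode() from None


def set_char(index: int, char: str, buf: Buffer) -> None:
    if len(char) != 1:
        raise ValueError("expected a single character")
    encoded = char.encode("utf-8")
    # The encoded length (1 to 4 bytes) is known before writing, so one check suffices
    check_index_in_bounds(index, len(encoded), buf)
    buf.data[index:index + len(encoded)] = encoded


buf = Buffer(32)
buf.add_string("Hello World 🌾")
print(f"U+{ord(get_char(12, buf)):04X}")
```

In this model, get_char(12, buf) on "Hello World 🌾" returns "🌾". Index 13, which is a continuation byte, raises MalformedUnicode. set_char(11, "!", ...) turns "Hello World." into "Hello World!", mirroring the setChar doc-comment example.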
Review thread:
Comment: If the operation is intended to get a character starting at a byte index, what would the expectation be if the index points into the middle of a UTF-8 character?
Reply: I'm pretty sure it would either return a character in the case that the rest was a valid char (I can't recall if that's ever the case), but more likely MalformedUnicode.
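For background on the question above, every UTF-8 continuation byte has the bit pattern 10xxxxxx (RFC 3629), and no valid character can begin with one. The minimal Python sketch below uses Python's strict built-in decoder on the doc-comment string to show this. It does not show what Grain's Bytes.getChar does, because that is not visible in this diff. is_continuation and decode_from are hypothetical helpers.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes always have the bit pattern 0b10xxxxxx
    return byte & 0xC0 == 0x80


def decode_from(data: bytes, index: int):
    """Hypothetical helper: strictly decode data[index:], or return None if rejected."""
    try:
        return data[index:].decode("utf-8")
    except UnicodeDecodeError:
        return None


data = "Hello World 🌾".encode("utf-8")

# Bytes 12 to 15 hold the four-byte encoding of U+1F33E
for index in range(12, len(data)):
    kind = "continuation" if is_continuation(data[index]) else "lead"
    # ascii() keeps the output printable on terminals without UTF-8 support
    print(index, hex(data[index]), kind, ascii(decode_from(data, index)))
```

Only index 12 decodes. Indices 13 to 15 are continuation bytes, and the strict decoder rejects every slice that starts on one.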