feat(stdlib): Add Buffer.setChar and Buffer.getChar #2262

Open: wants to merge 2 commits into main

compiler/test/stdlib/buffer.test.gr (21 additions, 0 deletions)
@@ -303,6 +303,27 @@ let b = Buffer.make(0)
Buffer.addString(str, b)
assert Buffer.toBytes(a) == Buffer.toBytes(b)

// Buffer.getChar
let bytes = Buffer.make(32)
Buffer.addString("ab©✨🍞", bytes)
assert Buffer.getChar(0, bytes) == 'a'
assert Buffer.getChar(1, bytes) == 'b'
assert Buffer.getChar(2, bytes) == '©'
assert Buffer.getChar(4, bytes) == '✨'
assert Buffer.getChar(7, bytes) == '🍞'

// Buffer.setChar
let bytes = Buffer.make(16)
Buffer.addBytes(b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", bytes)
Buffer.setChar(0, 'a', bytes)
assert Buffer.getChar(0, bytes) == 'a'
Buffer.setChar(1, '©', bytes)
assert Buffer.getChar(1, bytes) == '©'
Buffer.setChar(3, '✨', bytes)
assert Buffer.getChar(3, bytes) == '✨'
Buffer.setChar(7, '🍞', bytes)
assert Buffer.getChar(7, bytes) == '🍞'

// addChar
let char = 'a' // 1 byte
let buf = Buffer.make(0)

stdlib/buffer.gr (63 additions, 1 deletion)
@@ -24,7 +24,7 @@ from "char" include Char
from "runtime/numbers" include Numbers
use Numbers.{ coerceNumberToWasmI32 }
from "runtime/utf8" include Utf8
use Utf8.{ usvEncodeLength }
use Utf8.{ usvEncodeLength, utf8ByteCount, exception MalformedUnicode }
from "runtime/unsafe/offsets" include Offsets
use Offsets.{ _BYTES_LEN_OFFSET, _BYTES_DATA_OFFSET }

@@ -376,6 +376,68 @@ provide let addString = (string, buffer) => {
buffer.len += bytelen
}

/**
* Gets the UTF-8 encoded character at the given byte index.

Member:

If the operation is intended to get a character starting at a byte index, what is the expected behavior when the index points into the middle of a UTF-8 character?

Member Author:

I'm pretty sure it would either return a character, if the remaining bytes happened to form a valid char (I can't recall if that's ever the case), or, more likely, throw MalformedUnicode.

*
* @param index: The byte index to access
* @param buffer: The buffer to access
* @returns A character starting at the given index
*
* @throws IndexOutOfBounds: When `index` is negative
* @throws IndexOutOfBounds: When `index + 1` is greater than the buffer size
* @throws MalformedUnicode: When the requested character is not a valid UTF-8 sequence or extends past the end of the buffer
*
* @example
* let buf = Buffer.make(32)
* Buffer.addString("Hello World 🌾", buf)
* assert Buffer.getChar(12, buf) == '🌾'
*
* @since v0.7.0
*/
@unsafe
provide let getChar = (index, buffer) => {
use WasmI32.{ (+), (>) }
checkIsIndexInBounds(index, 1, buffer)

Member:

Why is this 1 if it's UTF-8?

spotandjake (Member Author), Apr 6, 2025:

Characters can be between 1 and 4 bytes long. We need to ensure the first byte exists so we can check the char size on line 406, and we then do an additional length check on line 408 using the actual length.

This is why getChar needs to operate on the bytes directly rather than just using Bytes.getChar like our other helpers.
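
To illustrate the reasoning above: the byte length of a UTF-8 character can only be read from its leading byte, so the first bounds check can only require that one byte exists. The following is a minimal pure-Grain sketch of that leading-byte rule. The module name and the `sequenceLength` helper are hypothetical and exist only for illustration; this is not the runtime's `utf8ByteCount`, whose handling of invalid bytes may differ. It also shows that a continuation byte, such as the one reached by indexing into the middle of '©', can never begin a character.

```grain
module Utf8LeadByte

// Hypothetical helper for illustration only (not the runtime's utf8ByteCount):
// returns the UTF-8 sequence length implied by a leading byte, or None when the
// byte is a continuation byte (0x80 to 0xBF) and so cannot start a character.
// Simplified: bytes that never appear in valid UTF-8 (0xC0, 0xC1, 0xF5 and up)
// are not rejected here.
let sequenceLength = (byte: Number) => {
  if (byte < 0x80) {
    Some(1) // 0xxxxxxx: ASCII
  } else if (byte < 0xc0) {
    None // 10xxxxxx: continuation byte
  } else if (byte < 0xe0) {
    Some(2) // 110xxxxx
  } else if (byte < 0xf0) {
    Some(3) // 1110xxxx
  } else {
    Some(4) // 11110xxx
  }
}

// Leading bytes of 'a', '©', '✨' and '🍞'
assert sequenceLength(0x61) == Some(1)
assert sequenceLength(0xc2) == Some(2)
assert sequenceLength(0xe2) == Some(3)
assert sequenceLength(0xf0) == Some(4)
// Second byte of '©' (encoded as 0xC2 0xA9): an index into the middle of a char
assert sequenceLength(0xa9) == None
```

Only after the leading byte is known can the full `index + charSize` check against the buffer length be made.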

// Note: We do a raw check as we need the byte length before reading the full char
let bytes = buffer.data
let ptr = WasmI32.fromGrain(bytes)
let offset = coerceNumberToWasmI32(index)
let byte = WasmI32.load8U(ptr + offset, _BYTES_DATA_OFFSET)
let charSize = utf8ByteCount(byte)
if (offset + charSize > coerceNumberToWasmI32(buffer.len)) {
throw MalformedUnicode
}
ignore(bytes)
Bytes.getChar(index, bytes)
}

/**
* UTF-8 encodes a character starting at the given byte index.
*
* @param index: The byte index to update
* @param char: The value to set
* @param buffer: The buffer to mutate
*
* @throws IndexOutOfBounds: When `index` is negative
* @throws IndexOutOfBounds: When `index` is greater than or equal to the buffer size
* @throws IndexOutOfBounds: When `index + charSize` is greater than the buffer size, where `charSize` is the number of bytes in the UTF-8 encoding of `char` (1 to 4)
*
* @example
* let buf = Buffer.make(32)
* Buffer.addString("Hello World.", buf)
* Buffer.setChar(11, '!', buf)
* assert Buffer.toString(buf) == "Hello World!"
*
* @since v0.7.0
*/
@unsafe
provide let setChar = (index, char, buffer) => {
let usv = untagChar(char)
let byteCount = tagSimpleNumber(usvEncodeLength(usv))
checkIsIndexInBounds(index, byteCount, buffer)
Bytes.setChar(index, char, buffer.data)
}

/**
* Appends the bytes of a character to a buffer.
*

stdlib/buffer.md (83 additions, 0 deletions)
@@ -415,6 +415,89 @@ Buffer.addString("Hello", buf)
assert Buffer.toString(buf) == "Hello"
```

### Buffer.**getChar**

<details disabled>
<summary tabindex="-1">Added in <code>next</code></summary>
No other changes yet.
</details>

```grain
getChar : (index: Number, buffer: Buffer) => Char
```

Gets the UTF-8 encoded character at the given byte index.

Parameters:

|param|type|description|
|-----|----|-----------|
|`index`|`Number`|The byte index to access|
|`buffer`|`Buffer`|The buffer to access|

Returns:

|type|description|
|----|-----------|
|`Char`|A character starting at the given index|

Throws:

`IndexOutOfBounds`

* When `index` is negative
* When `index + 1` is greater than the buffer size

`MalformedUnicode`

* When the requested character is not a valid UTF-8 sequence or extends past the end of the buffer

Examples:

```grain
let buf = Buffer.make(32)
Buffer.addString("Hello World 🌾", buf)
assert Buffer.getChar(12, buf) == '🌾'
```
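
Because `index` is a byte offset rather than a character count, every character after a multi-byte character starts at a shifted offset. A short sketch, assuming the `getChar` behavior documented above (the module name is arbitrary):

```grain
module GetCharOffsets

from "buffer" include Buffer

let buf = Buffer.make(16)
// In UTF-8, 'a' takes 1 byte, '©' 2 bytes, '✨' 3 bytes and '🍞' 4 bytes
Buffer.addString("a©✨🍞", buf)

// Each index is the byte offset where a character starts
assert Buffer.getChar(0, buf) == 'a'
assert Buffer.getChar(1, buf) == '©'
assert Buffer.getChar(3, buf) == '✨'
assert Buffer.getChar(6, buf) == '🍞'
// 4 characters, but 1 + 2 + 3 + 4 = 10 bytes
assert Buffer.length(buf) == 10
```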

### Buffer.**setChar**

<details disabled>
<summary tabindex="-1">Added in <code>next</code></summary>
No other changes yet.
</details>

```grain
setChar : (index: Number, char: Char, buffer: Buffer) => Void
```

UTF-8 encodes a character starting at the given byte index.

Parameters:

|param|type|description|
|-----|----|-----------|
|`index`|`Number`|The byte index to update|
|`char`|`Char`|The value to set|
|`buffer`|`Buffer`|The buffer to mutate|

Throws:

`IndexOutOfBounds`

* When `index` is negative
* When `index` is greater than or equal to the buffer size
* When `index + charSize` is greater than the buffer size, where `charSize` is the number of bytes in the UTF-8 encoding of `char` (1 to 4)

Examples:

```grain
let buf = Buffer.make(32)
Buffer.addString("Hello World.", buf)
Buffer.setChar(11, '!', buf)
assert Buffer.toString(buf) == "Hello World!"
```
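
`setChar` overwrites bytes in place; it does not insert. The sketch below assumes, as in the implementation above, that `setChar` writes through `Bytes.setChar` and leaves the buffer's length unchanged. It shows a 2-byte character replacing two 1-byte characters (the module name is arbitrary):

```grain
module SetCharOverwrite

from "buffer" include Buffer

let buf = Buffer.make(8)
Buffer.addString("abcd", buf)

// '©' encodes to 2 bytes (0xC2 0xA9), so writing it at byte index 1
// overwrites both 'b' and 'c'; the buffer length stays at 4 bytes
Buffer.setChar(1, '©', buf)
assert Buffer.toString(buf) == "a©d"
assert Buffer.length(buf) == 4
```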

### Buffer.**addChar**

<details disabled>

stdlib/bytes.gr (1 addition, 0 deletions)
@@ -416,6 +416,7 @@ provide let clear = (bytes: Bytes) => {
* @returns The character that starts at the given index
*
* @throws IndexOutOfBounds: When `index` is negative
* @throws IndexOutOfBounds: When `index + 1` is greater than the bytes size
* @throws MalformedUnicode: When the requested character is not a valid UTF-8 sequence
*
* @example

stdlib/bytes.md (1 addition, 0 deletions)
@@ -462,6 +462,7 @@ Throws:
`IndexOutOfBounds`

* When `index` is negative
* When `index + 1` is greater than the bytes size

`MalformedUnicode`
