Initial implementation of helper library for Rust UDFs #1

Merged 2 commits on Feb 15, 2023
2 changes: 1 addition & 1 deletion .github/workflows/rust.yml
@@ -36,7 +36,7 @@ jobs:
set -o pipefail
curl https://wasmtime.dev/install.sh -sSf | bash
- name: Test
- run: CARGO_TARGET_WASM32_WASI_RUNNER="/home/runner/.wasmtime/bin/wasmtime --invoke _start --allow-unknown-exports" cargo test --target=wasm32-wasi --all-targets
+ run: CARGO_TARGET_WASM32_WASI_RUNNER="/home/runner/.wasmtime/bin/wasmtime --allow-unknown-exports" cargo test --target=wasm32-wasi --all-targets

# Tests that our current minimum supported rust version compiles everything successfully
min_rust:
17 changes: 15 additions & 2 deletions Cargo.toml
@@ -1,5 +1,18 @@
[workspace]
members = [
- "scylla-bindgen",
- "scylla-cql",
"examples",
"scylla-udf",
"scylla-udf-macros",
"tests",
]

[workspace.package]
edition = "2021"
version = "0.0.1"
repository = "https://github.com/wmitros/scylla-rust-udf"
license = "MIT OR Apache-2.0"
rust-version = "1.66.1"

[workspace.dependencies]
scylla-udf = { path = "scylla-udf" }
scylla-udf-macros = { path = "scylla-udf-macros" }
119 changes: 118 additions & 1 deletion README.md
@@ -1 +1,118 @@
- # Rust-utils-for-Scylla-UDFs
+ # Rust helper library for Scylla UDFs

## Usage

### Prerequisites

To use this helper library in Scylla you'll need:
* Standard library for Rust `wasm32-wasi`
  * Can be added in rustup installations using `rustup target add wasm32-wasi`
  * For non-rustup setups, you can try following the steps at https://rustwasm.github.io/docs/wasm-pack/prerequisites/non-rustup-setups.html
  * Also available as an rpm: `rust-std-static-wasm32-wasi`
* `wasm2wat` parser
  * Available in many distributions in the `wabt` package

### Compilation

We recommend a setup with cargo.

1. Start with a library package
```
cargo new --lib abc
```
2. Add the following lines to the Cargo.toml to set the crate-type to cdylib
```
[lib]
crate-type = ["cdylib"]
```
3. Implement your package, exporting Scylla UDFs using the `scylla_udf::export_udf` macro.
4. Build the package using the wasm32-wasi target:
```
cargo build --target=wasm32-wasi
```
5. Find the compiled `.wasm` binary. Let's assume it's `target/wasm32-wasi/debug/abc.wasm`.
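6. (optional) Optimize the binary using `wasm-opt -O3 -o target/wasm32-wasi/debug/abc.wasm target/wasm32-wasi/debug/abc.wasm` (can be combined with a `cargo build --release` build, which places the binary under `target/wasm32-wasi/release/` instead)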
7. Translate the binary into `wat`:
```
wasm2wat target/wasm32-wasi/debug/abc.wasm > target/wasm32-wasi/debug/abc.wat
```

### CQL Statement

The resulting `target/wasm32-wasi/debug/abc.wat` code can now be used directly in a `CREATE FUNCTION` statement. The code will most likely
contain `'` characters, so you may first need to replace each of them with `''` to make the code usable inside a CQL string.

For example, if you have a [Rust UDF](examples/commas.rs) that joins a list of words using commas, you can create a Scylla UDF using the following statement:
```
CREATE FUNCTION commas(string list<text>) CALLED ON NULL INPUT RETURNS text AS ' (module ...) '
```
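
The quote escaping described above can be sketched in a few lines of Rust. This is only an illustration built on the standard library: the helper name `to_cql_string_literal` and the sample WAT fragment are made up for this example and are not part of `scylla-udf`.

```rust
/// Hypothetical helper (not part of scylla-udf): wraps WAT source in a CQL
/// string literal, doubling every `'` so the text survives CQL parsing.
fn to_cql_string_literal(wat: &str) -> String {
    format!("'{}'", wat.replace('\'', "''"))
}

fn main() {
    // A made-up WAT fragment with a single quote inside a data segment.
    let wat = r#"(module (data (i32.const 0) "it's"))"#;
    // The printed literal can be pasted after `AS` in a CREATE FUNCTION statement.
    println!("{}", to_cql_string_literal(wat));
}
```

The same substitution can of course be done with any text tool; the point is only that every `'` in the `.wat` text is doubled before the text is wrapped in the outer quotes.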


## CQL Type Mapping

The argument and return value types used in functions annotated with `#[export_udf]` must all map to CQL types used in the `CREATE FUNCTION` statements used in Scylla, according to the tables below.

If the Scylla function is created with types that do not match the types used in the Rust function, calling the UDF will fail or produce arbitrary results.

### Native types

| CQL Type | Rust type |
| --------- | ----------------------------- |
| ASCII | String |
| BIGINT | i64 |
| BLOB | Vec\<u8\> |
| BOOLEAN | bool |
| COUNTER | scylla_udf::Counter |
| DATE | chrono::NaiveDate |
| DECIMAL   | bigdecimal::BigDecimal        |
| DOUBLE | f64 |
| DURATION | scylla_udf::CqlDuration |
| FLOAT | f32 |
| INET | std::net::IpAddr |
| INT | i32 |
| SMALLINT | i16 |
| TEXT | String |
| TIME | scylla_udf::Time |
| TIMESTAMP | scylla_udf::Timestamp |
| TIMEUUID | uuid::Uuid |
| TINYINT | i8 |
| UUID | uuid::Uuid |
| VARCHAR | String |
| VARINT | num_bigint::BigInt |

### Collections

If a CQL type `T` maps to Rust type `RustT`, you can use it as a collection parameter. For maps, the key type `K` and the value type `V` map to `RustK` and `RustV` in the same way:

| CQL Type    | Rust type                                                                             |
| ----------- | ------------------------------------------------------------------------------------- |
| LIST\<T\>   | Vec\<RustT\>                                                                          |
| MAP\<K, V\> | std::collections::BTreeMap\<RustK, RustV\>, std::collections::HashMap\<RustK, RustV\> |
| SET\<T\>    | Vec\<RustT\>, std::collections::BTreeSet\<RustT\>, std::collections::HashSet\<RustT\> |


### Tuples

If CQL types `T1`, `T2`, ... map to Rust types `RustT1`, `RustT2`, ..., you can use them in tuples:

| CQL Type             | Rust type             |
| -------------------- | --------------------- |
| TUPLE\<T1, T2, ...\> | (RustT1, RustT2, ...) |

### Nulls

If a CQL value of type `T`, which maps to Rust type `RustT`, may be null (possible in UDFs that are not declared `RETURNS NULL ON NULL INPUT`),
the type used in the Rust function should be `Option<RustT>`.
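
As an illustration of the rule above, the sketch below handles nullable arguments with `Option`. It is written as plain Rust so that it compiles on its own; in a real UDF the function would carry the `#[export_udf]` attribute. The function name `add_nullable` is made up for this example.

```rust
/// Adds two nullable INT values; a null on either side yields null.
/// In a real UDF this function would be annotated with `#[export_udf]`.
fn add_nullable(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    match (a, b) {
        // Both present: checked_add returns None on overflow instead of panicking.
        (Some(x), Some(y)) => x.checked_add(y),
        // At least one argument was null.
        _ => None,
    }
}

fn main() {
    println!("{:?}", add_nullable(Some(2), Some(3)));
    println!("{:?}", add_nullable(Some(2), None));
}
```

A function shaped like this would pair with a `CALLED ON NULL INPUT` declaration, as in the `commas` statement earlier in this README, since that is the case in which nulls can reach the Rust code.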

## Contributing

In general, try to follow the same rules as in https://github.com/scylladb/scylla-rust-driver/blob/main/CONTRIBUTING.md

### Testing

This crate is meant to be compiled to a `wasm32-wasi` target and run in a WASM runtime. Tests that use WASM-specific code will most likely fail when executed any other way (in particular, with a plain `cargo test` command).

For example, if you have the [wasmtime](https://docs.wasmtime.dev/cli-install.html) runtime installed and in `PATH`, you can use the following command to run tests:
```text
CARGO_TARGET_WASM32_WASI_RUNNER="wasmtime --allow-unknown-exports" cargo test --target=wasm32-wasi
```
65 changes: 65 additions & 0 deletions examples/Cargo.toml
@@ -0,0 +1,65 @@
[package]
name = "examples"
edition.workspace = true
version.workspace = true
repository.workspace = true
license.workspace = true
rust-version.workspace = true
publish = false

[dependencies]
chrono = "0.4"
bigdecimal = "0.2.0"
num-bigint = "0.3"
scylla-udf = { workspace = true }
uuid = "1.0"

[[example]]
name = "add"
path = "add.rs"
crate-type = ["cdylib"]

[[example]]
name = "combine"
path = "combine.rs"
crate-type = ["cdylib"]

[[example]]
name = "commas"
path = "commas.rs"
crate-type = ["cdylib"]

[[example]]
name = "dbl"
path = "dbl.rs"
crate-type = ["cdylib"]

[[example]]
name = "fib"
path = "fib.rs"
crate-type = ["cdylib"]

[[example]]
name = "keys"
path = "keys.rs"
crate-type = ["cdylib"]

[[example]]
name = "len"
path = "len.rs"
crate-type = ["cdylib"]

[[example]]
name = "topn"
path = "topn.rs"
crate-type = ["cdylib"]

[[example]]
name = "udt"
path = "udt.rs"
crate-type = ["cdylib"]

[[example]]
name = "wordcount"
path = "wordcount.rs"
crate-type = ["cdylib"]
8 changes: 8 additions & 0 deletions examples/add.rs
@@ -0,0 +1,8 @@
use scylla_udf::export_udf;

type SmallInt = i16;

#[export_udf]
fn add(i1: SmallInt, i2: SmallInt) -> SmallInt {
    i1 + i2
}
50 changes: 50 additions & 0 deletions examples/combine.rs
@@ -0,0 +1,50 @@
use scylla_udf::{export_udf, CqlDuration, Time, Timestamp};

#[allow(clippy::too_many_arguments, clippy::type_complexity)]
#[export_udf]
fn combine(
    b: bool,
    blob: Vec<u8>,
    date: chrono::NaiveDate,
    bd: bigdecimal::BigDecimal,
    dbl: f64,
    cqldur: CqlDuration,
    flt: f32,
    int32: i32,
    int64: i64,
    s: String,
    tstamp: Timestamp,
    ip: std::net::IpAddr,
    int16: i16,
    int8: i8,
    tim: Time,
    uid: uuid::Uuid,
    bi: num_bigint::BigInt,
) -> (
    (
        bool,
        Vec<u8>,
        chrono::NaiveDate,
        bigdecimal::BigDecimal,
        f64,
        CqlDuration,
        f32,
        i32,
        i64,
    ),
    (
        String,
        Timestamp,
        std::net::IpAddr,
        i16,
        i8,
        Time,
        uuid::Uuid,
        num_bigint::BigInt,
    ),
) {
    (
        (b, blob, date, bd, dbl, cqldur, flt, int32, int64),
        (s, tstamp, ip, int16, int8, tim, uid, bi),
    )
}
6 changes: 6 additions & 0 deletions examples/commas.rs
@@ -0,0 +1,6 @@
use scylla_udf::export_udf;

#[export_udf]
fn commas(strings: Option<Vec<String>>) -> Option<String> {
    strings.map(|strings| strings.join(", "))
}
9 changes: 9 additions & 0 deletions examples/dbl.rs
@@ -0,0 +1,9 @@
use scylla_udf::export_udf;

#[export_udf]
fn dbl(s: String) -> String {
    let mut newstr = String::new();
    newstr.push_str(&s);
    newstr.push_str(&s);
    newstr
}
16 changes: 16 additions & 0 deletions examples/fib.rs
@@ -0,0 +1,16 @@
use scylla_udf::*;

#[export_newtype]
struct FibInputNumber(i32);

#[export_newtype]
struct FibReturnNumber(i64);

#[export_udf]
fn fib(i: FibInputNumber) -> FibReturnNumber {
    FibReturnNumber(if i.0 <= 2 {
        1
    } else {
        fib(FibInputNumber(i.0 - 1)).0 + fib(FibInputNumber(i.0 - 2)).0
    })
}
6 changes: 6 additions & 0 deletions examples/keys.rs
@@ -0,0 +1,6 @@
use scylla_udf::export_udf;

#[export_udf]
fn keys(map: std::collections::BTreeMap<String, String>) -> Vec<String> {
    map.into_keys().collect()
}
6 changes: 6 additions & 0 deletions examples/len.rs
@@ -0,0 +1,6 @@
use scylla_udf::export_udf;

#[export_udf]
fn len(strings: std::collections::BTreeSet<String>) -> i16 {
    strings.len() as i16
}
66 changes: 66 additions & 0 deletions examples/topn.rs
@@ -0,0 +1,66 @@
use scylla_udf::*;
use std::collections::BTreeSet;

#[export_newtype]
struct StringLen(String);

impl std::cmp::PartialEq for StringLen {
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

impl std::cmp::Eq for StringLen {}

impl std::cmp::PartialOrd for StringLen {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

impl std::cmp::Ord for StringLen {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        if self.0.len().cmp(&other.0.len()) == std::cmp::Ordering::Equal {
            self.0.cmp(&other.0)
        } else {
            self.0.len().cmp(&other.0.len())
        }
    }
}

// Store the top N strings by length, without repetitions.
#[export_udf]
fn topn_row(
    acc_tup: Option<(i32, BTreeSet<StringLen>)>,
    v: Option<StringLen>,
) -> Option<(i32, BTreeSet<StringLen>)> {
    if let Some((n, mut acc)) = acc_tup {
        if let Some(v) = v {
            acc.insert(v);
            while acc.len() > n as usize {
                acc.pop_first();
            }
        }
        Some((n, acc))
    } else {
        None
    }
}

#[export_udf]
fn topn_reduce(
    (n1, mut acc1): (i32, BTreeSet<StringLen>),
    (n2, mut acc2): (i32, BTreeSet<StringLen>),
) -> (i32, BTreeSet<StringLen>) {
    assert!(n1 == n2);
    acc1.append(&mut acc2);
    while acc1.len() > n1 as usize {
        acc1.pop_first();
    }
    (n1, acc1)
}

#[export_udf]
fn topn_final((_, acc): (i32, BTreeSet<StringLen>)) -> BTreeSet<StringLen> {
    acc
}