Open Variant Type for semi-structured data #10987

Open
@wjones127

I've started experimenting with an implementation of the Open Variant Type [1] in Rust / DataFusion. There is a specification and a Java library for it, and Spark will release the type in 4.0. There are also plans to integrate it into table formats such as Delta Lake [2] and Iceberg [3]. It would be a high-performance data type for semi-structured data, designed for better OLAP performance than JSON or BSON (discussed in #7845). I've also discussed its potential as an Arrow extension type a little in the Arrow repo [4].

I'm working on creating an extension similar to datafusion-functions-json. If we could create a new repo datafusion-functions-variant, I'd be happy to develop that in the open.

Footnotes

  1. https://github.com/apache/spark/tree/master/common/variant

  2. https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark

  3. https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34

  4. https://github.com/apache/arrow/issues/42069
