
## DATA STORAGE

Below is a table containing all the details for the property group: Data Storage.

| Property Name | Description | Default Value | Run CdkDeploy When Changed |
|---|---|---|---|
| sleeper.table.rowgroup.size | The size of the row group in the Parquet files, in bytes. Defaults to the value in the instance properties. | 8388608 | false |
| sleeper.table.page.size | The size of the page in the Parquet files, in bytes. Defaults to the value in the instance properties. | 131072 | false |
| sleeper.table.parquet.dictionary.encoding.rowkey.fields | Whether dictionary encoding should be used for row key columns in the Parquet files. | false | false |
| sleeper.table.parquet.dictionary.encoding.sortkey.fields | Whether dictionary encoding should be used for sort key columns in the Parquet files. | false | false |
| sleeper.table.parquet.dictionary.encoding.value.fields | Whether dictionary encoding should be used for value columns in the Parquet files. | false | false |
| sleeper.table.parquet.columnindex.truncate.length | Used to set parquet.columnindex.truncate.length (see the documentation at https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md).<br>The length in bytes to which binary values in a column index are truncated. | 128 | false |
| sleeper.table.parquet.statistics.truncate.length | Used to set parquet.statistics.truncate.length (see the documentation at https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md).<br>The length in bytes to which the min/max binary values in row groups are truncated. | 2147483647 | false |
| sleeper.table.parquet.writer.version | Used to set parquet.writer.version (see the documentation at https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md).<br>Can be either v1 or v2. v2 pages store levels uncompressed, while v1 pages compress levels together with the data. | v2 | false |
| sleeper.table.parquet.query.column.index.enabled | Whether column indexes are used when running queries against the Parquet files. | false | false |
| sleeper.table.fs.s3a.readahead.range | The S3 readahead range, in bytes. Defaults to the row group size. | 8388608 | false |
| sleeper.table.compression.codec | The compression codec to use for this table. Defaults to the value in the instance properties.<br>Valid values are: [uncompressed, snappy, gzip, lzo, brotli, lz4, zstd] | zstd | false |
| sleeper.table.gc.delay.minutes | A file will not be deleted until this number of minutes has passed since it was marked as ready for garbage collection. Files are not deleted as soon as they are marked because they may still be in use by queries. Defaults to the value set in the instance properties. | 15 | false |
| sleeper.table.gc.commit.async | If true, file deletions are applied via asynchronous requests sent to the state store committer lambda. If false, the garbage collector lambda applies them synchronously.<br>This only takes effect if async commits are enabled for the table. The default value is set in an instance property. | true | false |
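Several properties in the table fall back to a value defined elsewhere: the row group size, page size, compression codec and GC delay default to the instance properties, and the readahead range defaults to the row group size. The sketch below illustrates that lookup order under stated assumptions. The functions `resolve` and `resolve_readahead` are hypothetical, not Sleeper's API; instance-level defaults are keyed by the table property name purely for illustration; and it assumes the readahead range falls back to the row group size *after* that size has itself been resolved.

```python
from typing import Any, Mapping

# Built-in fallbacks copied from the Default Value column of the table.
BUILT_IN_DEFAULTS: dict[str, Any] = {
    "sleeper.table.rowgroup.size": 8388608,
    "sleeper.table.page.size": 131072,
    "sleeper.table.compression.codec": "zstd",
    "sleeper.table.gc.delay.minutes": 15,
}

READAHEAD = "sleeper.table.fs.s3a.readahead.range"
ROW_GROUP_SIZE = "sleeper.table.rowgroup.size"


def resolve(name: str, table_props: Mapping[str, Any],
            instance_defaults: Mapping[str, Any]) -> Any:
    """Hypothetical lookup: table value, then instance-level default, then built-in default."""
    if name in table_props:
        return table_props[name]
    if name in instance_defaults:
        return instance_defaults[name]
    return BUILT_IN_DEFAULTS[name]


def resolve_readahead(table_props: Mapping[str, Any],
                      instance_defaults: Mapping[str, Any]) -> Any:
    """Assumed behaviour: an unset readahead range uses the resolved row group size."""
    if READAHEAD in table_props:
        return table_props[READAHEAD]
    return resolve(ROW_GROUP_SIZE, table_props, instance_defaults)


table = {"sleeper.table.rowgroup.size": 16777216}
instance = {"sleeper.table.compression.codec": "snappy"}
print(resolve("sleeper.table.compression.codec", table, instance))  # snappy
print(resolve("sleeper.table.page.size", table, instance))          # 131072
print(resolve_readahead(table, instance))                           # 16777216
```

Keeping the readahead fallback separate from the generic lookup makes it explicit that its default is derived from another property rather than being a fixed number.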
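Three of the properties are described as setting a parquet-mr option directly: parquet.columnindex.truncate.length, parquet.statistics.truncate.length and parquet.writer.version. The following is a minimal sketch of that pass-through, assuming table properties arrive as a plain Python mapping. `to_parquet_options` is a hypothetical helper and does not show how Sleeper itself configures its Parquet writer; only properties that are actually set are copied, leaving unset ones to whatever defaults the caller applies.

```python
from typing import Any, Mapping

# Sleeper table property -> parquet-mr option it is documented to set.
SLEEPER_TO_PARQUET = {
    "sleeper.table.parquet.columnindex.truncate.length": "parquet.columnindex.truncate.length",
    "sleeper.table.parquet.statistics.truncate.length": "parquet.statistics.truncate.length",
    "sleeper.table.parquet.writer.version": "parquet.writer.version",
}

# The writer version can only be v1 or v2.
VALID_WRITER_VERSIONS = {"v1", "v2"}


def to_parquet_options(table_props: Mapping[str, Any]) -> dict[str, str]:
    """Copy any Parquet-related table properties onto parquet-mr option names, as strings."""
    options: dict[str, str] = {}
    for sleeper_name, parquet_name in SLEEPER_TO_PARQUET.items():
        if sleeper_name in table_props:
            options[parquet_name] = str(table_props[sleeper_name])

    # Reject writer versions other than v1 or v2 before they reach the writer.
    version = options.get("parquet.writer.version")
    if version is not None and version not in VALID_WRITER_VERSIONS:
        raise ValueError(f"parquet.writer.version must be v1 or v2, got {version!r}")
    return options
```

For example, a table setting only the writer version to v1 and the column index truncate length to 64 would yield {"parquet.writer.version": "v1", "parquet.columnindex.truncate.length": "64"}.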
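The garbage collection delay amounts to a simple time comparison: a file marked as ready for garbage collection only becomes eligible for deletion once sleeper.table.gc.delay.minutes have elapsed. The sketch below is an illustration, not the garbage collector's actual code. The function name `is_ready_for_deletion` is hypothetical, and treating a file as deletable at exactly the delay boundary is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_ready_for_deletion(marked_at: datetime, now: datetime,
                          gc_delay_minutes: int = 15) -> bool:
    """Hypothetical check: True once gc_delay_minutes have passed since the file was marked.

    The delay gives queries that may still be reading the file time to finish.
    """
    return now - marked_at >= timedelta(minutes=gc_delay_minutes)


marked = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_ready_for_deletion(marked, marked + timedelta(minutes=10)))  # False
print(is_ready_for_deletion(marked, marked + timedelta(minutes=15)))  # True
```

Raising the delay trades slower reclamation of storage for more headroom for long-running queries.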
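sleeper.table.gc.commit.async interacts with the table's async commit setting: it only takes effect when async commits are enabled for the table. The sketch below reads that as "deletions go to the state store committer lambda only if both settings are on, otherwise the garbage collector lambda applies them itself". That reading is an interpretation of the description, and `gc_commit_mode`, `CommitMode` and the parameter `table_async_commits_enabled` are hypothetical names, not Sleeper's.

```python
from enum import Enum


class CommitMode(Enum):
    ASYNC = "async"  # request sent to the state store committer lambda
    SYNC = "sync"    # garbage collector lambda applies the deletion itself


def gc_commit_mode(table_async_commits_enabled: bool,
                   gc_commit_async: bool = True) -> CommitMode:
    """Hypothetical decision for how garbage collection deletions are committed."""
    # The GC setting is only honoured when async commits are enabled for the table.
    if table_async_commits_enabled and gc_commit_async:
        return CommitMode.ASYNC
    return CommitMode.SYNC


print(gc_commit_mode(True))         # CommitMode.ASYNC
print(gc_commit_mode(True, False))  # CommitMode.SYNC
print(gc_commit_mode(False, True))  # CommitMode.SYNC
```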