BULK IMPORT

Below is a table containing all the details for the property group: Bulk Import

| Property Name | Description | Default Value | Run CdkDeploy When Changed |
|---|---|---|---|
| sleeper.table.bulk.import.emr.instance.architecture | (Non-persistent EMR mode only) Which architecture to be used for EC2 instance types in the EMR cluster. Must be either "x86_64", "arm64" or "x86_64,arm64". For more information, see the Bulk import using EMR - Instance types section in docs/usage/ingest.md | arm64 | false |
| sleeper.table.bulk.import.emr.master.x86.instance.types | (Non-persistent EMR mode only) The EC2 x86_64 instance types and weights to be used for the master node of the EMR cluster.<br>For more information, see the Bulk import using EMR - Instance types section in docs/usage/ingest.md | m7i.xlarge | false |
| sleeper.table.bulk.import.emr.executor.x86.instance.types | (Non-persistent EMR mode only) The EC2 x86_64 instance types and weights to be used for the executor nodes of the EMR cluster.<br>For more information, see the Bulk import using EMR - Instance types section in docs/usage/ingest.md | m7i.4xlarge | false |
| sleeper.table.bulk.import.emr.master.arm.instance.types | (Non-persistent EMR mode only) The EC2 ARM64 instance types and weights to be used for the master node of the EMR cluster.<br>For more information, see the Bulk import using EMR - Instance types section in docs/usage/ingest.md | m7g.xlarge | false |
| sleeper.table.bulk.import.emr.executor.arm.instance.types | (Non-persistent EMR mode only) The EC2 ARM64 instance types and weights to be used for the executor nodes of the EMR cluster.<br>For more information, see the Bulk import using EMR - Instance types section in docs/usage/ingest.md | m7g.4xlarge | false |
| sleeper.table.bulk.import.emr.executor.market.type | (Non-persistent EMR mode only) The purchasing option to be used for the executor nodes of the EMR cluster.<br>Valid values are ON_DEMAND or SPOT. | SPOT | false |
| sleeper.table.bulk.import.emr.executor.initial.capacity | (Non-persistent EMR mode only) The initial number of capacity units to provision as EC2 instances for executors in the EMR cluster.<br>This is measured in instance fleet capacity units. These are declared alongside the requested instance types, as each type will count for a certain number of units. By default the units are the number of instances.<br>This value overrides the default value in the instance properties. It can be overridden by a value in the bulk import job specification. | 2 | false |
| sleeper.table.bulk.import.emr.executor.max.capacity | (Non-persistent EMR mode only) The maximum number of capacity units to provision as EC2 instances for executors in the EMR cluster.<br>This is measured in instance fleet capacity units. These are declared alongside the requested instance types, as each type will count for a certain number of units. By default the units are the number of instances.<br>This value overrides the default value in the instance properties. It can be overridden by a value in the bulk import job specification. | 10 | false |
| sleeper.table.bulk.import.emr.release.label | (Non-persistent EMR mode only) The EMR release label to be used when creating an EMR cluster for bulk importing data using Spark running on EMR.<br>This value overrides the default value in the instance properties. It can be overridden by a value in the bulk import job specification. | emr-7.2.0 | false |
| sleeper.table.bulk.import.min.leaf.partitions | Specifies the minimum number of leaf partitions that are needed to run a bulk import job. If this minimum has not been reached, bulk import jobs will refuse to start. | 64 | false |
| sleeper.table.bulk.import.job.files.commit.async | If true, bulk import will add files via requests sent to the state store committer lambda asynchronously. If false, bulk import will commit new files at the end of the job synchronously.<br>This is only applied if async commits are enabled for the table. The default value is set in an instance property. | true | false |
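
As a rough illustration of how a few of these properties might be overridden for a single table, here is a minimal sketch. It assumes the table properties are supplied in standard Java properties syntax (`key=value`); the values are illustrative only, not recommendations, and any property left out keeps the default listed in the table above.

```properties
# Hypothetical table properties fragment; values are illustrative only.

# Allow the EMR cluster to use both x86_64 and ARM64 instance types (default: arm64).
sleeper.table.bulk.import.emr.instance.architecture=x86_64,arm64

# Use on-demand executors instead of the default SPOT.
sleeper.table.bulk.import.emr.executor.market.type=ON_DEMAND

# Executor capacity, measured in instance fleet capacity units (defaults: 2 and 10).
sleeper.table.bulk.import.emr.executor.initial.capacity=4
sleeper.table.bulk.import.emr.executor.max.capacity=20

# Require at least 128 leaf partitions before a bulk import job will start (default: 64).
sleeper.table.bulk.import.min.leaf.partitions=128
```

The capacity and release label settings can still be overridden per job in the bulk import job specification, so values set here act as table-level defaults rather than hard limits.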