Below is a table containing all the details for the property group: Ingest Batcher

Property Name | Description | Default Value | Run CdkDeploy When Changed |
---|---|---|---|
sleeper.table.ingest.batcher.job.min.size | Specifies the minimum total file size required for an ingest job to be batched and sent. An ingest job will be created if the batcher runs while this much data is waiting, and the minimum number of files is also met. | 1G | false |
sleeper.table.ingest.batcher.job.max.size | Specifies the maximum total file size for a job in the ingest batcher. If more data is waiting than this, it will be split into multiple jobs. If a single file exceeds this, it will still be ingested in its own job. It's also possible some data may be left for a future run of the batcher if some recent files overflow the size of a job but aren't enough to create a job on their own. | 5G | false |
sleeper.table.ingest.batcher.job.min.files | Specifies the minimum number of files for a job in the ingest batcher. An ingest job will be created if the batcher runs while this many files are waiting, and the minimum size of files is also met. | 1 | false |
sleeper.table.ingest.batcher.job.max.files | Specifies the maximum number of files for a job in the ingest batcher. If more files are waiting than this, they will be split into multiple jobs. It's possible some data may be left for a future run of the batcher if some recent files overflow the size of a job but aren't enough to create a job on their own. | 100 | false |
sleeper.table.ingest.batcher.file.max.age.seconds | Specifies the maximum time in seconds that a file can be held in the batcher before it will be included in an ingest job. When any file has been waiting for longer than this, a job will be created with all the currently held files, even if other criteria for a batch are not met. | 300 | false |
sleeper.table.ingest.batcher.ingest.queue | Specifies the target ingest queue where batched jobs are sent. Valid values are: [standard_ingest, bulk_import_emr, bulk_import_persistent_emr, bulk_import_eks, bulk_import_emr_serverless] | bulk_import_emr_serverless | false |
sleeper.table.ingest.batcher.file.tracking.ttl.minutes | The time in minutes that the tracking information is retained for a file before the records of its ingest are deleted (e.g. which ingest job it was assigned to, the time this occurred, the size of the file). The expiry time is fixed when a file is saved to the store, so changing this will only affect new data. Defaults to 1 week. | 10080 | false |
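
The way the minimum size, minimum file count, and maximum file age interact, as described in the table above, can be sketched as follows. This is a minimal illustration and not the batcher's actual implementation: the names `BatcherConfig`, `PendingFile`, and `should_create_job` are hypothetical, and reading `1G` and `5G` as multiples of 1024^3 bytes is an assumption. Splitting waiting files into several jobs by the maximum size and file count is deliberately left out.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3  # Assumption: "1G" is read as 1024^3 bytes for this sketch.


@dataclass
class BatcherConfig:
    """Hypothetical holder for the table properties described above."""
    min_job_size_bytes: int = 1 * GIB   # ...job.min.size
    min_job_files: int = 1              # ...job.min.files
    max_file_age_seconds: int = 300     # ...file.max.age.seconds


@dataclass
class PendingFile:
    """Hypothetical record of a file waiting in the batcher."""
    size_bytes: int
    age_seconds: int


def should_create_job(files: List[PendingFile], config: BatcherConfig) -> bool:
    """Decide whether a batcher run should create at least one job."""
    if not files:
        return False
    total_size = sum(f.size_bytes for f in files)
    # Both minimums must be met together for a normal batch.
    meets_minimums = (
        total_size >= config.min_job_size_bytes
        and len(files) >= config.min_job_files
    )
    # A file waiting longer than the maximum age forces a job regardless.
    any_file_too_old = any(
        f.age_seconds > config.max_file_age_seconds for f in files
    )
    return meets_minimums or any_file_too_old
```

With the default values, a single 10 MiB file that has waited 10 seconds would not trigger a job in this sketch, while the same file after 301 seconds would, because the age limit overrides the size and count minimums.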