Below is a table containing all the details for the property group: Compaction
Property Name | Description | Default Value | Run CdkDeploy When Changed |
---|---|---|---|
sleeper.compaction.job.creation.batch.size | The number of tables to perform compaction job creation for in a single invocation. This will be the batch size for a lambda as an SQS FIFO event source. This can be a maximum of 10. | 1 | true |
sleeper.compaction.job.commit.batch.size | The number of finished compaction commits to gather in the batcher before committing to the state store. This will be the batch size for a lambda as an SQS event source. This can be a maximum of 10,000. In practice the effective maximum is limited by the number of messages that fit in a synchronous lambda invocation payload, see the AWS documentation: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html | 1000 | true |
sleeper.compaction.job.commit.batching.window.seconds | The time in seconds that the batcher will wait for compaction commits to appear if the batch size is not filled. This will be set in the SQS event source for the lambda. This can be a maximum of 300, i.e. 5 minutes. | 30 | true |
sleeper.compaction.queue.visibility.timeout.seconds | The visibility timeout for the queue of compaction jobs. | 900 | true |
sleeper.compaction.pending.queue.visibility.timeout.seconds | The visibility timeout for the queue of pending compaction job batches. | 900 | true |
sleeper.compaction.keepalive.period.seconds | The frequency, in seconds, with which change message visibility requests are sent to extend the visibility of messages on the compaction job queue so that they are not processed by other processes. This should be less than the value of sleeper.compaction.queue.visibility.timeout.seconds. | 300 | false |
sleeper.compaction.job.failed.visibility.timeout.seconds | The delay in seconds until a failed compaction job becomes visible on the compaction job queue and can be processed again. | 60 | false |
sleeper.compaction.task.wait.time.seconds | The time in seconds for a compaction task to wait for a compaction job to appear on the SQS queue (must be <= 20). When a compaction task waits for compaction jobs to appear on the SQS queue, if the task receives no messages in the time defined by this property, it will try to wait for a message again. | 20 | false |
sleeper.compaction.task.wait.for.input.file.assignment | Set to true if compaction tasks should wait for input files to be assigned to a compaction job before starting it. The compaction task will poll the state store for whether the input files have been assigned to the job, and will only start once this has occurred. This prevents invalid compaction jobs from being run, particularly in the case where the compaction job creator runs again before the input files are assigned. This also causes compaction tasks to wait idle while input files are assigned, and puts extra load on the state store when there are many compaction tasks. If this is false, any created job will be executed, and will only be validated when committed to the state store. | false | false |
sleeper.compaction.task.delay.before.retry.seconds | The time in seconds for a compaction task to wait after receiving no compaction jobs before attempting to receive a message again. When a compaction task waits for compaction jobs to appear on the SQS queue, if the task receives no messages in the time defined by the property "sleeper.compaction.task.wait.time.seconds", it will wait for a number of seconds defined by this property, then try to receive a message again. | 10 | false |
sleeper.compaction.task.max.idle.time.seconds | The total time in seconds that a compaction task can be idle before it is terminated. When there are no compaction jobs available on the SQS queue, and SQS returns no jobs, the task will check whether this idle time has elapsed since the last time it finished a job. If so, the task will terminate. | 60 | false |
sleeper.compaction.task.max.consecutive.failures | The maximum number of times that a compaction task can fail to process consecutive compaction jobs before it terminates. When the task starts or completes any job successfully, the count of consecutive failures is set to zero. Any time it fails to process a job, this count is incremented. If this maximum is reached, the task will terminate. | 3 | false |
sleeper.compaction.job.creation.period.minutes | The rate at which the compaction job creation lambda runs (in minutes, must be >=1). | 1 | true |
sleeper.compaction.job.creation.memory.mb | The amount of memory in MB for the lambda that creates compaction jobs. | | true |
sleeper.compaction.job.creation.timeout.seconds | The timeout for the lambda that creates compaction jobs in seconds. | 900 | true |
sleeper.compaction.job.creation.concurrency.reserved | The reserved concurrency for the lambda used to create compaction jobs. See reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html | | false |
sleeper.compaction.job.creation.concurrency.max | The maximum concurrency allowed for the lambda used to create compaction jobs. See maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/ | | false |
sleeper.compaction.job.dispatch.memory.mb | The amount of memory in MB for the lambda that sends batches of compaction jobs. | | true |
sleeper.compaction.job.dispatch.timeout.seconds | The timeout for the lambda that sends batches of compaction jobs in seconds. | 900 | true |
sleeper.compaction.job.dispatch.concurrency.reserved | The reserved concurrency for the lambda that sends batches of compaction jobs. See reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html | | false |
sleeper.compaction.job.dispatch.concurrency.max | The maximum concurrency allowed for the lambda that sends batches of compaction jobs. See maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/ | | false |
sleeper.compaction.commit.batcher.memory.mb | The amount of memory in MB for the lambda that batches up compaction commits. | | true |
sleeper.compaction.commit.batcher.timeout.seconds | The timeout for the lambda that batches up compaction commits in seconds. | 900 | true |
sleeper.compaction.commit.batcher.concurrency.reserved | The reserved concurrency for the lambda that batches up compaction commits. See reserved concurrency overview at: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html | 2 | false |
sleeper.compaction.commit.batcher.concurrency.max | The maximum concurrency allowed for the lambda that batches up compaction commits. See maximum concurrency overview at: https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/ | 2 | false |
sleeper.compaction.max.concurrent.tasks | The maximum number of concurrent compaction tasks to run. | 300 | false |
sleeper.compaction.task.creation.period.minutes | The rate at which a check to see if compaction ECS tasks need to be created is made (in minutes, must be >= 1). | 1 | true |
sleeper.compaction.job.max.retries | The maximum number of times that a compaction job can be taken off the job definition queue before it is moved to the dead letter queue. This property is used to configure the maxReceiveCount of the compaction job definition queue. | 3 | false |
sleeper.compaction.job.dispatch.max.retries | The maximum number of times that a batch of compaction jobs can be taken off the pending queue before it is moved to the dead letter queue. This property is used to configure the maxReceiveCount of the pending compaction job batch queue. | 3 | false |
sleeper.compaction.job.commit.max.retries | The maximum number of times that a compaction job can be taken off the batch committer queue before it is moved to the dead letter queue. This property is used to configure the maxReceiveCount of the compaction job committer queue. | 3 | false |
sleeper.compaction.task.cpu.architecture | The CPU architecture to run compaction tasks on. Valid values are X86_64 and ARM64. See Task CPU architecture at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html | X86_64 | true |
sleeper.compaction.task.arm.cpu | The CPU for a compaction task using an ARM64 architecture. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid options. | 1024 | true |
sleeper.compaction.task.arm.memory.mb | The amount of memory in MB for a compaction task using an ARM64 architecture. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid options. | 4096 | true |
sleeper.compaction.task.x86.cpu | The CPU for a compaction task using an x86_64 architecture. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid options. | 1024 | true |
sleeper.compaction.task.x86.memory.mb | The amount of memory in MB for a compaction task using an x86_64 architecture. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html for valid options. | 4096 | true |
sleeper.compaction.task.scaling.overhead.fixed | Used when scaling EC2 instances to support an expected number of compaction tasks. This is the amount of memory in MB that we expect ECS to reserve on an EC2 instance before making memory available for compaction tasks. If this is unset, it will be computed as a percentage of the memory on the EC2 instead, see sleeper.compaction.task.scaling.overhead.percentage. | | true |
sleeper.compaction.task.scaling.overhead.percentage | Used when scaling EC2 instances to support an expected number of compaction tasks. This is the percentage of memory in an EC2 instance that we expect ECS to reserve before making memory available for compaction tasks. Defaults to 10%, so we expect 90% of the memory on an EC2 instance to be used for compaction tasks. | 10 | true |
sleeper.compaction.ecs.launch.type | What launch type should compaction containers use? Valid options: FARGATE, EC2. | FARGATE | false |
sleeper.compaction.ec2.type | The EC2 instance type to use for compaction tasks (when using EC2-based compactions). | t3.xlarge | false |
sleeper.compaction.ec2.pool.minimum | The minimum number of instances for the EC2 cluster (when using EC2-based compactions). | 0 | false |
sleeper.compaction.ec2.pool.desired | The initial desired number of instances for the EC2 cluster (when using EC2-based compactions). Can be set by dividing the initial maximum number of containers by the number that should fit on the instance type. | 0 | false |
sleeper.compaction.ec2.pool.maximum | The maximum number of instances for the EC2 cluster (when using EC2-based compactions). | 75 | false |
sleeper.compaction.ec2.root.size | The size in GiB of the root EBS volume attached to the EC2 instances (when using EC2-based compactions). | 50 | false |
sleeper.compaction.tracker.enabled | Flag to enable/disable storage of tracking information for compaction jobs and tasks. | true | true |
sleeper.compaction.tracker.async.commit.updates.enabled | Flag to enable/disable storing an update to the tracker during async commits of compaction jobs. This may be disabled if there are enough compactions that the system is unable to apply all the updates to the tracker. This is mainly used for testing. Reports may show compactions as unfinished if this update is not present in the tracker. | true | false |
sleeper.compaction.job.status.ttl | The time to live in seconds for compaction job updates in the job tracker. Default is 1 week. The expiry time is fixed when an update is saved to the store, so changing this will only affect new data. | 604800 | false |
sleeper.compaction.task.status.ttl | The time to live in seconds for compaction task updates in the job tracker. Default is 1 week. The expiry time is fixed when an update is saved to the store, so changing this will only affect new data. | 604800 | false |
sleeper.default.compaction.strategy.class | The name of the class that defines how compaction jobs should be created. This should implement sleeper.compaction.core.job.creation.strategy.CompactionStrategy. The value of this property is the default value which can be overridden on a per-table basis. | sleeper.compaction.core.job.creation.strategy.impl.SizeRatioCompactionStrategy | false |
sleeper.default.compaction.files.batch.size | The maximum number of files to read in a compaction job. Note that the state store must support atomic updates for this many files. Also note that this many files may need to be open simultaneously. The value of 'sleeper.fs.s3a.max-connections' must be at least the value of this plus one. The extra one is for the output file. This is a default value and will be used if not specified in the table properties. | 12 | false |
sleeper.default.table.compaction.job.send.batch.size | The number of compaction jobs to send in a single batch. When compaction jobs are created, there is no limit on how many jobs can be created at once. A batch is a group of compaction jobs that will have their creation updates applied at the same time. For each batch, we send all compaction jobs to the SQS queue, then update the state store to assign job IDs to the input files. This can be overridden on a per-table basis. | 1000 | false |
sleeper.default.table.compaction.job.send.timeout.seconds | The amount of time in seconds a batch of compaction jobs may be pending before it should not be retried. If the input files have not been successfully assigned to the jobs, and this much time has passed, then the batch will fail to send. Once a pending batch fails the input files will never be compacted again without other intervention, so it's important to ensure file assignment will be done within this time. That depends on the throughput of state store commits. It's also necessary to ensure file assignment will be done before the next invocation of compaction job creation, otherwise invalid jobs will be created for the same input files. The rate of these invocations is set in sleeper.compaction.job.creation.period.minutes. | 90 | false |
sleeper.default.table.compaction.job.send.retry.delay.seconds | The amount of time in seconds to wait between attempts to send a batch of compaction jobs. The batch will be sent if all input files have been successfully assigned to the jobs, otherwise the batch will be retried after a delay. | 30 | false |
sleeper.default.table.compaction.job.creation.limit | The default limit on the number of compaction jobs that can be created within a single invocation. If this limit is exceeded, the selection of jobs is randomised. | 100000 | false |
sleeper.default.table.compaction.strategy.sizeratio.ratio | Used by the SizeRatioCompactionStrategy to decide if a group of files should be compacted. If the file sizes are s_1, ..., s_n then the files are compacted if s_1 + ... + s_{n-1} >= ratio * s_n. It can be overridden on a per-table basis. | 3 | false |
sleeper.default.table.compaction.strategy.sizeratio.max.concurrent.jobs.per.partition | Used by the SizeRatioCompactionStrategy to control the maximum number of jobs that can be running concurrently per partition. It can be overridden on a per-table basis. | 2147483647 | false |
sleeper.default.table.compaction.method | Select which compaction method to use if not configured against a Sleeper table. DataFusion compaction support is experimental. Valid values are: [java, datafusion] | JAVA | false |
sleeper.compaction.cluster | The name of the cluster used for compactions. | | true |
sleeper.compaction.ec2.task.definition | The name of the family of EC2 task definitions used for compactions. | | true |
sleeper.compaction.fargate.task.definition | The name of the family of Fargate task definitions used for compactions. | | true |
sleeper.compaction.job.creation.trigger.lambda.function | The function name of the lambda to trigger compaction job creation for all tables. | | true |
sleeper.compaction.job.creation.rule | The name of the CloudWatch rule that periodically triggers the compaction job creation lambda. | | true |
sleeper.compaction.job.creation.queue.url | The URL of the queue for tables requiring compaction job creation. | | true |
sleeper.compaction.job.creation.queue.arn | The ARN of the queue for tables requiring compaction job creation. | | true |
sleeper.compaction.job.creation.dlq.url | The URL of the dead letter queue for tables that failed compaction job creation. | | true |
sleeper.compaction.job.creation.dlq.arn | The ARN of the dead letter queue for tables that failed compaction job creation. | | true |
sleeper.compaction.job.queue.url | The URL of the queue for compaction jobs. | | true |
sleeper.compaction.job.queue.arn | The ARN of the queue for compaction jobs. | | true |
sleeper.compaction.job.dlq.url | The URL of the dead letter queue for compaction jobs. | | true |
sleeper.compaction.job.dlq.arn | The ARN of the dead letter queue for compaction jobs. | | true |
sleeper.compaction.pending.queue.url | The URL of the queue for pending compaction job batches. | | true |
sleeper.compaction.pending.queue.arn | The ARN of the queue for pending compaction job batches. | | true |
sleeper.compaction.pending.dlq.url | The URL of the dead letter queue for pending compaction job batches. | | true |
sleeper.compaction.pending.dlq.arn | The ARN of the dead letter queue for pending compaction job batches. | | true |
sleeper.compaction.commit.queue.url | The URL of the queue for compaction jobs ready to commit to the state store. | | true |
sleeper.compaction.commit.queue.arn | The ARN of the queue for compaction jobs ready to commit to the state store. | | true |
sleeper.compaction.commit.dlq.url | The URL of the dead letter queue for compaction jobs ready to commit to the state store. | | true |
sleeper.compaction.commit.dlq.arn | The ARN of the dead letter queue for compaction jobs ready to commit to the state store. | | true |
sleeper.compaction.task.creation.lambda.function | The function name of the compaction task creation lambda. | | true |
sleeper.compaction.task.creation.rule | The name of the CloudWatch rule that periodically triggers the compaction task creation lambda. | | true |
sleeper.compaction.scaling.group | The name of the compaction EC2 auto scaling group. | | true |
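
Several descriptions in the table state numeric constraints (for example, the keepalive period should be less than the queue visibility timeout) and the size ratio rule `s_1 + ... + s_{n-1} >= ratio * s_n`. The following is a minimal Python sketch of how those rules could be checked outside Sleeper. The helper names `check_compaction_properties` and `meets_size_ratio` are hypothetical and are not part of Sleeper, and the sketch assumes that file sizes are ordered so that `s_n` is the largest file, which the table does not state explicitly.

```python
# Illustrative sketch only: these helpers are hypothetical, not Sleeper APIs.

# Default values copied from the table above.
DEFAULTS = {
    "sleeper.compaction.job.creation.batch.size": 1,
    "sleeper.compaction.job.commit.batch.size": 1000,
    "sleeper.compaction.job.commit.batching.window.seconds": 30,
    "sleeper.compaction.queue.visibility.timeout.seconds": 900,
    "sleeper.compaction.keepalive.period.seconds": 300,
    "sleeper.compaction.task.wait.time.seconds": 20,
    "sleeper.compaction.job.creation.period.minutes": 1,
    "sleeper.compaction.task.creation.period.minutes": 1,
}


def check_compaction_properties(overrides):
    """Return a list of messages for constraints from the table that are violated."""
    props = {**DEFAULTS, **overrides}

    def get(name):
        return int(props[name])

    problems = []
    if get("sleeper.compaction.job.creation.batch.size") > 10:
        problems.append("job.creation.batch.size can be a maximum of 10")
    if get("sleeper.compaction.job.commit.batch.size") > 10000:
        problems.append("job.commit.batch.size can be a maximum of 10,000")
    if get("sleeper.compaction.job.commit.batching.window.seconds") > 300:
        problems.append("job.commit.batching.window.seconds can be a maximum of 300")
    if get("sleeper.compaction.task.wait.time.seconds") > 20:
        problems.append("task.wait.time.seconds must be <= 20")
    if (get("sleeper.compaction.keepalive.period.seconds")
            >= get("sleeper.compaction.queue.visibility.timeout.seconds")):
        problems.append("keepalive.period.seconds should be less than "
                        "queue.visibility.timeout.seconds")
    for name in ("sleeper.compaction.job.creation.period.minutes",
                 "sleeper.compaction.task.creation.period.minutes"):
        if get(name) < 1:
            problems.append(name + " must be >= 1")
    return problems


def meets_size_ratio(file_sizes, ratio=3):
    """Check s_1 + ... + s_{n-1} >= ratio * s_n.

    Assumption: sizes are sorted ascending, so s_n is the largest file.
    """
    if not file_sizes:
        return False
    sizes = sorted(file_sizes)
    return sum(sizes[:-1]) >= ratio * sizes[-1]
```

With the default ratio of 3, four files of equal size satisfy the rule, because the three smaller files together are exactly three times the size of the largest. Two small files next to one much larger file do not.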