Description
Affected version
23.4.0
Current and expected behavior
We are deploying Stackable on Azure with AKS using Helm/Terraform. We have successfully run SparkApplications on the default node pool. However, we would like to be able to deploy executors in a second node pool containing only Spot instances.
In Azure, all Spot instance node pools automatically get the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule (even if we do not specify it in the Terraform file; this taint is apparently mandatory).
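For reference, a toleration matching that taint would look roughly like this in a plain Kubernetes pod spec. This is only a sketch of the standard pod-level field; where (or whether) a SparkApplication accepts such a block is exactly what I am unsure about.

```yaml
# Standard Kubernetes pod-spec fragment (not a confirmed SparkApplication field).
# Tolerates the taint that AKS places on Spot node pools.
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
```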
Now, I can specify nodeAffinity to match the spot instances' labels, but I haven't found a way to pass tolerations. The Helm chart for the Spark operator has a "tolerations" variable, and I tried passing the right toleration there (as specified here), but it had no effect: the executors will not schedule, since their affinity does not match the default node pool and they have no toleration for the spot taint.
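To make the affinity side concrete, the node affinity I mean has roughly the following shape in plain Kubernetes terms. This is a sketch, assuming the spot nodes carry the label kubernetes.azure.com/scalesetpriority=spot (the same key as the taint); the label in our cluster may differ, and the exact place to put this in a SparkApplication is not shown here.

```yaml
# Standard Kubernetes pod-spec fragment; the label key and value are assumptions.
# Restricts scheduling to nodes labelled as Spot instances.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.azure.com/scalesetpriority
              operator: In
              values:
                - spot
```

With only this affinity and no matching toleration, the scheduler finds no eligible node: the default pool fails the affinity, and the spot pool is excluded by its NoSchedule taint.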
Is there a way to pass tolerations in a SparkApplication that I have just overlooked? If not, I think this would be a fairly relevant feature for pod placement. Are there any plans to implement it?