Description
Is your feature request related to a problem or challenge?
Modern Macs / Apple Silicon (and Intel 12th gen processors) make a distinction between Performance (P) and Efficiency (E) cores.
https://developer.apple.com/news/?id=vk3m204o
Recent Apple Silicon like A13 Bionic has both high-performance cores (P cores) and high-efficiency cores (E cores). These different core types allow you to deliver apps that have both great performance and great battery life. To take full advantage of their performance and efficiency, you can provide the operating system (OS) with information about how to execute your app in the most optimal way. From there, the OS uses semantic information to make better scheduling and performance control decisions.
Intel 12th gen chips have this feature too, for example:
https://www.intel.com/content/www/us/en/gaming/resources/how-hybrid-design-works.html
Intel® Core™ desktop processors integrate two types of cores into a single die: powerful Performance-cores (P-cores) and flexible Efficient-cores (E-cores). Both types of core have a different role.
DataFusion currently partitions work evenly between cores and does not distinguish between, or take advantage of, the different kinds of cores. This could lead to suboptimal throughput: the (slower) efficiency cores are given as much work as the faster performance cores, so the performance cores sometimes sit idle while waiting for the efficiency cores to finish.
@pepijnve may have seen this manifest as inconsistent behavior / profiling results on #16398
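To make the imbalance concrete, here is a rough back-of-the-envelope sketch in Rust. It is not DataFusion code: the function names are hypothetical, and the relative core speeds (E cores at half the speed of P cores) are an assumption chosen only for illustration. It compares the finish time of an even split against a split proportional to core speed.

```rust
/// Finish time when `total_work` units are split evenly across all cores.
/// `speeds` holds each core's relative throughput (work units per unit time).
/// The slowest core determines when the whole job is done.
fn even_split_makespan(total_work: f64, speeds: &[f64]) -> f64 {
    let share = total_work / speeds.len() as f64;
    speeds.iter().map(|s| share / s).fold(0.0, f64::max)
}

/// Finish time when each core receives work in proportion to its speed,
/// so that all cores finish at the same moment (an idealized lower bound).
fn proportional_makespan(total_work: f64, speeds: &[f64]) -> f64 {
    total_work / speeds.iter().sum::<f64>()
}
```

For example, with 6 cores at speed 1.0 and 4 cores at an assumed speed of 0.5, 100 units of work take 20 time units under the even split (the fast cores are idle for the second half) versus 12.5 under the proportional split. Real schedulers, core migration, and memory effects will change these numbers; the point is only that the slowest partition bounds the total runtime.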
Describe the solution you'd like
Ensure DataFusion runs efficiently on architectures with this hybrid P/E design.
I don't know what this means exactly
Describe alternatives you've considered
Not sure. Maybe we can follow the ideas from Mac:
We may also have to have support in tokio 🤔
Additional context
@pepijnve brought this up on a PR where we are seeing performance variability:
🤔 now that I write '10-core MacBook', I'm wondering if the 10 core part is where my variability is coming from. That's 6 performance and 4 efficiency cores. Ideally DataFusion keeps the CPU bound work on the perf cores, and uses the efficiency ones for IO. I had been wondering about that and NUMA effects already. A topic for a different thread though.