feat: multithread linting #129
Conversation
I don't see any mention of the previous RFCs around parallelisation. Both of these have a lot of context about the difficulties of parallelisation outside of the core rules - e.g. in cases where the parser or the rules store state in any form.
Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins because the cached work may need to be redone once for each thread. I would like to see such use cases addressed as part of this RFC given that these mentioned use cases are very prevalent - with both mentioned plugins in use by the majority of the ecosystem. These problems have been discussed before in the context of language plugins and parser contexts (I can try to find the threads a bit later).
Thanks for the input @bradzacher. How would you go about incorporating context from #42 and #87 into this RFC? I see that #42 suggests introducing a plugin setting. As for #87, it seems to be about an unrelated feature that doesn't even require multithreading. But I get why it would be beneficial to limit the number of instances of the same parser across threads, especially if the parser takes a long time to load its initial state, like typescript-eslint with type-aware parsing. If you have any concrete suggestions on how to do that, I'd love to know.
I imagine the way one would address such use cases is by making no changes, i.e. not enabling multithread linting if the results are not satisfactory. But if something can be done to improve performance for popular plugins that would be awesome.
To be clear - I'm all for such a system existing. Like caching it can vastly improve the experience for those that fit within the bounds. The thing I want to make sure of is that the bounds are either intentionally designed to be tight to avoid complexity explosion, or that we are at least planning a path forward for the mentioned cases. #87 has some discussions around parallel parsing which are relevant to the sorts of ideas we'd need to consider here. Some other relevant discussions can be found in:

I'm pretty swamped atm cos holiday season and kids and probably won't be able to get back to this properly until the new year.
Co-authored-by: 唯然 <[email protected]>
Thanks for putting this together. I'm going to need more time to dig into the details, and I really appreciate the amount of thought and explanation you've included in this RFC. I have a few high-level thoughts from reviewing this today:
Yes, it would be interesting to look into other tools to understand how they handle concurrency. This could actually bring in some interesting ideas even if the general approach is different. I was thinking of checking Prettier but haven't managed to do that yet. Jest and Ava are also good candidates.
Thanks for the list. I missed most of those links while skimming through the discussion in eslint#3565. I'll be sure to go through the items and add a prior art mention.
Workers don't need to create a new instance of
One thing I'd like to point out before it's too late and just in case it's relevant: multithreading makes multifile analysis harder. If there ever comes a system where a single rule can look at the contents of multiple files—as implemented in

I imagine this kind of analysis is not really in the scope for ESLint at the moment (I haven't seen anything in the RFCs at least), as it would have a high complexity impact on the project (I've written about some tradeoffs in this post). But if this proposal were to be implemented without consideration for multifile analysis—which seems to be the case currently—then the cost of implementing it later would skyrocket and I imagine would lead it to never be implemented. I'm looking forward to seeing how this evolves, as I have unfortunately not figured out multi-threading well enough for this task to even try implementing it for
The only way to parallelise and efficiently maintain cross-file analysis is with shared memory. Unfortunately in JS as a whole this is nigh-impossible with the current state of the world. Sharing memory via

The shared structs proposal would go a long way in enabling shared memory models and is currently at stage 2 -- so there is some hope for this in the relatively near future! I know the TS team is eagerly looking forward to this proposal landing in node so they can explore parallelising TS's type analysis.

For now at least the best way to efficiently do parallelised multi-file analysis is to do some "grouping aware" task splitting, i.e. instead of assigning files to threads randomly you instead try to keep "related" files in the same thread to minimise duplication of data across threads. But this is what I was alluding to in my above comments [1] [2] -- there needs to be an explicit decision encoded in this RFC:
The former is "the easy route" for obvious reasons -- there's a lot to think about and work through for the latter. As a quick-and-dirty example that we have discussed before (see eslint/eslint#16819):

Just to reiterate my earlier comments -- I'm 100% onboard with going with the former decision and ignoring the cross-file problem. I just want to ensure that this case has been fully considered and intentionally discarded, or that the design has a consideration to eventually grow the parallelism system to support such use cases.
I think what we're going for here is effectively a "stop the bleeding" situation where we can get ESLint's current behavior to go faster, as this becomes an even bigger problem as people start to use ESLint to lint their JSON, CSS, and Markdown files, significantly increasing the number of files an average user will lint. I'm on board with discounting the cross-file problem at this point, as I think it's clear that many companies have created their own concurrent linting solutions built on top of ESLint that also discount this issue. I would like to revisit cross-file linting at some point in the future, but before we can do that, we really need to get the core rewrite in progress.
**[Trunk Code Quality](https://trunk.io/code-quality)**
Trunk manages to parallelize ESLint and other linters by splitting the workload over multiple processes.
Does that mean it's splitting up the file list and spreading across multiple processes? Or just one process for ESLint and other processes for other tools?
The file list is being split into chunks of a predefined size, and I can see that multiple threads are also being spawned in a process, but I'm not sure what each thread is doing. I will look deeper into the details.
I took a closer look. Trunk is splitting the file list across multiple processes (4 at a time on my machine). The processes are Node.js processes with `bin/eslint.js` as a main module, so they are not shared with other tools. When linting the main repo with Trunk, a whole command line for an ESLint process looks like this:
```shell
/usr/local/bin/node \
./bin/eslint.js \
--output-file \
/var/folders/0v/zllmqhv13d3013k10h417fwh0000gn/T/trunk-501/KSAgF4/QBGvrBgLTw8b \
--format \
json \
Makefile.js \
bin/eslint.js \
conf/default-cli-options.js \
conf/ecma-version.js \
conf/globals.js \
eslint.config.js \
lib/api.js \
lib/cli-engine/cli-engine.js \
lib/cli-engine/file-enumerator.js \
lib/cli-engine/formatters/html.js \
lib/cli-engine/formatters/json-with-metadata.js \
lib/cli-engine/formatters/json.js \
lib/cli-engine/formatters/stylish.js \
lib/cli-engine/hash.js \
lib/cli-engine/index.js \
lib/cli-engine/lint-result-cache.js \
lib/cli-engine/load-rules.js \
lib/cli.js \
lib/config/config-loader.js \
lib/config/config.js \
lib/config/default-config.js \
lib/config/flat-config-array.js \
lib/config/flat-config-helpers.js \
lib/config/flat-config-schema.js \
lib/config/rule-validator.js \
lib/eslint/eslint-helpers.js \
lib/eslint/eslint.js \
lib/eslint/index.js \
lib/eslint/legacy-eslint.js \
lib/languages/js/index.js \
lib/languages/js/source-code/index.js \
lib/languages/js/source-code/source-code.js \
lib/languages/js/source-code/token-store/backward-token-comment-cursor.js \
lib/languages/js/source-code/token-store/backward-token-cursor.js \
lib/languages/js/source-code/token-store/cursor.js \
lib/languages/js/source-code/token-store/cursors.js \
lib/languages/js/source-code/token-store/decorative-cursor.js \
lib/languages/js/source-code/token-store/filter-cursor.js \
lib/languages/js/source-code/token-store/forward-token-comment-cursor.js \
lib/languages/js/source-code/token-store/forward-token-cursor.js \
lib/languages/js/source-code/token-store/index.js \
lib/languages/js/source-code/token-store/limit-cursor.js \
lib/languages/js/source-code/token-store/padded-token-cursor.js \
lib/languages/js/source-code/token-store/skip-cursor.js \
lib/languages/js/source-code/token-store/utils.js \
lib/languages/js/validate-language-options.js \
lib/linter/apply-disable-directives.js \
lib/linter/code-path-analysis/code-path-analyzer.js \
lib/linter/code-path-analysis/code-path-segment.js \
lib/linter/code-path-analysis/code-path-state.js \
lib/linter/code-path-analysis/code-path.js \
lib/linter/code-path-analysis/debug-helpers.js \
lib/linter/code-path-analysis/fork-context.js \
lib/linter/code-path-analysis/id-generator.js \
lib/linter/file-context.js \
lib/linter/index.js \
lib/linter/interpolate.js \
lib/linter/linter.js \
lib/linter/node-event-generator.js \
lib/linter/report-translator.js \
lib/linter/rule-fixer.js \
lib/linter/rules.js \
lib/linter/safe-emitter.js \
lib/linter/source-code-fixer.js
```
The Node.js processes are created by the `trunk` executable (located at `~/.cache/trunk/cli/1.22.3-darwin-arm64` on my Mac), for which no source code is available. As previously mentioned, the `trunk` executable also spawns several threads, but these are probably something different, not related to ESLint. I'm going to update the RFC.
Thanks for the feedback @jfmengels. In fact, multifile analysis or project-aware analysis is not a concept we have implemented in the ESLint core at this time, which is the reason why it's not covered in this RFC.
Do you think it would be too difficult to have multifile analysis and multithread linting at the same time? Or are you suggesting that implementing multifile analysis before multithread linting would be easier than the other way around? If you could clarify your concern we could add that in the drawbacks section for further consideration.
This is looking really good and I love the level of detail. Just left a few questions throughout.
When `auto` concurrency is selected, ESLint will use a heuristic to determine the best concurrency setting, which could be any number of threads or also `"off"`.
How this heuristic will work is an open question.
An approach I have tested is using half the number of available CPU cores, which is a reasonable starting point for modern machines with many (4 or more) cores and fast I/O peripherals to access the file system.
I think we can just start by using this heuristic with the docs stating that you may get better performance by manually setting the concurrency level.
Fine, I've updated the RFC. We can always improve the heuristic later.
In my testing (using processes rather than threads) I've found it depends on whether the machine is hyperthreaded, the ratio of "p-cores" and "e-cores", and more, so I agree just calling out that setting the value can improve performance is the best approach rather than investing too much in a heuristic. Contextually, in another project I found that the best performance across all the machines I could try varied between 50% and 80% of the available cores, but that's also still dependent on file IO and the kernel's ability to read large files that quickly.
```js
if (!options[disableSerializabilityCheck]) {
    const unserializableKeys = Object.keys(options).filter(key => !isSerializable(options[key]));
```
Potentially faster to just run `JSON.stringify()` and catch any errors (faster than `structuredClone()`, which is creating objects as it goes). If there are errors, only then do we check which keys are causing the problem. I'm just not sure if we want to pay this cost 100% of the time.
Unfortunately, we can't use `JSON.stringify()` because it just skips unserializable properties. For example, `JSON.stringify({ foo: () => {} })` creates an empty object.
But I think it's still a good idea to try serializing the whole object first and only check the single properties when needed.
Also, the idea is to run the check only when `concurrency` is specified. I've clarified that.
You can use the replacer option to check for functions:

```js
JSON.stringify(value, (key, value) => {
    if (typeof value === "function") {
        throw new TypeError("Function!");
    }
    return value;
});
```
Functions are not the only unserializable values; the spec lists many cases of values that can't be serialized and would throw a `DataCloneError`. So this method of pre-checking properties individually will at most spot values that are obviously unserializable, but we'd still need another check to ensure that the options object can be passed to worker threads without throwing an exception.
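Putting those points together, a minimal sketch of the combined check (clone the whole options object first, and only inspect individual keys when that fails) could look like this. The function name and error wording are illustrative, not the RFC's actual implementation:

```js
// Sketch only: structuredClone() throws a DataCloneError for any value that
// can't be passed to a worker thread, so we try the whole object first and
// fall back to a per-key scan to name the offending options.
function assertSerializableOptions(options) {
    try {
        structuredClone(options);
    } catch {
        const unserializableKeys = Object.keys(options).filter(key => {
            try {
                structuredClone(options[key]);
                return false;
            } catch {
                return true;
            }
        });

        throw new TypeError(
            `These options cannot be passed to worker threads: ${unserializableKeys.join(", ")}`
        );
    }
}
```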
```js
const abortController = new AbortController();
const fileIndexCounter = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
const workerPath = require.resolve("./worker.js");
const workerOptions = {
    workerData: {
        filePaths,
        fileIndexCounter,
        eslintOptions
    },
};
```
I'm a bit concerned about passing all file paths to every thread, as this is a lot of duplicated memory. Thinking about a situation where there are 10,000 files to be linted (which we have received reports of), that means we'd have `10,000 * thread_count` file paths stored in memory.
I wonder about an alternative approach where each thread is seeded with maybe 5-10 file paths (or maybe just one?) that it's responsible for linting. When they are all linted, it sends a message asking for more. I know this creates more chatter, but I'm wondering if it might end up being more memory-efficient in the long-run?
Any insights into how other tools handle this?
We could certainly compare this approach to other implementations. But really, 10,000 file paths do not take more than a few megabytes in memory. Even with, say, 32 threads, or one per CPU, that's still a totally manageable size. Also, if the shared structs proposal mentioned in the above comments is adopted in Node.js, memory duplication will no longer be a concern. I'm still planning on looking into other tools like Ava or Mocha, so I'll be sure to check how they handle sessions with many files.
> When they are all linted, it sends a message asking for more. I know this creates more chatter, but I'm wondering if it might end up being more memory-efficient in the long-run?
This is how I've implemented parallelism, at least, and I took inspiration for that from other implementations I saw as well. I mentioned this in my other comments in this section, but batching has other benefits as well for re-enabling result streaming, allowing for lazy initialization of workers based on the project file count to make sure it actually runs more efficiently than just a single thread would.
I personally went with a batching approach in my implementation at Canva to avoid the chatter. At our scale there's ~90k files to lint and CI machines have ~30 cores -- so messaging would create a lot of chatter back-and-forth.
I found that there is some wasted perf here in that some threads finish before others -- but it's minimal -- maybe could get a 1-2% runtime reduction with something better.
In an ideal world you would implement a work-stealing approach (batch into `COUNT / THREADS`, distribute, and then when a thread finishes its batch it looks at what threads have work left and takes a number of items off of that thread's queue). But doing this without shared memory is hard because it involves messaging other threads to ask what they've got left (which they have to stop processing to answer).
You could potentially do a hybrid approach and, instead of batching into `COUNT / THREADS`, go a little more granular (eg `COUNT / THREADS / n`) so that you have big batches and are less likely to starve a thread's queue.
There are many options!
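For reference, here is a minimal worker-side sketch of the shared-counter approach implied by the snippet quoted above, where every worker atomically advances a single `Int32Array` index instead of receiving a precomputed batch. The `workerData` shape mirrors the quoted code; using the `ESLint` class inside the worker and the message format are assumptions for illustration only:

```js
// worker.js (sketch): claim one file at a time from the shared counter, so
// work is distributed dynamically with no batching and no extra messaging.
const { parentPort, workerData } = require("node:worker_threads");
const { ESLint } = require("eslint");

const { filePaths, fileIndexCounter, eslintOptions } = workerData;

async function lintClaimedFiles() {
    const eslint = new ESLint(eslintOptions);
    const results = [];

    for (;;) {
        // Atomics.add() returns the previous value, so every worker gets a
        // unique index even when several workers race for the next file.
        const index = Atomics.add(fileIndexCounter, 0, 1);

        if (index >= filePaths.length) {
            break;
        }

        const [result] = await eslint.lintFiles([filePaths[index]]);

        results.push({ index, result });
    }

    // Send all results back to the controlling thread in one message.
    parentPort.postMessage(results);
}

lintClaimedFiles();
```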
Another possible solution is retrieving rules `meta` objects in each worker thread and returning this information to the main thread.
When `getRulesMetaForResults()` is called in the main thread, rules `meta` objects from all threads will be deduped and merged and the results will be returned synchronously.

This solution removes the need to load config files in the main thread but it still requires worker threads to do potentially useless work by adding an extra processing step unconditionally.
Another problem is that rules `meta` objects for custom rules aren't always serializable.
In order for `meta` objects to be passed to the main thread, unserializable properties will need to be stripped off, which is probably undesirable.
I like this approach. Do you have an example of when a rule uses unserializable values in `meta`?
Overall, I think it's safe for us to assume that `meta` is always serializable and deal with any outliers through documentation.
I think you're right. I've checked `typescript-eslint`, `eslint-plugin-unicorn`, `eslint-plugin-n`, `eslint-plugin-import` and `eslint-plugin-react` and none has rules with unserializable metadata. We could mention in the docs that unserializable properties in rule metadata will be silently ignored when `getRulesMetaForResults()` is called in multithread linting mode. I'll update the RFC.
An open question is how to handle results that were not created by the current `ESLint` instance. In those cases, the current implementation doesn't find a config for the file and throws an error:
https://github.com/eslint/eslint/blob/main/lib/eslint/eslint.js#L626-L636
But if we don't load config files in the main thread, we won't be able to run this check, so we can only look at the filename and check if it's one of the known ones. If the filename is not known, I can think of two approaches:

1. Return an empty object `{}`. Results that were not created by the current instance will not throw errors.
2. Throw an error. Results that were not created by running `lintFiles()` but still have a filename that's associated with a config file will start to throw an error.

I think that 2. makes more sense, but it will throw errors in situations where the current implementation doesn't.
It might be worth the overhead of loading config files for each file in the results array. I think in most cases we'd be looking at just one config file, and in the worst case probably not more than a dozen.
Another option would be to have the threads pass rule meta back to the main thread. These should be serializable, but of indeterminate size.
To load config files for each file in the results array, we could use solution 1. This is also the approach with the best compatibility.
To have the threads pass rule meta back to the main thread would be solution 2.
I tested both solutions on a few projects and indeed it seems that the overhead for loading the config(s) in the main thread is negligible. Even with typescript-eslint I found no significant performance difference. I guess that typescript-eslint loads TypeScript only when it is first needed, so there is no penalty in loading the configs arrays only to check files and ignores.
I've added a list of pros and cons to all suggested solutions so it's easier to make a decision. I understand that solution 3 is less interesting than the other two?
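As an illustration of solution 2, the main-thread merge could be as small as the sketch below; the data shape (one `{ ruleId: meta }` object per thread) is an assumption:

```js
// Sketch: dedupe and merge the rule meta objects returned by each worker so
// getRulesMetaForResults() can answer synchronously on the main thread.
function mergeRulesMeta(perThreadRulesMeta) {
    const merged = {};

    for (const rulesMeta of perThreadRulesMeta) {
        for (const [ruleId, meta] of Object.entries(rulesMeta)) {
            merged[ruleId] ??= meta;
        }
    }

    return merged;
}
```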
Errors created in a worker thread cannot be cloned to the main thread without changes, because they can contain unserializable properties.
Instead, Node.js creates a serializable copy of the error, stripped of unserializable properties, and reports it to the main thread as a parameter of the [`error`](https://nodejs.org/docs/latest-v18.x/api/worker_threads.html#event-error) event.
During this process `message` and `stack` are preserved because they are strings.
Does this preserve all serializable properties? My main concern is the `messageTemplate` property that we add to produce readable error messages: https://github.com/eslint/eslint/blob/8bcd820f37f2361e4f7261a9876f52d21bd9de8f/bin/eslint.js#L80
Yes, all serializable properties are preserved.
I don't know that it is too hard, but it's mostly that having each thread analyze a portion of the project won't work, at least with the multifile analysis approach I've chosen for

For my use-case, it's more likely that analysis can be split by rule, instead of by file, i.e. thread 1 will run rules 1 and 2, thread 2 will run rules 3 and 4, etc. But this means that memory-wise, either the project's contents (and derived data) needs to be stored on every thread (multiplying the memory), which doesn't sound great for performance. Maybe this would get improved with the shared structs proposal.

Right now, both multifile analysis and multithreading are do-able in isolation, but doing both requires a lot more thinking and maybe additional tools. Therefore doing one may exclude the other. But if you ever figure it out I'll definitely be interested to hear about it!
@jfmengels I've added a section Multifile Analysis to the RFC.
It's entirely possible to have both - one just needs to be "smart" about how files are distributed amongst threads. As I've mentioned before -- any "random" distribution[^1] approach will cause large duplication of work (and memory usage). If you are "smart" about how files are distributed -- eg use the dependency graph to distribute them to minimise the number of edges spanning threads -- then you can minimise the duplication for a net positive benefit. Computing a dependency graph ahead of time is going to take time so it's not necessarily a good heuristic -- but it is an example. As an example for TypeScript projects one could very quickly compute the set of files included in all relevant

Another example of ways you can be smart is if ESLint builds in stateful primitives to make it easy for rules and parsers to store and access data across threads. As an example -- most of

It's all possible -- it just requires some thought and careful design to make it work well.
An additional constraint on multi-file analysis and multi-threading that I hadn't remembered to bring up is automatic fixes. If you have a module A that imports B, you may potentially need to fix B after having visited both (to remove an unused exported function for instance). At least the way that

Maybe there's a good way to synchronize all of this, but something naive won't cut it I think. And shared structs might be key here.

* elm-review is more cautious applying fixes than ESLint. We apply one fix then re-analyze what has changed and then apply another fix 🔁, we don't apply multiple fixes at once and hope for the best like ESLint does AFAIR.

For this proposal at least, if files are only read by a single thread, fixes should not be a concern 👍 (PS: Somewhat related, I just released multi-file fixes for
I think we should table any discussion of multi-file analysis for now. That's going to require significant changes to the core and we'll likely have to revisit a lot of how things work, including concurrent linting. It's enough to just leave this as a note in the RFC.
I've added a new section "Warning Management" because ESLint v9.20.0 adds two new warning types (for inactive flags and for empty config files) that require some adaptation to work well in multithread mode.
@@ -508,6 +508,45 @@ This strategy comes with some unobvious implications that consumers should be aw
* Errors thrown by `lintFiles()` could be of a different type than the errors thrown when running in single-thread mode, and they may leak unserializable properties.
* Because of race conditions, subsequent calls to `lintFiles()` could produce different errors for identical input.

### Warning Management |
I'm wondering if it would be helpful to extract warning management into a separate object rather than using `process.emitWarning()` directly. We could then have `Linter` use a default warning object that stubs out the relevant method(s) rather than emitting to console. We could then pass the "live" version to `Linter` from `ESLint`. Maybe this could simplify the thread behavior?
This is a good idea, I've updated the RFC. The names for the new option and methods are just placeholders, so feel free to suggest better names.
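For illustration, the idea could look roughly like the sketch below; all names and warning codes are placeholders, in line with the note above that the final names are still open:

```js
// Sketch only: Linter gets the no-op service by default, while ESLint passes
// the "live" service that forwards to process.emitWarning().
const noopWarningService = {
    emitInactiveFlagWarning() {},
    emitEmptyConfigWarning() {}
};

const processWarningService = {
    emitInactiveFlagWarning(flag, message) {
        process.emitWarning(message, "ESLintInactiveFlag", flag);
    },
    emitEmptyConfigWarning(configFilePath) {
        process.emitWarning(
            `The config file ${configFilePath} is empty.`,
            "ESLintEmptyConfigWarning"
        );
    }
};

// Worker threads could keep the no-op service (or forward warnings to the
// controlling thread) so the same warning isn't printed once per thread.
```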
@mdjermanovic can you take a look at this? @faultyserver given your experience, would you mind reviewing this?
Overall I'm very intrigued by the approach in this proposal. I think in the right environment, shared memory and lightweight worker threads can possibly give a greater performance boost than the process-based approach I've used, but I also believe smart batching can be a helpful feature for supporting multifile analysis cases like typescript-eslint and streaming results. But processes also have their advantages for resource usage and requiring less configuration from the user (re `UV_THREADPOOL_SIZE`, again; a test on an 8-core machine with the 50% heuristic won't hit the contention of the default `4` pool size).
Some of the comments here are addressed later down in the proposal, but I'm leaving them for posterity as addressing thoughts I had reading the proposal top to bottom and learning as I went.
When `auto` concurrency is selected, ESLint will use a heuristic to determine the best concurrency setting, which could be any number of threads or also `"off"`.
An approach I have tested is using half the number of available CPU cores, which is a reasonable starting point for modern machines with many (4 or more) cores and fast I/O peripherals to access the file system.
But linting time also depends on the features of the project like the configuration or the number of files.
Some plugins like typescript-eslint with type-aware linting can increase the time required to initialize a linting thread resulting in additional per-thread overhead.
This especially is a concern with startup for TypeScript. On an M2 Max, the fastest concurrency result for our 20k file project is actually only 4 processes, because the flood of file IO at the start just stalls the kernel out and it takes longer to finish that than the entire linting process does with fewer processes. There's very little actual CPU load that happens before lint rules start running, in my experience; it's almost all just IO and memory allocation. On our 8-core CI machines, we also are only able to run 2 concurrent workers because of memory usage and the slower filesystem on those machines.
Also worth noting that the only reason this actually causes a problem for typescript-eslint is because typescript does file discovery first and reads all files into memory before starting any actual type work. The type linting itself is relatively fast in comparison.
The new multithread implementation will exist alongside the current, single-thread implementation, and it will only be used when multithreading mode is enabled.

In the multithread implementation `lintFiles()` launches as many worker threads as specified by the `concurrency` option, but no more than the number of enumerated files.
I think there's a strong lower bound for when the cost of initializing a worker thread/process and communicating file paths becomes an actual improvement over just doing it all in one thread. In the original parallel implementation I wrote for Discord, this is addressed by working in batches and lazily creating workers as batches are sent for processing. In other implementations, I also found this lower bound to be around 50 files or so, where the number of rules being applied ended up being effectively irrelevant.
Even using shared memory and lightweight worker threads, I think particularly small projects will still possibly run faster on a single thread rather than paying that initialization and communication cost.
This is something worth accounting for in the heuristic that calculates the number of threads with `"auto"` concurrency. Given
The controlling thread itself does not lint any files: it waits until all files are linted or an error occurs.
When a worker thread terminates successfully it submits a list of `LintReport`s to the controlling thread. Each result is enriched with the index of the associated file.
One benefit of batched work, and from running ESLint in general, is that lints effectively stream their output as they occur (unless I'm misremembering something, but that was one of the tenets I wanted to reproduce in my parallel implementations). Losing that streaming nature and only outputting results after everything has finished feels like a loss, even if it's a trivial one (i.e., the results will show, and they'll also come back faster than they did before).
This does introduce a higher IPC cost, though, and is really where the tradeoff of batching comes in. I think it'd be worth leaving this somewhat open-ended and experimenting with different approaches here to see what's performant and what feels best, especially on larger projects that are going to be taking a while to run and really benefitting from this parallelism in the first place.
The advantage of sending results when the worker thread completes, rather than as the results are produced, consists in the ability to post a single large message instead of multiple smaller ones. In the suggested implementation, the controlling thread does nothing with the results until all of them have been collected, hence submitting partial results to the main thread in advance would not introduce a benefit. That said, switching to streaming results as they are created should be an easy change, so we could test that during the implementation.
In the multithread implementation `lintFiles()` launches as many worker threads as specified by the `concurrency` option, but no more than the number of enumerated files.
All worker threads receive the same parameters from the controlling thread:
* the list of enumerated file paths
* a reference to shared memory containing the index of the next file to be processed in the list (a `TypedArray` with an underlying `SharedArrayBuffer`)
Using a shared buffer here limits the options for parallelism to only using worker threads. In my experience, processes ended up being more efficient than threads. My current belief is that even though each thread has its own event loop, they all end up on the same libuv process that gets limited by the concurrency Node sets up for that internally (using `UV_THREADPOOL_SIZE`, which is only 4 by default. This article gives some good info on this). Using multiple processes frees up all of the concurrency and gives "more headroom" through a higher total working thread count by default.
But in the situation of a user just running `new ESLint({concurrency: 'auto'})` on a 16-core machine without also setting `UV_THREADPOOL_SIZE` to at least 8 (more likely 11 to allow the main thread and IO events to be handled outside of the worker thread queues), they're going to get bottlenecked and have much slower results (in my experience, like 80-120% worse than using separate processes).
I think the way to address this is either a) use processes and send files in batches over IPC or b) indicate strongly to users that they will likely need to set the `UV_THREADPOOL_SIZE` variable to something greater than their requested concurrency to get the best results.
Option a feels better to me from a usability standpoint (i.e., don't tell the user to change their platform when a better internal solution that "just works" exists), but that's a tradeoff that I think will need addressing, and relying on a shared memory buffer limits the ability to change that in the future.
> My current belief is that even though each thread has its own event loop, they all end up on the same libuv process that gets limited by the concurrency Node sets up for that internally (using UV_THREADPOOL_SIZE, which is only 4 by default. This article gives some good info on this). Using multiple processes frees up all of the concurrency and gives "more headroom" through a higher total working thread count by default.

This has been my experience as well. The default async worker thread pool of 4 greatly limits your available disk reads and bottlenecks things. In my own experience writing a parallelisation wrapper at my company I used `node:cluster` and it greatly improved perf over `node:worker_threads`. In my case we found about a 30% improvement by using `cluster`.
For reference Canva's codebase is linting ~90k files with ~120 rules (no type-aware linting). Though we initially were concerned about contention so we limited concurrency to a max of 8 workers (regardless of total CPU count).
Eventually we further improved that perf by spawning `n-1` workers (on our CI machines that meant 31 instances of NodeJS) and that netted a further 50% improvement in lint time on our CI machines.
For us right now P70 lint time is ~5m. Not fast by any means, but it's definitely bearable.
Could we manually set `UV_THREADPOOL_SIZE` to the number of logical cores?

```js
const os = require('os');
process.env.UV_THREADPOOL_SIZE = os.cpus().length;
```

In my experience, the fewer asks of the end user, the better life is for everyone.
I think the ideal value is really hard to even guess at accurately, again just based on the performance vs efficiency core distribution, the ability of any hyperthreads, the other system IO bottlenecks, etc.
I agree it's nicest to not have to ask for the user to change things here, but at a minimum it definitely needs to be overridable, like using `??=` to preserve whatever value might be set from the caller.
Even if there is a default applied internally, I think it's worth documenting that this env var exists and will likely improve performance by tuning it, along with some information about what parameters meaningfully affect the best value to use.
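Something like the following sketch would apply an internal default while keeping the variable overridable, as suggested above:

```js
// Sketch: only set UV_THREADPOOL_SIZE when the caller hasn't already done so.
// It needs to run before the libuv thread pool is first used to have any effect.
const os = require("node:os");

process.env.UV_THREADPOOL_SIZE ??= String(os.cpus().length);
```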
Gotcha. I think we can probably make this a better user experience by creating a command-line argument to set it rather than asking someone to set it manually (which would then make it difficult to set internally).
I tried running my fork with different settings for `UV_THREADPOOL_SIZE` but didn't notice any difference – only the number of threads seems to impact performance. These are commands I typed in a zsh console of a MacBook. The last three of them run in around the same amount of time:

```shell
npm i -D eslint-p
time ./node_modules/.bin/eslint-p --concurrency=4
time UV_THREADPOOL_SIZE=1 ./node_modules/.bin/eslint-p --concurrency=4
time UV_THREADPOOL_SIZE=8 ./node_modules/.bin/eslint-p --concurrency=4
```

Is there a way to test the effect of `UV_THREADPOOL_SIZE` in a comparable tool (one that uses multithreading to speed up a task)?
> I tried running my fork with different settings for UV_THREADPOOL_SIZE but didn't notice any difference – only the number of threads seems to impact performance
If you're not doing enough FS ops to saturate the async threadpool then it wouldn't make much difference making it bigger.
Additionally it depends on the size of your test codebase. For example if you've got like 1k files then it's likely you won't see much of a difference in total runtime because the total time reading from disk is small (relative to the entire lint run), which consequently means that the total potential blocking time would be smaller still.
Anecdotally I personally haven't hit these sorts of threadpool perf problems in any meaningful way until I was dealing with 10's of thousands of files -- at scale it makes a noticeable difference. At scale you are saturating the thread pool more and for longer which means more chance of blocking time.
Thanks @bradzacher! Do you happen to know a GitHub repo with tens of thousands of files that uses ESLint 9? It would be great to have some reference projects everyone can use to run tests and compare the results.
I ended up rewriting our internal parallelizer to use the same approach from this proposal (a shared index pointer and worker threads using that index) to test and got the same results as you, even on our large project (~20k files): changing `UV_THREADPOOL_SIZE` above `4` had no effect, but changing it down to `2` did slow things down noticeably. So I'm no longer sure this is a relevant suggestion, and even through investigating I couldn't figure out what was actually changing when I changed the value, other than just spawning more threads that possibly didn't do anything.
Interestingly, though, I also ran this with just typescript-eslint rules enabled on our project, and it ended up using a lot more memory, but I think that's because the lack of directory-based file batching logic meant each thread effectively had to check the entire project rather than just a subset of it. It also ran ~30% slower, but I think at least part of that can be attributed to the less efficient type caching.
`UV_THREADPOOL_SIZE` also had minimal effect here (on my 12-core M2 Max), so I don't believe that's an issue anymore.
The controlling thread itself does not lint any files: it waits until all files are linted or an error occurs.
When a worker thread terminates successfully it submits a list of `LintReport`s to the controlling thread. Each result is enriched with the index of the associated file.
The controlling thread collects the lists of `LintReport`s and merges them in the expected order, then populates `lintResultCache` with cache information (**note**: this is different from the single-thread implementation where the cache is populated progressively as files are being linted).
> merges them in the expected order
I presume this is part of the reason for waiting and batching results rather than streaming them. Is "expected order" something that ESLint guarantees? Or requires preserving in a meaningful way? I could imagine if the user passes in a concrete list of files that possibly they'd want to see the results in exactly that order, but personally I also think this does not matter, and I don't believe any of the existing parallelism implementations attempt to preserve this behavior either (i.e., a result contains the file name and location, so it doesn't matter if its index matches the original file list order).
Is "expected order" something that ESLint guarantees? Or requires preserving in a meaningful way? I could imagine if the user passes in a concrete list of files that possibly they'd want to see the results in exactly that order
The order of lint results depends on how files are enumerated before linting starts. This does not necessarily reflect the order specified by a user, but it's stable in the sense that it doesn't change across executions. For multithread mode, the order of lint results will be the same as for single-thread mode.
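For illustration, restoring that order on the controlling thread is straightforward given the file index each worker attaches to its reports; the `{ index, result }` shape is assumed here:

```js
// Sketch: flatten the per-worker report lists and sort by the original file
// index so the final results match the single-thread enumeration order.
function mergeWorkerReports(perWorkerReports) {
    return perWorkerReports
        .flat()
        .sort((a, b) => a.index - b.index)
        .map(({ result }) => result);
}
```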
For the purpose of multithread linting, all linting threads are created by the controlling thread at the same time before the linting process starts.
Although the controlling thread will typically be the main thread of the process, this is not a general requirement.
Mentioned above, but noting here as well: I believe the real worker count should be informed by the size of the enumerated file list. Worker threads are much lighter than full processes, but I believe they still have some cost that means default behavior on small projects will be slower than optimal. This could be tested once an implementation exists to know for sure, though.
Indeed yes, there is no point in having more worker threads than there are files to be linted. In `"auto"` concurrency mode, the number of worker threads could be limited to one worker thread per 50 files, for example. I would like to come up with a proto-implementation to determine the number of effective threads, even if the logic could change anytime.
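A possible proto-implementation along those lines (half the available cores, capped at one worker per 50 files) is sketched below; the constant, the function name, and the fallback to `"off"` are illustrative rather than a decided design:

```js
const os = require("node:os");

// Hypothetical lower bound: below this many files per worker, spawning a
// thread likely costs more than it saves (see the discussion above).
const MIN_FILES_PER_THREAD = 50;

function resolveAutoConcurrency(fileCount) {
    const byCores = Math.floor(os.cpus().length / 2);
    const byFiles = Math.floor(fileCount / MIN_FILES_PER_THREAD);
    const threads = Math.min(byCores, byFiles);

    // Small projects and single-core machines fall back to single-thread mode.
    return threads >= 2 ? threads : "off";
}
```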
Especially plugins that require a longer initialization time, such as typescript-eslint, will be affected by this problem.
While several solutions have been proposed to mitigate this issue, discussing them would exceed the scope of this proposal.
To avoid establishing false expectations, it would make sense to include a note regarding the potential performance degradation in the documentation or perhaps to issue a runtime warning.
The recommendation will be to enable multithread linting only when it performs measurably faster than linting in single-thread mode.
Noting again from my comments above, a critical piece of this for users on large machines with large projects and using worker_threads may be ensuring that `UV_THREADPOOL_SIZE` is set high enough to avoid contention.
**https://github.com/discord/eslint/tree/parallel by [faultyserver](https://github.com/faultyserver)**

This is a fork of ESLint v8 with parallel linting support.[^4]
It launches multiple sub-processes where each sub-process lints a precomputed subset of files.
Not included in the public fork we (Discord) have available is improved batching:
Each subprocess lints a batch of files, where the controlling process manages creating "effective" batches and adding each batch to a queue that is sent to workers as they become available (i.e., a naive form of backpressure handling). Specifically, it batches using a heuristic that includes total batch size (keep adding files until at least 50 are present) and directory structure (add all files that are direct children of a directory to a batch at once, even if it goes well over 50 files).
This yielded massive performance improvements for us, especially with typescript-eslint, since it benefits greatly from its internal caching when working with the same types over and over again, and that is highly correlated with the directory that a file lives in (for the most part, and for our projects specifically, at least).
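A rough sketch of that batching heuristic as described above (fill a batch until it has at least 50 files, but always take a directory's direct children together); everything beyond those two rules is illustrative:

```js
const path = require("node:path");

const MIN_BATCH_SIZE = 50;

// Sketch: group files by parent directory, then fill batches directory by
// directory so files that likely share types and config stay in one worker.
function createDirectoryBatches(filePaths) {
    const byDirectory = new Map();

    for (const filePath of filePaths) {
        const dir = path.dirname(filePath);

        if (!byDirectory.has(dir)) {
            byDirectory.set(dir, []);
        }
        byDirectory.get(dir).push(filePath);
    }

    const batches = [];
    let currentBatch = [];

    for (const files of byDirectory.values()) {
        currentBatch.push(...files);

        if (currentBatch.length >= MIN_BATCH_SIZE) {
            batches.push(currentBatch);
            currentBatch = [];
        }
    }

    if (currentBatch.length > 0) {
        batches.push(currentBatch);
    }

    return batches;
}
```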
Summary

This document proposes adding a new multithread mode for `ESLint#lintFiles()`. The aim of multithread linting is to speed up ESLint by allowing workload distribution across multiple CPU cores.

Related Issues

eslint/eslint#3565