
feat: multithread linting #129


Conversation

fasttime
Member

Summary

This document proposes adding a new multithread mode for ESLint#lintFiles(). The aim of multithread linting is to speed up ESLint by allowing workload distribution across multiple CPU cores.

Related Issues

eslint/eslint#3565

@bradzacher

bradzacher commented Dec 20, 2024

I don't see any mention of the previous RFCs around parallelisation:

#42
#87

Both of these have a lot of context about the difficulties of parallelisation outside of the core rules - eg in cases where the parser or the rules store state in any form.
Two quick and prevalent examples:

  • eslint-plugin-import and friends store a cache of modules that they have resolved and parsed outside of the ESLint cycle.
  • @typescript-eslint has type information produced by the parser and consumed by rules.

Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins because the cached work may need to be redone once for each thread.

I would like to see such use cases addressed as part of this RFC given that they are very prevalent - with both mentioned plugins in use by the majority of the ecosystem.

These problems have been discussed before in the context of language plugins and parser contexts (I can try to find the threads a bit later).

@fasttime
Member Author

Thanks for the input @bradzacher. How would you go about incorporating context from #42 and #87 into this RFC?

I see that #42 suggests introducing a plugin setting disallowWorkerThreads and also limiting the number of threads depending on the number of files. Both of those measures could actually be useful when the concurrency is calculated automatically. Do you think that would be helpful?

As for #87, it seems to be about an unrelated feature that doesn't even require multithreading. But I get why it would be beneficial to limit the number of instances of the same parser across threads, especially if the parser takes a long time to load its initial state, like typescript-eslint with type-aware parsing. If you have any concrete suggestions on how to do that, I'd love to know.

> Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins because the cached work may need to be redone once for each thread.
>
> I would like to see such use cases addressed as part of this RFC given that they are very prevalent - with both mentioned plugins in use by the majority of the ecosystem.

I imagine the way one would address such use cases is by making no changes, i.e. not enabling multithread linting if the results are not satisfactory. But if something can be done to improve performance for popular plugins that would be awesome.

@bradzacher

To be clear - I'm all for such a system existing. Like caching, it can vastly improve the experience for those who fit within the bounds.

The thing I want to make sure of is that we ensure the bounds are either intentionally designed to be tight to avoid complexity explosion, or that we are at least planning a path forward for the mentioned cases.


#87 has some discussions around parallel parsing which are relevant to the sorts of ideas we'd need to consider here.

Some other relevant discussions can be found in
eslint/eslint#16819
eslint/eslint#16818 (some concepts discussed are semi-relevant here)
eslint/eslint#16557 (comment) (and other threads in that discussion)
eslint/eslint#14139 (some more relevant context about plugin setup)

I'm pretty swamped atm cos holiday season and kids and probably won't be able to get back to this properly until the new year.

@nzakas
Member

nzakas commented Dec 31, 2024

Thanks for putting this together. I'm going to need more time to dig into the details, and I really appreciate the amount of thought and explanation you've included in this RFC. I have a few high-level thoughts from reviewing this today:

  1. I'd like to see an exploration of how other tools handle concurrency. I know other tools aren't a one-to-one comparison with ESLint, but there are plenty of tools in the ecosystem that do concurrency. For instance, Jest was very early to implement concurrency, so it would be good to include how they do it. Ava also runs tests concurrently. (Threads vs. processes, at what level, how do they determine defaults etc.)
  2. There have been a number of forks that implement parallelization of ESLint over the years. Even though some are old and eslintrc-based, I'd still like to see those included in this RFC with a summary of how each worked. There's a lot of history here, so we should be sure we're considering all past attempts. Here are a few:
  3. I'm wondering if implementing this as part of ESLint#lintFiles() is the correct abstraction? If each worker needs to create an instance of ESLint, then I wonder if perhaps a separate class that solely manages concurrency would make things a bit cleaner?

@fasttime
Member Author

fasttime commented Jan 9, 2025

> 1. I'd like to see an exploration of how other tools handle concurrency. I know other tools aren't a one-to-one comparison with ESLint, but there are plenty of tools in the ecosystem that do concurrency. For instance, Jest was very early to implement concurrency, so it would be good to include how they do it. Ava also runs tests concurrently. (Threads vs. processes, at what level, how do they determine defaults etc.)

Yes, it would be interesting to look into other tools to understand how they handle concurrency. This could actually bring in some interesting ideas even if the general approach is different. I was thinking of checking Prettier but haven't managed to do that yet. Jest and Ava are also good candidates.

> 2. There have been a number of forks that implement parallelization of ESLint over the years. Even though some are old and eslintrc-based, I'd still like to see those included in this RFC with a summary of how each worked. There's a lot of history here, so we should be sure we're considering all past attempts. Here are a few:

Thanks for the list. I missed most of those links while skimming through the discussion in eslint#3565. I'll be sure to go through the items and add a prior art mention.

> 3. I'm wondering if implementing this as part of ESLint#lintFiles() is the correct abstraction? If each worker needs to create an instance of ESLint, then I wonder if perhaps a separate class that solely manages concurrency would make things a bit cleaner?

Workers don't each need to create a new instance of ESLint. Only the constructor options and the list of files must be known in each thread, so it makes perfect sense to keep the runtime logic in a separate module/class. I will emphasize this in the wording.

@jfmengels

One thing I'd like to point out before it's too late and just in case it's relevant: multithreading makes multifile analysis harder. If there ever comes a system where a single rule can look at the contents of multiple files—as implemented in elm-review, or as planned in Biome—then splitting the analysis across several threads means it's likely several threads will need to load all of the files in the project, increasing at least the I/O load for each thread.

I imagine this kind of analysis is not really in the scope for ESLint at the moment (I haven't seen anything in the RFCs at least), as it would have a high complexity impact on the project (I've written about some tradeoffs in this post). But if this proposal were to be implemented without consideration for multifile analysis—which seems to be the case currently—then the cost of implementing it later would skyrocket and I imagine would lead it to never be implemented.

I'm looking forward to seeing how this evolves, as I have unfortunately not figured out multi-threading well enough for this task to even try implementing it for elm-review, where multifile analysis has worked wonders for the quality of the linting rules.

@bradzacher

The only way to parallelise and efficiently maintain cross-file analysis is with shared memory. Unfortunately in JS as a whole this is nigh-impossible with the current state of the world. Sharing memory via SharedArrayBuffer is not efficient in JS due to the need to encode/decode things (you can't just tell JS "load this range of bytes as an object" and instead need to decode the bytes by hand -- cloning the data).

The shared structs proposal would go a long way in enabling shared memory models and is currently at stage 2 -- so there is some hope for this in the relatively near future! I know the TS team is eagerly looking forward to this proposal landing in node so they can explore parallelising TS's type analysis.

For now at least the best way to efficiently do parallelised multi-file analysis is to do some "grouping aware" task splitting. I.e. instead of assigning files to threads randomly you instead try to keep "related" files in the same thread to minimise duplication of data across threads.

But this is what I was alluding to in my above comments [1] [2] -- there needs to be an explicit decision encoded in this RFC:

  • Is it going to go the way of the cache system where it ONLY supports lint setups that purely use single-file analysis (thus locking out all users of eslint-plugin-import's cross-file analysis rules, typescript-eslint's type-aware rules, etc), or
  • Is it going to try to build in primitives to help support cross-file analysis setups (thus enabling said users to leverage parallelism).

The former is "the easy route" for obvious reasons -- there's a lot to think about and work through for the latter.

As a quick-and-dirty example that we have discussed before (see eslint/eslint#16819):
If ESLint included a mechanism to allow the user to specify a file grouping strategy for parallelism then typescript-eslint could build a strategy for users to configure that makes ESLint group files by tsconfig. This would then keep each project's type information in one thread and thus minimise duplicated work in calculating type information.
I have actually prototyped such a strategy in the form of a wrapper around ESLint (typescript-eslint/typescript-eslint#4359) and it showed some decent wins for certain setups.
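
To make the idea concrete, here is a minimal sketch of such a grouping strategy (helper names are mine, not an existing API): group files by the nearest tsconfig.json so that each project's type information stays within one thread.

```js
const fs = require("node:fs");
const path = require("node:path");

// Walk up from the file until a directory containing tsconfig.json is found.
function findNearestTsconfigDir(filePath) {
    let dir = path.dirname(filePath);
    for (;;) {
        if (fs.existsSync(path.join(dir, "tsconfig.json"))) {
            return dir;
        }
        const parent = path.dirname(dir);
        if (parent === dir) {
            return null; // reached the filesystem root without finding one
        }
        dir = parent;
    }
}

// Partition file paths so that files sharing a tsconfig land in one group,
// which could then be assigned to a single worker thread.
function groupByTsconfig(filePaths) {
    const groups = new Map();
    for (const filePath of filePaths) {
        const key = findNearestTsconfigDir(filePath) ?? "<no-project>";
        if (!groups.has(key)) {
            groups.set(key, []);
        }
        groups.get(key).push(filePath);
    }
    return [...groups.values()];
}
```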


Just to reiterate my earlier comments -- I'm 100% on board with going with the former decision and ignoring the cross-file problem. I just want to ensure that this case has been fully considered and intentionally discarded, or that the design has a consideration to eventually grow the parallelism system to support such use cases.

@nzakas
Member

nzakas commented Jan 17, 2025

I think what we're going for here is effectively a "stop the bleeding" situation where we can get ESLint's current behavior to go faster, as this becomes an even bigger problem as people start to use ESLint to lint their JSON, CSS, and Markdown files, significantly increasing the number of files an average user will lint.

I'm on board with discounting the cross-file problem at this point, as I think it's clear that many companies have created their own concurrent linting solutions built on top of ESLint that also discount this issue.

I would like to revisit cross-file linting at some point in the future, but before we can do that, we really need to get the core rewrite in progress.


**[Trunk Code Quality](https://trunk.io/code-quality)**

Trunk manages to parallelize ESLint and other linters by splitting the workload over multiple processes.
Member

Does that mean it's splitting up the file list and spreading across multiple processes? Or just one process for ESLint and other processes for other tools?

Member Author

The file list is being split into chunks of a predefined size, and I can see that multiple threads are also being spawned in a process, but I'm not sure what each thread is doing. I will look deeper into the details.

Member Author

I took a closer look. Trunk is splitting the file list across multiple processes (4 at a time on my machine). The processes are Node.js processes with bin/eslint.js as a main module, so they are not shared with other tools. When linting the main repo with Trunk, a whole command line for an ESLint process looks like this:

```
/usr/local/bin/node \
./bin/eslint.js \
--output-file \
/var/folders/0v/zllmqhv13d3013k10h417fwh0000gn/T/trunk-501/KSAgF4/QBGvrBgLTw8b \
--format \
json \
Makefile.js \
bin/eslint.js \
conf/default-cli-options.js \
conf/ecma-version.js \
conf/globals.js \
eslint.config.js \
lib/api.js \
lib/cli-engine/cli-engine.js \
lib/cli-engine/file-enumerator.js \
lib/cli-engine/formatters/html.js \
lib/cli-engine/formatters/json-with-metadata.js \
lib/cli-engine/formatters/json.js \
lib/cli-engine/formatters/stylish.js \
lib/cli-engine/hash.js \
lib/cli-engine/index.js \
lib/cli-engine/lint-result-cache.js \
lib/cli-engine/load-rules.js \
lib/cli.js \
lib/config/config-loader.js \
lib/config/config.js \
lib/config/default-config.js \
lib/config/flat-config-array.js \
lib/config/flat-config-helpers.js \
lib/config/flat-config-schema.js \
lib/config/rule-validator.js \
lib/eslint/eslint-helpers.js \
lib/eslint/eslint.js \
lib/eslint/index.js \
lib/eslint/legacy-eslint.js \
lib/languages/js/index.js \
lib/languages/js/source-code/index.js \
lib/languages/js/source-code/source-code.js \
lib/languages/js/source-code/token-store/backward-token-comment-cursor.js \
lib/languages/js/source-code/token-store/backward-token-cursor.js \
lib/languages/js/source-code/token-store/cursor.js \
lib/languages/js/source-code/token-store/cursors.js \
lib/languages/js/source-code/token-store/decorative-cursor.js \
lib/languages/js/source-code/token-store/filter-cursor.js \
lib/languages/js/source-code/token-store/forward-token-comment-cursor.js \
lib/languages/js/source-code/token-store/forward-token-cursor.js \
lib/languages/js/source-code/token-store/index.js \
lib/languages/js/source-code/token-store/limit-cursor.js \
lib/languages/js/source-code/token-store/padded-token-cursor.js \
lib/languages/js/source-code/token-store/skip-cursor.js \
lib/languages/js/source-code/token-store/utils.js \
lib/languages/js/validate-language-options.js \
lib/linter/apply-disable-directives.js \
lib/linter/code-path-analysis/code-path-analyzer.js \
lib/linter/code-path-analysis/code-path-segment.js \
lib/linter/code-path-analysis/code-path-state.js \
lib/linter/code-path-analysis/code-path.js \
lib/linter/code-path-analysis/debug-helpers.js \
lib/linter/code-path-analysis/fork-context.js \
lib/linter/code-path-analysis/id-generator.js \
lib/linter/file-context.js \
lib/linter/index.js \
lib/linter/interpolate.js \
lib/linter/linter.js \
lib/linter/node-event-generator.js \
lib/linter/report-translator.js \
lib/linter/rule-fixer.js \
lib/linter/rules.js \
lib/linter/safe-emitter.js \
lib/linter/source-code-fixer.js
```

The Node.js processes are created by the trunk executable (located at ~/.cache/trunk/cli/1.22.3-darwin-arm64 on my Mac), for which no source code is available. As previously mentioned, the trunk executable also spawns several threads, but these are probably unrelated to ESLint. I'm going to update the RFC.

@fasttime
Member Author

> One thing I'd like to point out before it's too late and just in case it's relevant: multithreading makes multifile analysis harder. If there ever comes a system where a single rule can look at the contents of multiple files—as implemented in elm-review, or as planned in Biome—then splitting the analysis across several threads means it's likely several threads will need to load all of the files in the project, increasing at least the I/O load for each thread.

Thanks for the feedback @jfmengels. In fact, multifile analysis or project-aware analysis is not a concept we have implemented in the ESLint core at this time, which is the reason why it's not covered in this RFC.

> I imagine this kind of analysis is not really in the scope for ESLint at the moment (I haven't seen anything in the RFCs at least), as it would have a high complexity impact on the project (I've written about some tradeoffs in this post). But if this proposal were to be implemented without consideration for multifile analysis—which seems to be the case currently—then the cost of implementing it later would skyrocket and I imagine would lead it to never be implemented.

Do you think it would be too difficult to have multifile analysis and multithread linting at the same time? Or are you suggesting that implementing multifile analysis before multithread linting would be easier than the other way around? If you could clarify your concern we could add that in the drawbacks section for further consideration.

@nzakas (Member) left a comment

This is looking really good and I love the level of detail. Just left a few questions throughout.


When `auto` concurrency is selected, ESLint will use a heuristic to determine the best concurrency setting, which could be any number of threads or also `"off"`.
How this heuristic will work is an open question.
An approach I have tested is using half the number of available CPU cores, which is a reasonable starting point for modern machines with many (4 or more) cores and fast I/O peripherals to access the file system.
Member

I think we can just start by using this heuristic with the docs stating that you may get better performance by manually setting the concurrency level.

Member Author

Fine, I've updated the RFC. We can always improve the heuristic later.
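
For reference, the heuristic being adopted here amounts to something like the following (a sketch; `os.availableParallelism()` requires Node.js ≥ 18.14, older versions can use `os.cpus().length`):

```js
const os = require("node:os");

// "auto" concurrency sketch: half the available cores, but at least 1.
const concurrency = Math.max(1, Math.floor(os.availableParallelism() / 2));
```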



```js
if (!options[disableSerializabilityCheck]) {
    const unserializableKeys = Object.keys(options).filter(key => !isSerializable(options[key]));
```
Member

Potentially faster to just run JSON.stringify() and catch any errors (faster than structuredClone(), which is creating objects as it goes). If there are errors, only then do we check which keys are causing the problem. I'm just not sure if we want to pay this cost 100% of the time.

Member Author

Unfortunately, we can't use JSON.stringify because it just skips unserializable properties. For example, JSON.stringify({ foo: () => {} }) produces the string "{}".

But I think it's still a good idea to try serializing the whole object first and only check the single properties when needed.

Also, the idea is to run the check only when concurrency is specified. I've clarified that.
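
A minimal sketch of that two-step check, assuming the same structured-clone semantics used for `workerData` (`isSerializable` in the quoted snippet is the RFC's name; `findUnserializableKeys` below is mine):

```js
function findUnserializableKeys(options) {
    try {
        structuredClone(options); // same algorithm used to clone workerData
        return [];
    } catch {
        // Only on failure, pay the cost of checking each property individually.
        return Object.keys(options).filter(key => {
            try {
                structuredClone(options[key]);
                return false;
            } catch {
                return true;
            }
        });
    }
}
```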

Member

You can use the replacer option to check for functions:

```js
JSON.stringify(value, (key, value) => {
    if (typeof value === "function") {
        throw new TypeError("Function!");
    }
    return value; // returning the value keeps JSON.stringify walking into nested properties
});
```

Member Author

Functions are not the only unserializable values; the spec lists many kinds of values that can't be serialized and would throw a DataCloneError. So this method of pre-checking properties individually will at most spot values that are obviously unserializable, but we'd still need another check to ensure that the options object can be passed to worker threads without throwing an exception.

Comment on lines +288 to +297
```js
const abortController = new AbortController();
const fileIndexCounter = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
const workerPath = require.resolve("./worker.js");
const workerOptions = {
    workerData: {
        filePaths,
        fileIndexCounter,
        eslintOptions
    },
};
```
Member

I'm a bit concerned about passing all file paths to every thread, as this is a lot of duplicated memory. Thinking about a situation where there are 10,000 files to be linted (which we have received reports of), that means we'd have 10,000 * thread_count file paths stored in memory.

I wonder about an alternative approach where each thread is seeded with maybe 5-10 file paths (or maybe just one?) that it's responsible for linting. When they are all linted, it sends a message asking for more. I know this creates more chatter, but I'm wondering if it might end up being more memory-efficient in the long-run?

Any insights into how other tools handle this?

Member Author

We could certainly compare this approach to other implementations. But really, 10,000 file paths do not take more than a few megabytes in memory. Even with, say, 32 threads, or one per CPU, that's still a totally manageable size. Also, if the shared structs proposal mentioned in the above comments is adopted in Node.js, memory duplication will no longer be a concern. I'm still planning on looking into other tools like Ava or Mocha, so I'll be sure to check how they handle sessions with many files.
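
For context on how the shared counter from the quoted snippet would be consumed, here is a sketch of the worker side (not the RFC's actual worker code):

```js
// worker.js (sketch): Atomics.add returns the previous value, so each
// file index is claimed by exactly one thread, with no extra messaging.
const { workerData } = require("node:worker_threads");

const { filePaths, fileIndexCounter } = workerData;

for (;;) {
    const index = Atomics.add(fileIndexCounter, 0, 1);
    if (index >= filePaths.length) {
        break; // no files left to claim
    }
    // ... lint filePaths[index] and collect its result ...
}
```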


> When they are all linted, it sends a message asking for more. I know this creates more chatter, but I'm wondering if it might end up being more memory-efficient in the long-run?

This is how I've implemented parallelism, at least, and I took inspiration for that from other implementations I saw as well. I mentioned this in my other comments in this section, but batching has other benefits as well for re-enabling result streaming, allowing for lazy initialization of workers based on the project file count to make sure it actually runs more efficiently than just a single thread would.


I personally went with a batching approach in my implementation at Canva to avoid the chatter. At our scale there's ~90k files to lint and CI machines have ~30 cores -- so messaging would create a lot of chatter back-and-forth.

I found that there is some wasted perf here in that some threads finish before others -- but it's minimal -- maybe could get a 1-2% runtime reduction with something better.

In an ideal world you would implement a work-stealing approach (batch into COUNT / THREADS, distribute, and then when a thread finishes its batch it looks at what threads have work left and takes a number of items off of that thread's queue). But doing this without shared memory is hard because it involves messaging other threads to ask what they've got left (which they have to stop processing to answer).

You could potentially do a hybrid approach and instead of batching into COUNT / THREADS instead go a little more granular (eg COUNT / THREADS / n) so that you have big batches and are less likely to starve a thread's queue.

There are many options!
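
As a concrete illustration of that hybrid sizing (numbers purely illustrative, not from any of the implementations mentioned above):

```js
// More, smaller batches than COUNT / THREADS reduce the chance that a
// thread is starved while another still holds one huge batch.
const granularity = 4; // the hypothetical "n"
const batchSize = Math.max(1, Math.ceil(fileCount / (threadCount * granularity)));
```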

Comment on lines 462 to 467
Another possible solution is retrieving rules `meta` objects in each worker thread and returning this information to the main thread.
When `getRulesMetaForResults()` is called in the main thread, rules `meta` objects from all threads will be deduped and merged and the results will be returned synchronously.

This solution removes the need to load config files in the main thread but it still requires worker threads to do potentially useless work by adding an extra processing step unconditionally.
Another problem is that rules `meta` objects for custom rules aren't always serializable.
In order for `meta` objects to be passed to the main thread, unserializable properties will need to be stripped off, which is probably undesirable.
Member

I like this approach. Do you have an example of when a rule uses unserializable values in meta?

Overall, I think it's safe for us to assume that meta is always serializable and deal with any outliers through documentation.

Member Author

I think you're right. I've checked typescript-eslint, eslint-plugin-unicorn, eslint-plugin-n, eslint-plugin-import and eslint-plugin-react and none has rules with unserializable metadata. We could mention in the docs that unserializable properties in rule metadata will be silently ignored when getRulesMetaForResults() is called in multithread linting mode. I'll update the RFC.
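
For illustration, the dedupe-and-merge step described in the quoted text could look like this sketch, assuming each worker returns a plain `{ ruleId: meta }` record:

```js
function mergeRulesMeta(perThreadMetas) {
    const merged = new Map();
    for (const metas of perThreadMetas) {
        for (const [ruleId, meta] of Object.entries(metas)) {
            // Rule ids are unique across the run, so the first meta seen wins.
            if (!merged.has(ruleId)) {
                merged.set(ruleId, meta);
            }
        }
    }
    return merged;
}
```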

Member Author

An open question is how to handle results that were not created by the current ESLint instance. In those cases, the current implementation doesn't find a config for the file and throws an error:

https://github.com/eslint/eslint/blob/main/lib/eslint/eslint.js#L626-L636

But if we don't load config files in the main thread, we won't be able to run this check, so we can only look at the filename and check if it's one of the known ones. If the filename is not known, I can think of two approaches:

  1. Return an empty object {}. Results that were not created by the current instance will not throw errors.
  2. Throw an error. Results that were not created by running lintFiles but still have a filename that's associated with a config file will start to throw an error.

I think that 2. makes more sense, but it will throw errors in situations where the current implementation doesn't.

Member

It might be worth the overhead of loading config files for each file in the results array. I think in most cases we'd be looking at just one config file, and in the worst case probably not more than a dozen.

Another option would be to have the threads pass rule meta back to the main thread. These should be serializable, but of indeterminate size.

Member Author

To load config files for each file in the results array, we could use solution 1. This is also the approach with the best compatibility.

To have the threads pass rule meta back to the main thread would be solution 2.

I tested both solutions on a few projects and indeed it seems that the overhead for loading the config(s) in the main thread is negligible. Even with typescript-eslint I found no significant performance difference. I guess that typescript-eslint loads TypeScript only when it is first needed, so there is no penalty in loading the config arrays only to check files and ignores.

Member Author

I've added a list of pros and cons to all suggested solutions so it's easier to make a decision. I understand that solution 3 is less interesting than the other two?


Errors created in a worker thread cannot be cloned to the main thread without changes, because they can contain unserializable properties.
Instead, Node.js creates a serializable copy of the error, stripped of unserializable properties, and reports it to the main thread as a parameter of the [`error`](https://nodejs.org/docs/latest-v18.x/api/worker_threads.html#event-error) event.
During this process `message` and `stack` are preserved because they are strings.
Member

Does this preserve all serializable properties? My main concern is the messageTemplate property that we add to produce readable error messages: https://github.com/eslint/eslint/blob/8bcd820f37f2361e4f7261a9876f52d21bd9de8f/bin/eslint.js#L80

Member Author

Yes, all serializable properties are preserved.

@jfmengels

> Do you think it would be too difficult to have multifile analysis and multithread linting at the same time? Or are you suggesting that implementing multifile analysis before multithread linting would be easier than the other way around?

I don't know that it is too hard, but it's mostly that having each thread analyze a portion of the project won't work, at least with the multifile analysis approach I've chosen for elm-review and explained in this blog post, because rules need to (potentially) analyze every file.

For my use case, it's more likely that analysis can be split by rule instead of by file, i.e. thread 1 will run rules 1 and 2, thread 2 will run rules 3 and 4, etc. But this means that, memory-wise, the project's contents (and derived data) need to be stored on every thread (multiplying the memory), which doesn't sound great for performance. Maybe this would get improved with the shared structs proposal.

Right now, both multifile analysis and multithreading are doable in isolation, but doing both requires a lot more thinking and maybe additional tools. Therefore doing one may exclude the other. But if you ever figure it out I'll definitely be interested to hear about it!

@fasttime
Member Author

fasttime commented Feb 9, 2025

@jfmengels I've added a section Multifile Analysis to the RFC.

@bradzacher

> Do you think it would be too difficult to have multifile analysis and multithread linting at the same time?

It's entirely possible to have both - one just needs to be "smart" about how files are distributed amongst threads. As I've mentioned before -- any "random" distribution[^1] approach will cause large duplication of work (and memory usage).

If you are "smart" about how files are distributed -- eg use the dependency graph to distribute them to minimise the number of edges spanning threads -- then you can minimise the duplication for a net positive benefit. Computing a dependency graph ahead of time is going to take time so it's not necessarily a good heuristic -- but it is an example. As an example for TypeScript projects one could very quickly compute the set of files included in all relevant tsconfig.jsons and use that to inform distribution.

Another example of ways you can be smart is if ESLint builds in stateful primitives to make it easy for rules and parsers to store and access data across threads. As an example -- most of eslint-plugin-import's multifile work involves parsing each dependency of the currently linted file to determine things about that dependency's exports. This work is highly cacheable -- so if ESLint provided an API to allow the plugin to read from and write to a shared cache then they could deduplicate work at the cost of cross-thread messaging (which likely would be a perf win overall).
Note: such infra wouldn't help typescript-eslint as typescript type information is not currently serializable across threads.

It's all possible -- it just requires some thought and careful design to make it work well.

[^1]: By "random" I mean any distribution that puts files in threads without considering file structure, project structure, or file contents, etc. For example, partitioning the set of linted files into n buckets would be a "random" distribution even if the files were ordered in some way.

@jfmengels

An additional constraint on multi-file analysis and multi-threading that I hadn't remembered to bring up is automatic fixes. If you have a module A that imports B, you may potentially need to fix B after having visited both (to remove an unused exported function, for instance). At least the way that elm-review works*, that causes both B then A to be reanalyzed (before potentially re-applying different fixes). If B is read in other threads as well, then B's updated contents need to be shared with them, potentially discarding previous analysis done there. And those threads might in parallel have edited B as well, which then needs to be reconciled.

Maybe there's a good way to synchronize all of this, but something naive won't cut it I think. And shared structs might be key here.

* elm-review is more cautious applying fixes than ESLint. We apply one fix then re-analyze what has changed and then apply another fix 🔁, we don't apply multiple fixes at once and hope for the best like ESLint does AFAIR.

For this proposal at least, if files are only read by a single thread, fixes should not be a concern 👍

(PS: Somewhat related, I just released multi-file fixes for elm-review on top of the existing multi-file analysis https://jfmengels.net/multi-file-fixes)

@nzakas
Member

nzakas commented Feb 14, 2025

I think we should table any discussion of multi-file analysis for now. That's going to require significant changes to the core and we'll likely have to revisit a lot of how things work, including concurrent linting.

It's enough to just leave this as a note in the RFC.

@fasttime
Member Author

I've added a new section "Warning Management" because ESLint v9.20.0 adds two new warning types (for inactive flags and for empty config files) that require some adaptation to work well in multithread mode.

@@ -508,6 +508,45 @@ This strategy comes with some unobvious implications that consumers should be aw
* Errors thrown by `lintFiles()` could be of a different type than the errors thrown when running in single-thread mode, and they may leak unserializable properties.
* Because of race conditions, subsequent calls to `lintFiles()` could produce different errors for identical input.

### Warning Management
Member

I'm wondering if it would be helpful to extract warning management into a separate object rather than using process.emitWarning() directly. We could then have Linter use a default warning object that stubs out the relevant method(s) rather than emitting to console. We could then pass the "live" version to Linter from ESLint. Maybe this could simplify the thread behavior?

Member Author

This is a good idea, I've updated the RFC. The names for the new option and methods are just placeholders, so feel free to suggest better names.

@nzakas
Member

nzakas commented Mar 18, 2025

@mdjermanovic can you take a look at this?

@faultyserver given your experience, would you mind reviewing this?

@faultyserver left a comment

Overall I'm very intrigued by the approach in this proposal. I think in the right environment, shared memory and lightweight worker threads can possibly give a greater performance boost than the process-based approach I've used, but I also believe smart batching can be a helpful feature for supporting multifile analysis cases like typescript-eslint and streaming results. But processes also have their advantages for resource usage and requiring less configuration from the user (re UV_THREADPOOL_SIZE, again: a test on an 8-core machine with the 50% heuristic won't hit the contention of the default 4 pool size).

Some of the comments here are addressed later down in the proposal, but I'm leaving them for posterity as addressing thoughts I had reading the proposal top to bottom and learning as I went.


When `auto` concurrency is selected, ESLint will use a heuristic to determine the best concurrency setting, which could be any number of threads or also `"off"`.
How this heuristic will work is an open question.
An approach I have tested is using half the number of available CPU cores, which is a reasonable starting point for modern machines with many (4 or more) cores and fast I/O peripherals to access the file system.


In my testing (using processes rather than threads) I've found it depends on whether the machine is hyperthreaded, the ratio of "p-cores" and "e-cores", and more, so I agree just calling out that setting the value can improve performance is the best approach rather than investing too much in a heuristic. Contextually, in another project I found that the best performance across all the machines I could try varied between 50% and 80% of the available cores, but that's also still dependent on file IO and the kernel's ability to read large files that quickly.

When `auto` concurrency is selected, ESLint will use a heuristic to determine the best concurrency setting, which could be any number of threads or also `"off"`.
An approach I have tested is using half the number of available CPU cores, which is a reasonable starting point for modern machines with many (4 or more) cores and fast I/O peripherals to access the file system.
But linting time also depends on the features of the project like the configuration or the number of files.
Some plugins like typescript-eslint with type-aware linting can increase the time required to initialize a linting thread resulting in additional per-thread overhead.


This especially is a concern with startup for TypeScript. On an M2 Max, the fastest concurrency result for our 20k file project is actually only 4 processes, because the flood of file IO at the start just stalls the kernel out and it takes longer to finish that than the entire linting process does with fewer processes. There's very little actual CPU load that happens before lint rules start running, in my experience, its almost all just IO and memory allocation. On our 8-core CI machines, we also are only able to run 2 concurrent workers because of memory usage and the slower filesystem on those machines.

Also worth noting that the only reason this actually causes a problem for typescript-eslint is because typescript does file discovery first and reads all files into memory before starting any actual type work. The type linting itself is relatively fast in comparison.


The new multithread implementation will exist alongside the current, single-thread implementation, and it will only be used when multithreading mode is enabled.

In the multithread implementation `lintFiles()` launches as many worker threads as specified by the `concurrency` option, but no more than the number of enumerated files.


I think there's a strong lower bound for when the cost of initializing a worker thread/process and communicating file paths becomes an actual improvement over just doing it all in one thread. In the original parallel implementation I wrote for Discord, this is addressed by working in batches and lazily creating workers as batches are sent for processing. In other implementations, I also found this lower bound to be around 50 files or so, where the number of rules being applied ended up being effectively irrelevant.

Even using shared memory and lightweight worker threads, I think particularly small projects will still possibly run faster on a single thread rather than paying that initialization and communication cost.

Member Author

This is something worth accounting for in the heuristic that calculates the number of threads with "auto" concurrency. Given $$n$$ files to be linted, the heuristic could ensure that default concurrency does not exceed $$\left\lceil n/K\right\rceil$$ threads, where $$K$$ would be hard-coded to a meaningful value, for example 50.
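
In code, that cap could look like the following sketch, where `cpuBasedConcurrency` and `fileCount` stand for values computed elsewhere:

```js
const K = 50; // minimum files per thread before adding another thread
const threadCount = Math.min(cpuBasedConcurrency, Math.ceil(fileCount / K));
```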

Comment on lines 266 to 267
The controlling thread itself does not lint any files: it waits until all files are linted or an error occurs.
When a worker thread terminates successfully it submits a list of `LintReport`s to the controlling thread. Each result is enriched with the index of the associated file.


One benefit of batched work, and from running ESLint in general, is that lints effectively stream their output as they occur (unless I'm misremembering something, but that was one of the tenets I wanted to reproduce in my parallel implementations). Losing that streaming nature and only outputting results after everything has finished feels like a loss, even if it's a trivial one (i.e., the results will show, and they'll also come back faster than they did before).

This does introduce a higher IPC cost, though, and is really where the tradeoff of batching comes in. I think it'd be worth leaving this somewhat open-ended and experimenting with different approaches here to see what's performant and what feels best, especially on larger projects that are going to be taking a while to run and really benefitting from this parallelism in the first place.

Member Author

The advantage of sending results when the worker thread completes, rather than as the results are produced, lies in the ability to post a single large message instead of multiple smaller ones. In the suggested implementation, the controlling thread does nothing with the results until all of them have been collected, hence submitting partial results to the main thread in advance would not introduce a benefit. That said, switching to streaming results as they are created should be an easy change, so we could test that during the implementation.

In the multithread implementation `lintFiles()` launches as many worker threads as specified by the `concurrency` option, but no more than the number of enumerated files.
All worker threads receive the same parameters from the controlling thread:
* the list of enumerated file paths
* a reference to shared memory containing the index of the next file to be processed in the list (a `TypedArray` with an underlying `SharedArrayBuffer`)


Using a shared buffer here limits the options for parallelism to only using worker threads. In my experience, processes ended up being more efficient than threads. My current belief is that even though each thread has its own event loop, they all end up on the same libuv process that gets limited by the concurrency Node sets up for that internally (using UV_THREADPOOL_SIZE, which is only 4 by default. This article gives some good info on this). Using multiple processes frees up all of the concurrency and gives "more headroom" through a higher total working thread count by default.

But in the situation of a user just running new ESLint({ concurrency: 'auto' }) on a 16-core machine without also setting UV_THREADPOOL_SIZE to at least 8 (more likely 11 to allow the main thread and IO events to be handled outside of the worker thread queues), they're going to get bottlenecked and have much slower results (in my experience, like 80-120% worse than using separate processes).

I think the way to address this is either a) use processes and send files in batches over IPC or b) indicate strongly to users that they will likely need to set the UV_THREADPOOL_SIZE variable to something greater than their requested concurrency to get the best results.

Option a feels better to me from a usability standpoint (i.e., don't tell the user to change their platform when a better internal solution that "just works" exists), but that's a tradeoff that I think will need addressing, and relying on a shared memory buffer limits the ability to change that in the future.


> My current belief is that even though each thread has its own event loop, they all end up on the same libuv process that gets limited by the concurrency Node sets up for that internally (using UV_THREADPOOL_SIZE, which is only 4 by default. This article gives some good info on this). Using multiple processes frees up all of the concurrency and gives "more headroom" through a higher total working thread count by default.

This has been my experience as well. The default async threadpool size of 4 greatly limits your available disk reads and bottlenecks things. In my own experience writing a parallelisation wrapper at my company I used node:cluster and it greatly improved perf over node:worker_threads. In my case we found about a 30% improvement by using cluster.

For reference Canva's codebase is linting ~90k files with ~120 rules (no type-aware linting). Though we initially were concerned about contention so we limited concurrency to a max of 8 workers (regardless of total CPU count).
Eventually we further improved that perf by spawning n-1 workers (on our CI machines that meant 31 instances of NodeJS) and that netted a further 50% improvement in lint time on our CI machines.

For us right now P70 lint time is ~5m. Not fast by any means, but it's definitely bearable.

Member

Could we manually set UV_THREADPOOL_SIZE to the number of logical cores?

```js
const os = require('os');
process.env.UV_THREADPOOL_SIZE = os.cpus().length;
```

In my experience, the fewer asks of the end user, the better life is for everyone.


I think the ideal value is really hard to even guess at accurately, again just based on the performance vs efficiency core distribution, the ability of any hyperthreads, the other system IO bottlenecks, etc.

I agree it's nicest to not have to ask for the user to change things here, but at a minimum it definitely needs to be overridable, like using ??= to preserve whatever value might be set from the caller.

Even if there is a default applied internally, I think it's worth documenting that this env var exists and will likely improve performance by tuning it, along with some information about what parameters meaningfully affect the best value to use.
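
In code, the overridable default suggested above could be a one-liner (a sketch; `??=` only assigns when the variable is not already set):

```js
const os = require("node:os");

// Apply a default only when the caller hasn't already set the variable.
process.env.UV_THREADPOOL_SIZE ??= String(os.cpus().length);
```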

Member

Gotcha. I think we can probably make this a better user experience by creating a command-line argument to set it rather than asking someone to set it manually (which would then make it difficult to set internally).

Member Author

I tried running my fork with different settings for UV_THREADPOOL_SIZE but didn't notice any difference – only the number of threads seems to impact performance. These are commands I typed in a zsh console on a MacBook. The last three of them run in around the same amount of time:

```
npm i -D eslint-p
time ./node_modules/.bin/eslint-p --concurrency=4
time UV_THREADPOOL_SIZE=1 ./node_modules/.bin/eslint-p --concurrency=4
time UV_THREADPOOL_SIZE=8 ./node_modules/.bin/eslint-p --concurrency=4
```

Is there a way to test the effect of the UV_THREADPOOL_SIZE in a comparable tool (one that uses multithreading to speed up a task)?


> I tried running my fork with different settings for UV_THREADPOOL_SIZE but didn't notice any difference – only the number of threads seems to impact performance

If you're not doing enough FS ops to saturate the async threadpool then it wouldn't make much difference making it bigger.
Additionally it depends on the size of your test codebase. For example if you've got like 1k files then it's likely you won't see much of a difference in total runtime because the total time reading from disk is small (relative to the entire lint run), which consequently means that the total potential blocking time would be smaller still.

Anecdotally I personally haven't hit these sorts of threadpool perf problems in any meaningful way until I was dealing with 10's of thousands of files -- at scale it makes a noticeable difference. At scale you are saturating the thread pool more and for longer which means more chance of blocking time.

Member Author

Thanks @bradzacher! Do you happen to know a GitHub repo with tens of thousands of files that uses ESLint 9? It would be great to have some reference projects everyone can use to run tests and compare the results.

faultyserver commented Apr 9, 2025

I ended up rewriting our internal parallelizer to use the same approach from this proposal (a shared index pointer and worker threads using that index) to test and got the same results as you, even on our large project (~20k files): changing UV_THREADPOOL_SIZE above 4 had no effect, but changing it down to 2 did slow things down noticeably. So I'm no longer sure this is a relevant suggestion, and even after investigating, I couldn't figure out what was actually changing when I changed the value, other than just spawning more threads that possibly didn't do anything.

Interestingly, though, I also ran this with just typescript-eslint rules enabled on our project, and it ended up using a lot more memory, but I think that's because the lack of directory-based file batching logic meant each thread effectively had to check the entire project rather than just a subset of it. It also ran ~30% slower, but I think at least part of that can be attributed to the less efficient type caching.

UV_THREADPOOL_SIZE also had minimal effect here (on my 12-core M2 Max), so I don't believe that's an issue anymore.


The controlling thread itself does not lint any files: it waits until all files are linted or an error occurs.
When a worker thread terminates successfully it submits a list of `LintReport`s to the controlling thread. Each result is enriched with the index of the associated file.
The controlling thread collects the lists of `LintReport`s and merges them in the expected order, then populates `lintResultCache` with cache information (**note**: this is different from the single-thread implementation where the cache is populated progressively as files are being linted).


> merges them in the expected order

I presume this is part of the reason for waiting and batching results rather than streaming them. Is "expected order" something that ESLint guarantees? Or requires preserving in a meaningful way? I could imagine if the user passes in a concrete list of files that possibly they'd want to see the results in exactly that order, but personally I also think this does not matter, and I don't believe any of the existing parallelism implementations attempt to preserve this behavior either (i.e., a result contains the file name and location, so it doesn't matter if its index matches the original file list order).

Member Author

Is "expected order" something that ESLint guarantees? Or requires preserving in a meaningful way? I could imagine if the user passes in a concrete list of files that possibly they'd want to see the results in exactly that order

The order of lint results depends on how files are enumerated before linting starts. This does not necessarily reflect the order specified by a user, but it's stable in the sense that it doesn't change across executions. For multithread mode, the order of lint results will be the same as for single-thread mode.

Comment on lines +355 to +356
For the purpose of multithread linting, all linting threads are created by the controlling thread at the same time before the linting process starts.
Although the controlling thread will typically be the main thread of the process, this is not a general requirement.


Mentioned above, but noting here as well: I believe the real worker count should be informed by the size of the enumerated file list. Worker threads are much lighter than full processes, but I believe they still have some cost that means default behavior on small projects will be slower than optimal. This could be tested once an implementation exists to know for sure, though.

Member Author

Indeed yes, there is no point in having more worker threads than there are files to be linted. In "auto" concurrency mode, the number of worker threads could be limited to one worker thread per 50 files, for example. I would like to come up with a proto-implementation to determine the number of effective threads, even if the logic could change anytime.

Especially plugins that require a longer initialization time, such as typescript-eslint, will be affected by this problem.
While several solutions have been proposed to mitigate this issue, discussing them would exceed the scope of this proposal.
To avoid establishing false expectations, it would make sense to include a note regarding the potential performance degradation in the documentation or perhaps to issue a runtime warning.
The recommendation will be to enable multithread linting only when it performs measurably faster than linting in single-thread mode.


Noting again from my comments above, a critical piece of this for users on large machines with large projects and using worker_threads may be ensuring that UV_THREADPOOL_SIZE is set high enough to avoid contention.

**https://github.com/discord/eslint/tree/parallel by [faultyserver](https://github.com/faultyserver)**

This is a fork of ESLint v8 with parallel linting support.[^4]
It launches multiple sub-processes where each sub-process lints a precomputed subset of files.


Not included in the public fork we (Discord) have available is improved batching:

Each subprocess lints a batch of files, where the controlling process manages creating "effective" batches and adding each batch to a queue that is sent to workers as they become available (i.e., a naive form of backpressure handling). Specifically, it batches using a heuristic that includes total batch size (keep adding files until at least 50 are present) and directory structure (add all files that are direct children of a directory to a batch at once, even if it goes well over 50 files).

This yielded massive performance improvements for us, especially with typescript-eslint, since it benefits greatly from its internal caching when working with the same types over and over again, and that is highly correlated with the directory that a file lives in (for the most part, and for our projects specifically, at least).
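
A rough sketch of that heuristic (thresholds illustrative, not Discord's actual code): group files by parent directory and cut a batch once it reaches ~50 files, never splitting a directory across batches.

```js
const path = require("node:path");

function batchByDirectory(filePaths, minBatchSize = 50) {
    // Group files under their immediate parent directory.
    const byDir = new Map();
    for (const filePath of filePaths) {
        const dir = path.dirname(filePath);
        if (!byDir.has(dir)) {
            byDir.set(dir, []);
        }
        byDir.get(dir).push(filePath);
    }

    // Fill batches a whole directory at a time, even if one overflows the minimum.
    const batches = [];
    let current = [];
    for (const files of byDir.values()) {
        current.push(...files);
        if (current.length >= minBatchSize) {
            batches.push(current);
            current = [];
        }
    }
    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}
```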
