Improve type inference for member access in unions and intersections #461

taminomara · 2025-05-16T04:42:24Z

I'm still not very familiar with code base, so comments are welcome.

This behavior is consistent with modern type theory. In mathematics, 'type' is defined as a set of all possible values that a variable can have. Therefore, a union of types would include all possible values for left and right types. An intersection of types would only include values that can be assigned to both left and right types. Hence, we come to rules for member access:

for union types, we can access a member if both left and right types have this member. Type of the member would be a union of corresponding members from left and right types;
for intersection types, we can access a member if either left or right type contain it. Type of the member would be an intersection of corresponding members from left and right types.

Note for unions. If one of the types does not contain a member, we shouldn't allow accessing it:

local x --- @type integer | { meep: string }
x.meep --> Warning: accessing field on an integer value can cause an error.

Note for intersections. If one of the types does not contain a member, we won't be able to construct such type, therefore member access is safe:

local x --- @type integer & { meep: string }
x.meep --> No warning; however, we will not be able to assign anything to `x`.

taminomara · 2025-05-16T04:42:52Z

Things to consider:

This change makes access to union members stricter. This is good for cases when union contains types that aren't safe to index (integer | { meep: string } from above). However, this can be inconvenient when dealing with unions of tables:

local x --- @type { a: integer } | { b: integer }
if x.a ~= nil then --> Warning: no field named `a`.
    -- ...
end

To make this case better, we can save information about reason for inference failure in InferFailReason::FieldDotFound. If accessing a not found member would lead to an error, we will propagate an error. If it would yield nil, we would return a nullable value.

P.S. should we rename InferFailReason::FieldDotFound to InferFailReason::FieldNotFound?

CppCXY · 2025-05-16T13:51:00Z

ts tell me, if the field is same, if type is different, it will become never

CppCXY · 2025-05-16T13:52:34Z

and the union

taminomara · 2025-05-16T14:01:09Z

This is correct. Except, since we don't have never, I simply return intersection unchanged.

taminomara · 2025-05-16T14:02:14Z

string & number behaves as never here.

CppCXY · 2025-05-16T14:07:33Z

The current system is not prepared for the occurrence of "never." As I explained before, I cannot determine the specific types of A and B during infer_type, so I cannot reasonably simplify type inference.

taminomara · 2025-05-16T14:19:04Z

I'm not sure I understand where the problem is. I mean, LuaType is specific enough. During inference, we just collect A and B into an array; by the time we run diagnostics, we should have enough info to resolve those arrays.

Sure, if a user spreads declaration of a class over multiple files, they might have an issue. But I don't think it's a common thing to do.

taminomara · 2025-05-16T14:19:51Z

Maybe you can give an example, when thing will go wrong?

CppCXY · 2025-05-16T14:30:00Z

When diagnosing, all information is indeed determined, but during inference, everything is uncertain. For example, when dealing with A & B, a field 'x' of A might be defined in a place that I haven't analyzed yet or cannot analyze, so I don't consider 'x' to be a field of A. However, at this point, B may have already been analyzed and 'x' has been identified. Now, when we try to resolve (A & B).x, it becomes never. However, once the field 'x' of A is later inferred, (A & B).x is no longer never.

taminomara · 2025-05-16T14:33:38Z

But what happens if we try to resolve A.x? Doesn't it become unknown (or any, depending on A)?

CppCXY · 2025-05-16T14:38:54Z

But what happens if we try to resolve A.x? Doesn't it become unknown (or any, depending on A)?

No matter whether we assume A.x exists or not, any subsequent inference that depends on it will be incorrect. This issue does not exist in TypeScript, because in TypeScript, type definitions are not split across files—all types and their members are already determined, as it can rely on imports to establish a dependency graph and completely trust this dependency relationship. This is not possible in Lua.

CppCXY · 2025-05-16T14:55:58Z

I think the current PR can be kept for now until I make a reasonable refactor and assumptions someday. At the moment, I still have many stack overflow and performance issues to resolve.

taminomara · 2025-05-16T15:21:32Z

My point is, if we calculate type of a field during inference stage, and we don't have all information about it, we will get an incorrect result. Doesn't matter if this field is in a union or in a simple table. So I don't see the reason we can't merge this PR now, and fix this issue later.

Besides, current implementation for union works like intersection, so maybe it's better to fix it before too much users depend on it?

taminomara · 2025-05-16T15:31:59Z

Here's some preliminary thoughts on this issue.

This bug can only happen during the initial load. If a file appears or changes after it, the index will be re-calculated, so all previously wrongly unresolved fields will be updated.
There's an easy solution. Instead of registering all files into the database, and then running analysis, we can add files one-by-one. This way, new files will clear the cache for previously loaded files, if there is a dependency. We trade time to initially load the project for a simple solution. In the worst case, we will run analysis O(n^2) times. I honestly don't know if it's acceptable or not.
There's a more complex solution. For initial load, instead of iterating over files and running all stages of analysis at once, we can do the following. First, we resolve declarations, requires and global variables for all files. Second, we sort files topologically. Third, we perform the rest of the analysis.

xuhuanzy · 2025-05-16T19:29:01Z

There's an easy solution. Instead of registering all files into the database, and then running analysis, we can add files one-by-one. This way, new files will clear the cache for previously loaded files, if there is a dependency. We trade time to initially load the project for a simple solution. In the worst case, we will run analysis O(n^2) times. I honestly don't know if it's acceptable or not.

This is unacceptable. There are many large-scale projects using EmmyLua, often with more than 5,000+ files, and I've even seen single alias exceeding 200KB.

You can try refactoring the analysis algorithm; cppcxy doesn't have enough time to make the changes, so don't count on it happening this year at least.

taminomara · 2025-05-17T09:23:20Z

@xuhuanzy do you have a link to such examples? If I'll take on any refactorings, I'd like to see if my changes slow down or break anything.

CppCXY · 2025-05-17T09:41:44Z

@xuhuanzy do you have a link to such examples? If I'll take on any refactorings, I'd like to see if my changes slow down or break anything.

Most large projects like this are usually internal code within various companies, so they cannot be shared. This is also why it is difficult for us to solve these problems.

CppCXY · 2025-05-17T09:46:15Z

My point is, if we calculate type of a field during inference stage, and we don't have all information about it, we will get an incorrect result. Doesn't matter if this field is in a union or in a simple table. So I don't see the reason we can't merge this PR now, and fix this issue later.

Besides, current implementation for union works like intersection, so maybe it's better to fix it before too much users depend on it?

I am working on EmmyLuaLs/emmylua_dap for zed , intellij and vscode, I need to consider these PRs after that.

taminomara · 2025-05-17T11:44:26Z

This bug can only happen during the initial load. If a file appears or changes after it, the index will be re-calculated, so all previously wrongly unresolved fields will be updated.

~~Turns out I was wrong, dependent files do not get outdated. In this case, yes, I agree that a refactoring is required to fix this.~~ Nope, there's re-indexing logic.

taminomara · 2025-05-17T21:43:28Z

No matter whether we assume A.x exists or not, any subsequent inference that depends on it will be incorrect. This issue does not exist in TypeScript, because in TypeScript, type definitions are not split across files—all types and their members are already determined, as it can rely on imports to establish a dependency graph and completely trust this dependency relationship. This is not possible in Lua.

Alright, I've tried to come up with a test that would fail for such case, and I couldn't. The pass architecture from my third suggestion is exactly how things are implemented now. All in all, I don't see why my PR can't be merged.

Improve type inference for unions and intersections

1cff744

taminomara added 2 commits May 16, 2025 14:22

Fix test

cd1fc0a

Fix spellchecker

15f5e93

Improve type inference for member access in unions and intersections #461

Are you sure you want to change the base?

Improve type inference for member access in unions and intersections #461

Conversation

taminomara commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

CppCXY commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

taminomara commented May 16, 2025

Uh oh!

xuhuanzy commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taminomara commented May 17, 2025

Uh oh!

CppCXY commented May 17, 2025

Uh oh!

CppCXY commented May 17, 2025

Uh oh!

taminomara commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

taminomara commented May 17, 2025

Uh oh!

Uh oh!

xuhuanzy commented May 16, 2025 •

edited

Loading

taminomara commented May 17, 2025 •

edited

Loading