Skip to content

Improve type inference for member access in unions and intersections #461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

taminomara
Copy link
Contributor

I'm still not very familiar with code base, so comments are welcome.

Fix #454.

This behavior is consistent with modern type theory. In mathematics, 'type' is defined as a set of all possible values that a variable can have. Therefore, a union of types would include all possible values for left and right types. An intersection of types would only include values that can be assigned to both left and right types. Hence, we come to rules for member access:

  • for union types, we can access a member if both left and right types have this member. Type of the member would be a union of corresponding members from left and right types;
  • for intersection types, we can access a member if either left or right type contain it. Type of the member would be an intersection of corresponding members from left and right types.

Note for unions. If one of the types does not contain a member, we shouldn't allow accessing it:

local x --- @type integer | { meep: string }
x.meep --> Warning: accessing field on an integer value can cause an error.

Note for intersections. If one of the types does not contain a member, we won't be able to construct such type, therefore member access is safe:

local x --- @type integer & { meep: string }
x.meep --> No warning; however, we will not be able to assign anything to `x`.

@taminomara
Copy link
Contributor Author

Things to consider:

This change makes access to union members stricter. This is good for cases when union contains types that aren't safe to index (integer | { meep: string } from above). However, this can be inconvenient when dealing with unions of tables:

local x --- @type { a: integer } | { b: integer }
if x.a ~= nil then --> Warning: no field named `a`.
    -- ...
end

To make this case better, we can save information about reason for inference failure in InferFailReason::FieldDotFound. If accessing a not found member would lead to an error, we will propagate an error. If it would yield nil, we would return a nullable value.

P.S. should we rename InferFailReason::FieldDotFound to InferFailReason::FieldNotFound?

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

image

ts tell me, if the field is same, if type is different, it will become never

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

image

and the union

@taminomara
Copy link
Contributor Author

This is correct. Except, since we don't have never, I simply return intersection unchanged.

image

image

@taminomara
Copy link
Contributor Author

string & number behaves as never here.

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

The current system is not prepared for the occurrence of "never." As I explained before, I cannot determine the specific types of A and B during infer_type, so I cannot reasonably simplify type inference.

@taminomara
Copy link
Contributor Author

I'm not sure I understand where the problem is. I mean, LuaType is specific enough. During inference, we just collect A and B into an array; by the time we run diagnostics, we should have enough info to resolve those arrays.

Sure, if a user spreads declaration of a class over multiple files, they might have an issue. But I don't think it's a common thing to do.

@taminomara
Copy link
Contributor Author

Maybe you can give an example, when thing will go wrong?

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

When diagnosing, all information is indeed determined, but during inference, everything is uncertain. For example, when dealing with A & B, a field 'x' of A might be defined in a place that I haven't analyzed yet or cannot analyze, so I don't consider 'x' to be a field of A. However, at this point, B may have already been analyzed and 'x' has been identified. Now, when we try to resolve (A & B).x, it becomes never. However, once the field 'x' of A is later inferred, (A & B).x is no longer never.

@taminomara
Copy link
Contributor Author

But what happens if we try to resolve A.x? Doesn't it become unknown (or any, depending on A)?

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

But what happens if we try to resolve A.x? Doesn't it become unknown (or any, depending on A)?

No matter whether we assume A.x exists or not, any subsequent inference that depends on it will be incorrect. This issue does not exist in TypeScript, because in TypeScript, type definitions are not split across files—all types and their members are already determined, as it can rely on imports to establish a dependency graph and completely trust this dependency relationship. This is not possible in Lua.

@CppCXY
Copy link
Member

CppCXY commented May 16, 2025

I think the current PR can be kept for now until I make a reasonable refactor and assumptions someday. At the moment, I still have many stack overflow and performance issues to resolve.

@taminomara
Copy link
Contributor Author

My point is, if we calculate type of a field during inference stage, and we don't have all information about it, we will get an incorrect result. Doesn't matter if this field is in a union or in a simple table. So I don't see the reason we can't merge this PR now, and fix this issue later.

Besides, current implementation for union works like intersection, so maybe it's better to fix it before too much users depend on it?

@taminomara
Copy link
Contributor Author

Here's some preliminary thoughts on this issue.

  1. This bug can only happen during the initial load. If a file appears or changes after it, the index will be re-calculated, so all previously wrongly unresolved fields will be updated.

  2. There's an easy solution. Instead of registering all files into the database, and then running analysis, we can add files one-by-one. This way, new files will clear the cache for previously loaded files, if there is a dependency. We trade time to initially load the project for a simple solution. In the worst case, we will run analysis O(n^2) times. I honestly don't know if it's acceptable or not.

  3. There's a more complex solution. For initial load, instead of iterating over files and running all stages of analysis at once, we can do the following. First, we resolve declarations, requires and global variables for all files. Second, we sort files topologically. Third, we perform the rest of the analysis.

@xuhuanzy
Copy link
Member

xuhuanzy commented May 16, 2025

  1. There's an easy solution. Instead of registering all files into the database, and then running analysis, we can add files one-by-one. This way, new files will clear the cache for previously loaded files, if there is a dependency. We trade time to initially load the project for a simple solution. In the worst case, we will run analysis O(n^2) times. I honestly don't know if it's acceptable or not.

This is unacceptable. There are many large-scale projects using EmmyLua, often with more than 5,000+ files, and I've even seen single alias exceeding 200KB.

You can try refactoring the analysis algorithm; cppcxy doesn't have enough time to make the changes, so don't count on it happening this year at least.

@taminomara
Copy link
Contributor Author

@xuhuanzy do you have a link to such examples? If I'll take on any refactorings, I'd like to see if my changes slow down or break anything.

@CppCXY
Copy link
Member

CppCXY commented May 17, 2025

@xuhuanzy do you have a link to such examples? If I'll take on any refactorings, I'd like to see if my changes slow down or break anything.

Most large projects like this are usually internal code within various companies, so they cannot be shared. This is also why it is difficult for us to solve these problems.

@CppCXY
Copy link
Member

CppCXY commented May 17, 2025

My point is, if we calculate type of a field during inference stage, and we don't have all information about it, we will get an incorrect result. Doesn't matter if this field is in a union or in a simple table. So I don't see the reason we can't merge this PR now, and fix this issue later.

Besides, current implementation for union works like intersection, so maybe it's better to fix it before too much users depend on it?

I am working on EmmyLuaLs/emmylua_dap for zed , intellij and vscode, I need to consider these PRs after that.

@taminomara
Copy link
Contributor Author

taminomara commented May 17, 2025

  1. This bug can only happen during the initial load. If a file appears or changes after it, the index will be re-calculated, so all previously wrongly unresolved fields will be updated.

Turns out I was wrong, dependent files do not get outdated. In this case, yes, I agree that a refactoring is required to fix this. Nope, there's re-indexing logic.

@taminomara
Copy link
Contributor Author

No matter whether we assume A.x exists or not, any subsequent inference that depends on it will be incorrect. This issue does not exist in TypeScript, because in TypeScript, type definitions are not split across files—all types and their members are already determined, as it can rely on imports to establish a dependency graph and completely trust this dependency relationship. This is not possible in Lua.

Alright, I've tried to come up with a test that would fail for such case, and I couldn't. The pass architecture from my third suggestion is exactly how things are implemented now. All in all, I don't see why my PR can't be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Semantics of unions and intersections of tables
3 participants