
[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends #17985


Open · wants to merge 1 commit into main

Conversation

vacu9708 (Contributor) commented on May 18, 2025

Summary

This PR resolves issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends.

In this issue, the divergence traces back to the inverse trigonometric functions and the MaxPool ops.

Changes

Two root causes were identified and addressed:

  1. Missing asin domain check
    The Taylor-series approximation of asin did not validate its input domain, so values outside [-1, 1] silently produced numeric results.
    • Fix: Updated tir.asin to return a quiet NaN when the input lies outside [-1, 1] (see the first sketch after this list).
  2. The LLVM and CUDA backends follow different NaN policies for float max/min
    • LLVM: propagates NaNs (if any operand is NaN, the result is NaN).
    • CUDA: treats NaN as missing data, as described in the CUDA API manual:
      • If one operand is NaN and the other is a number, choose the numeric value.
      • If both are NaN, return NaN.
    • Fix: Updated the max/min ops on LLVM to match CUDA’s behavior (the second sketch below contrasts the two policies).
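
A minimal sketch of the domain guard, assuming TVM's C++ TIR API (`make_const`, `Or`, `Select`); `GuardedAsin` and `taylor_asin` are illustrative names, not the PR's actual diff:

```cpp
#include <tvm/tir/expr.h>
#include <tvm/tir/op.h>

#include <limits>

using namespace tvm;
using namespace tvm::tir;

// Wrap a Taylor-series asin approximation so that inputs outside
// [-1, 1] yield a quiet NaN instead of a spurious numeric value.
PrimExpr GuardedAsin(PrimExpr x, PrimExpr taylor_asin) {
  PrimExpr lower = make_const(x.dtype(), -1.0);
  PrimExpr upper = make_const(x.dtype(), 1.0);
  PrimExpr out_of_range = Or(x < lower, x > upper);
  PrimExpr qnan =
      make_const(x.dtype(), std::numeric_limits<double>::quiet_NaN());
  return Select(out_of_range, qnan, taylor_asin);
}
```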
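
And a standalone C++ illustration of the two NaN policies described in item 2 (plain host code for clarity, not TVM's actual codegen; the function names are made up for the example):

```cpp
#include <cmath>
#include <cstdio>

// CUDA-style "missing data" policy: if exactly one operand is NaN,
// return the numeric operand; NaN comes back only if both are NaN.
float MaxMissingData(float a, float b) {
  if (std::isnan(a)) return b;  // if b is also NaN, NaN is returned
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}

// Propagating policy: any NaN operand makes the result NaN.
float MaxPropagateNaN(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return NAN;
  return a > b ? a : b;
}

int main() {
  std::printf("missing-data: %f\n", MaxMissingData(NAN, 1.0f));   // 1.000000
  std::printf("propagating:  %f\n", MaxPropagateNaN(NAN, 1.0f));  // nan
  return 0;
}
```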

Notes

  • I also tried aligning CUDA's behavior with LLVM's, but CUDA does not appear to support the propagating-NaN behavior.
    • I'd appreciate the maintainers' feedback on whether to keep these differing NaN policies.
  • The lint check seems to have a bug. It accepts only the following spacing as correct, presumably because it misparses x<lower, x> as a template argument list:
PrimExpr out_range = Or(x<lower, x> upper);

However, the correct forms would be either:

PrimExpr out_range = Or(x < lower, x > upper);

or

PrimExpr out_range = Or(x<lower, x>upper);

vacu9708 force-pushed the asin_domain/pool_nan branch 20 times, most recently from 3bb7ef7 to 1856d9e on May 19, 2025 at 12:18
vacu9708 changed the title from "[Codegen] Add asin domain check and align NaN-handling in pooling with CUDA semantics" to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." on May 20, 2025
tqchen (Member) commented on May 23, 2025

For this particular case, it should be an issue of the input itself? I think it is fine for different platforms to have different NaN policies as long as they are efficient, given that efficiency outweighs other concerns here. The main thing we want to ensure is that the behavior over the valid input domain is reasonable.

vacu9708 force-pushed the asin_domain/pool_nan branch from 1856d9e to 39bf52c on May 24, 2025 at 13:26
vacu9708 (Contributor, Author) commented
@tqchen Thanks for your feedback.
I removed the changes for the differing NaN policies on both backends and kept the asin domain-limit check.

vacu9708 force-pushed the asin_domain/pool_nan branch from 39bf52c to 6513b9c on May 24, 2025 at 13:42
- Update `tir.asin` to return quiet NaN if the input is outside of [-1, 1]
vacu9708 force-pushed the asin_domain/pool_nan branch from 6513b9c to be88aea on May 24, 2025 at 13:49
vacu9708 changed the title from "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends" (dropping the trailing period) on May 24, 2025
vacu9708 (Contributor, Author) commented
@tvm-bot rerun
