test-perf-hooks failing on v18.x SmartOS CI #57000

Closed
aduh95 opened this issue Feb 10, 2025 · 11 comments
Labels
flaky-test: Issues and PRs related to tests with unstable failures on the CI.
smartos: Issues and PRs related to the SmartOS platform.
v18.x: Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.

Comments

aduh95 (Contributor) commented Feb 10, 2025

Test

sequential/test-perf-hooks

Platform

SmartOS

Console output

not ok 3959 sequential/test-perf-hooks
  ---
  duration_ms: 352.25600
  severity: fail
  exitcode: 1
  stack: |-
    node:assert:399
        throw err;
        ^
    
    AssertionError [ERR_ASSERTION]: Date.now() - performance.timeOrigin (169.60791015625) - process.uptime() * 1000 (94.956) = 74.65191015625 >= +- 50
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos23-x64/test/sequential/test-perf-hooks.js:35:3)
        at Module._compile (node:internal/modules/cjs/loader:1364:14)
        at Module._extensions..js (node:internal/modules/cjs/loader:1422:10)
        at Module.load (node:internal/modules/cjs/loader:1203:32)
        at Module._load (node:internal/modules/cjs/loader:1019:12)
        at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:128:12)
        at node:internal/main/run_main_module:28:49 {
      generatedMessage: false,
      code: 'ERR_ASSERTION',
      actual: false,
      expected: true,
      operator: '=='
    }
    
    Node.js v18.20.7-pre

Build links

Additional information

No response
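
For context, the failing assertion boils down to a consistency check between three clock sources. A minimal sketch of the relationship implied by the error message above (reconstructed from that message, not the literal test code) looks like this:

```js
'use strict';
const assert = require('node:assert');
const { performance } = require('node:perf_hooks');

// The error message implies the test expects three clocks to agree:
//   Date.now()               - wall-clock time in ms
//   performance.timeOrigin   - wall-clock time (ms) at which the process started
//   process.uptime() * 1000  - seconds since process start, converted to ms
// so (Date.now() - performance.timeOrigin) and process.uptime() * 1000 should
// differ by no more than the tolerance ("+- 50" in the output above).
const TOLERANCE_MS = 50;

const delta = Date.now() - performance.timeOrigin - process.uptime() * 1000;
assert.ok(Math.abs(delta) < TOLERANCE_MS,
          `drift between clock sources is ${delta} ms, expected < ${TOLERANCE_MS}`);
```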

aduh95 added the flaky-test label on Feb 10, 2025
github-actions bot added the smartos label on Feb 10, 2025
aduh95 (Contributor, Author) commented Feb 10, 2025

/cc @nodejs/platform-smartos

aduh95 changed the title from "SmartOS test-perf-hooks failing on v18.x" to "test-perf-hooks failing on v18.x SmartOS CI" on Feb 11, 2025
aduh95 added a commit to aduh95/node that referenced this issue Feb 11, 2025
test: skip `test-perf-hooks` on SmartOS

Refs: nodejs#57000

aduh95 added a commit to aduh95/node that referenced this issue Feb 12, 2025
Refs: nodejs#57000
PR-URL: nodejs#57001
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
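
The commit referenced above skips the test on SmartOS. The exact diff isn't reproduced here, but a common way to express a platform skip in a Node.js core test is sketched below; this is a hypothetical illustration, and the actual change in nodejs#57001 may instead mark the test flaky in a status file.

```js
'use strict';
// Hypothetical sketch of a platform skip in a core test, not the actual
// change from nodejs#57001.
const common = require('../common');

// process.platform reports 'sunos' on SmartOS and other illumos distros.
if (process.platform === 'sunos')
  common.skip('clock-drift assertion is flaky on SmartOS/illumos (see nodejs#57000)');
```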
aduh95 added the v18.x label on Feb 13, 2025
danmcd (Contributor) commented Feb 26, 2025

Please add me (@danmcd) to the platform-smartos team in the short term. And in the longer term, we need to make this "illumos" because it affects all illumos distros, not just SmartOS.

I know we provide resources, but with the recent departure of @bahamat I want to make sure our end is informed and active.

jclulow commented Feb 26, 2025

NB: I've created nodejs/admin#954 about adding you, @danmcd.

danmcd (Contributor) commented Feb 26, 2025

BTW, due to the (strange to me) OAuth requirements of ci.nodejs.org, I cannot yet see the CI links posted above. I've asked for permissions from orgs I belong to but am not an admin in. That's odd, but honestly the orgs in question are illumos-related, so it's good you know that in the long run this is about illumos, not just SmartOS.

danmcd (Contributor) commented Feb 26, 2025

> BTW, due to the (strange to me) OAuth requirements of ci.nodejs.org, I cannot yet see the CI links posted above. I've asked for permissions from orgs I belong to but am not an admin in. That's odd, but honestly the orgs in question are illumos-related, so it's good you know that in the long run this is about illumos, not just SmartOS.

Yeah, one org I belong to declined you because it wanted private repos. Could be a rate-limit thing, so I'll try again tomorrow.

danmcd (Contributor) commented Feb 27, 2025

I STILL can't access the CI, @aduh95, because one of my orgs won't grant permission (private repos). I'd appreciate assistance in being able to see the three failures so I can help.

aduh95 (Contributor, Author) commented Feb 27, 2025

You can find the output in #57000 (comment). Other than that, you should be able to access the CI as long as you have a GitHub account; you shouldn't need to get permissions (at least, not from our side).
I would recommend getting access to the CI and watching failures in the daily builds; that's the best place to spot flaky tests.

danmcd (Contributor) commented Feb 27, 2025

> as long as you have a GitHub account; you shouldn't need to get permissions (at least, not from our side).

Yeah, I thought that was all I needed too, but OAuth is whining at me a lot: first about needing org access, then a rate limit blocking me after too many tries. :(

danmcd (Contributor) commented Feb 27, 2025

WHOA NEVER MIND! Somehow it cleared up!

danmcd (Contributor) commented Feb 27, 2025

So the margin of error on this test is +/- 50 (units unclear), and for some reason we seem to be exceeding it (53, 62, 82).

I do not know what machines/VMs we provide for this, but if they are VMs on loaded compute nodes, such jitter could be introduced by overloading. I see one other set of tests was marked as possibly flaky:

#56583

and the short-term fix would be to add this for test-perf-hooks as well. I know the illumos (and Solaris) gethrtime() should provide the nsec-granularity time a higher-level entity would need. This expression:

Date.now() - performance.timeOrigin (129.402099609375) - process.uptime() * 1000 (76.324195) = 53.077904609375 >= +- 50

makes me wonder where each of these terms gets its timestamp data from:

  • Date.now()
  • performance.timeOrigin
  • process.uptime()

I'm pretty sure I know where to find the sources for all of those (and might Date.now() be subject to NTP-driven drift?), but I wonder: on a loaded-compute-node VM, would any of these get enough jitter to exceed your margin of error?
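
One way to gauge that jitter on the affected hosts would be to sample the same drift expression the test asserts on, repeatedly over a few seconds. A small diagnostic sketch (hypothetical, not part of the test suite) might look like:

```js
'use strict';
// Diagnostic sketch: sample the drift between the three clock sources a few
// times to see how much jitter a loaded VM introduces.
const { performance } = require('node:perf_hooks');

function drift() {
  // Same expression as the failing assertion, in milliseconds.
  return Date.now() - performance.timeOrigin - process.uptime() * 1000;
}

const samples = [];
const timer = setInterval(() => {
  samples.push(drift());
  if (samples.length === 10) {
    clearInterval(timer);
    console.log('drift samples (ms):', samples.map((d) => d.toFixed(3)));
    console.log('min/max (ms):', Math.min(...samples), Math.max(...samples));
  }
}, 500);
```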

targos (Member) commented May 1, 2025

v18.x is now EoL.

targos closed this as not planned on May 1, 2025