[WIP] Avoid R API calls within OpenMP region #11448


Draft: hcho3 wants to merge 6 commits into base release_1.7.0.

Conversation

@hcho3 (Collaborator) commented May 10, 2025

@hcho3 mentioned this pull request May 10, 2025
@david-cortes (Contributor):

Would it perhaps be better to backport the PR that fixed the same issue in the master branch (#9823)?

@hcho3 (Collaborator, Author) commented May 10, 2025

Is that PR from after the massive rewrite of the R package? If so, it probably won't apply to the 1.7 branch.

@david-cortes (Contributor):

> Is that PR from after the massive rewrite of the R package? If so, it probably won't apply to the 1.7 branch.

I think it was before the changes to .R files, but I don't remember. The thing is that it also fixed other issues along the way.

@david-cortes (Contributor):

Actually, looking at the PR: it might not be easy to backport, so let's stick with this one.

@@ -6,10 +6,10 @@
   <parent>
     <groupId>ml.dmlc</groupId>
     <artifactId>xgboost-jvm_2.12</artifactId>
-    <version>1.7.11</version>
+    <version>1.7.12</version>
Contributor:

Since this is R-only, I guess changes here are not needed.

expect_equal(importance_from_dump(), importance, tolerance = 1e-6)

## decision stump
m <- xgboost::xgboost(
Contributor:

I'm not sure if it'd be correct to use xgboost:: in tests. It's certainly not needed, as the whole namespace is imported.


n_threads <- 2

test_that("training with feature weights works", {
Contributor:

I'd say it's better not to add extra tests for this short-lived release, since they increase the chances of rejection and of extra fixes being needed.

@hcho3 (Collaborator, Author), May 10, 2025:

This is merely restoring the tests removed in the previous commit, per the CRAN maintainer's request.

@@ -247,15 +247,18 @@ XGB_DLL SEXP XGDMatrixSetInfo_R(SEXP handle, SEXP field, SEXP array) {
auto ctx = DMatrixCtx(R_ExternalPtrAddr(handle));
if (!strcmp("group", name)) {
std::vector<unsigned> vec(len);
const int* array_ptr = INTEGER_RO(array);
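
For context, the pattern being applied here: R API accessors may only be called from R's main thread, so the fix extracts the raw pointer before any parallel work and keeps the OpenMP loop on plain memory. A minimal sketch of that pattern, with a hypothetical helper name (CopyGroupVector) rather than the actual XGBoost code:

#include <vector>

#include <Rinternals.h>

// Sketch only (compile with -fopenmp): do all R API calls up front on the
// main thread, then run the OpenMP loop over plain C memory.
std::vector<unsigned> CopyGroupVector(SEXP array) {
  const R_xlen_t len = Rf_xlength(array);    // R API call: main thread only
  const int *array_ptr = INTEGER_RO(array);  // R API call: main thread only
  std::vector<unsigned> vec(len);
  #pragma omp parallel for
  for (R_xlen_t i = 0; i < len; ++i) {
    vec[i] = static_cast<unsigned>(array_ptr[i]);  // no R API inside the region
  }
  return vec;
}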
Review comment:

I hadn't noticed that you have Depends: R (>= 3.3.0). INTEGER_RO and REAL_RO appeared in R 3.5.0. You can either redefine them on older versions of R or use INTEGER and REAL.
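
A sketch of the version guard this comment suggests, using the standard Rversion.h macros; this illustrates the reviewer's first option and is not necessarily what the PR ended up doing:

#include <Rversion.h>
#include <Rinternals.h>

#if R_VERSION < R_Version(3, 5, 0)
// R before 3.5.0 has no read-only accessors; fall back to the writable
// INTEGER/REAL and add const so existing call sites compile unchanged.
#define INTEGER_RO(x) ((const int *) INTEGER(x))
#define REAL_RO(x) ((const double *) REAL(x))
#endif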

@trivialfis (Member):

Thank you for working on this! Please let me know if there's anything I can help with (testing, submitting packages, etc.).

@hcho3 (Collaborator, Author) commented May 10, 2025

@trivialfis Can you help with submitting the package with the fix?

@trivialfis (Member) commented May 11, 2025

> Can you help with submitting the package with the fix?

Is this ready? (It's still marked as a draft.)

Since 1.7.11 never went through, do you think we need to bump to 1.7.12?

Lastly, did you manage to tackle the issues with the changed OpenBLAS? There seem to be test failures related to floating-point accuracy.

@hcho3 (Collaborator, Author) commented May 11, 2025

> Lastly, did you manage to tackle the issues with the changed OpenBLAS? There seem to be test failures related to floating-point accuracy.

No, I was never able to reproduce the issue.

@david-cortes (Contributor):

Per the CRAN recommendations (https://cran.r-project.org/web/packages/policies.html):

> Increasing the version number at each submission reduces confusion so is preferred even when a previous submission was not accepted.

> There seem to be test failures related to floating-point accuracy.

I don't think that's the case. The failures looked like completely different results from what would be expected, not small numerical inaccuracies.

> No, I was never able to reproduce the issue.

Perhaps one way to reproduce it would be to build R with the --enable-strict-barrier flag, as in this email: https://stat.ethz.ch/pipermail/r-package-devel/2025q2/011652.html

@hcho3 (Collaborator, Author) commented May 11, 2025

> Perhaps one way to reproduce it would be to build R with the --enable-strict-barrier flag, as in this email:

I already tried it and could not reproduce the other test failures.

@trivialfis Has the maintainer gotten back to you about reproducing the OpenBLAS environment? If not, let's ask for permission to remove the failing tests.

@trivialfis (Member):

> Has the maintainer gotten back to you about reproducing the OpenBLAS environment? If not, let's ask for permission to remove the failing tests.

We need to separate the issues here. I'm not sure whether the issue fixed by this PR is the cause of all the test failures.

How many of the test failures did you reproduce using the --enable-strict-barrier flag: all of them, some of them, or none of them? Could you please share the check.log from the --enable-strict-barrier run?

Are you using Docker to build the R variant? If so, I can run the tests as well.

@hcho3 (Collaborator, Author) commented May 12, 2025

> How many of the test failures did you reproduce using the --enable-strict-barrier flag: all of them, some of them, or none of them?

Only the tests that were failing due to the memory protection error, e.g. test_feature_weights.R. These tests crash immediately when run with --enable-strict-barrier.

I will get back to you with the full list of tests that are affected by the memory issue.

@david-cortes (Contributor):

> > Has the maintainer gotten back to you about reproducing the OpenBLAS environment? If not, let's ask for permission to remove the failing tests.
>
> We need to separate the issues here. I'm not sure whether the issue fixed by this PR is the cause of all the test failures.
>
> How many of the test failures did you reproduce using the --enable-strict-barrier flag: all of them, some of them, or none of them? Could you please share the check.log from the --enable-strict-barrier run?
>
> Are you using Docker to build the R variant? If so, I can run the tests as well.

In the case of memory corruption, since all the tests are executed in the same process, the problem does not necessarily come from the specific test that fails, nor are the failures necessarily deterministically reproducible.

> Has the maintainer gotten back to you about reproducing the OpenBLAS environment?

They seldom answer emails, so I wouldn't count on getting a timely reply.

@hcho3 (Collaborator, Author) commented May 12, 2025

@trivialfis Only the test in test_feature_weights.R is affected by the memory protection bug. The other tests run fine with the --enable-strict-barrier flag enabled, but test_feature_weights.R crashes with a segfault:

...
Fatal error: Wrong thread calling 'RunFinalizers'
Fatal error: Wrong thread calling 'RunFinalizers'
Fatal error: Wrong thread calling 'RunFinalizers'
Segmentation fault (core dumped)

And of course, I am not able to reproduce the other failing tests, e.g.:

Failure ('test_interaction_constraints.R:36:3'): interaction constraints for regression
  `actual`:   FALSE
  `expected`: TRUE 
  Monotone constraint satisfied

I'd suggest the following course of action:

  1. Make a new submission with the memory fix (this PR) and see if it also fixes the other tests.
  2. If the other tests still fail on the CRAN server, we should remove the failing tests and re-submit, since we have no way to reproduce them on our machines.
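
The "Wrong thread calling 'RunFinalizers'" message above is the signature of the bug class this PR removes: an R API call made from an OpenMP worker thread can trigger garbage collection, and hence finalizers, off the main thread, which R forbids. A hypothetical illustration of the failure mode (not code from this repository):

#include <Rinternals.h>

// Hypothetical example of the bug class; NOT code from this repository.
// Rf_allocVector may trigger R's garbage collector, which runs finalizers;
// on an OpenMP worker thread this can abort R with errors like
// "Fatal error: Wrong thread calling 'RunFinalizers'".
void BuggyCopy(SEXP input, int n, double *out) {
  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    SEXP tmp = PROTECT(Rf_allocVector(REALSXP, 1));  // BUG: R API off the main thread
    REAL(tmp)[0] = REAL(input)[i];
    out[i] = REAL(tmp)[0];
    UNPROTECT(1);
  }
}

The fix, as in the diff earlier in this thread, is to hoist every R API call out of the parallel region.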

@trivialfis (Member) commented May 12, 2025

I've asked for help again on the public mailing list.

@trivialfis (Member):

It looks like the CRAN maintainers accepted my previous package submission, which removed the tests.
