Protection Stack Overflow for high dimensional data frames #1233

Closed
chillerb opened this issue Jan 22, 2025 · 3 comments

chillerb commented Jan 22, 2025

The problem

For wide, high-dimensional data frames (20,000 predictors in the example below), model.frame(formula, data) fails with a protection stack overflow, and so the fit itself fails too.

Reproducible example

library(tidymodels)

set.seed(19)

n <- 10
p <- 20000

X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste("V", 1:p)

df <- as_tibble(X)
df$response <- rnorm(n)

glmnet_spec <- linear_reg(penalty = 0.5) %>%
  set_engine("glmnet")

fit(glmnet_spec, response ~ . + 1, df)
#> Error: protect(): protection stack overflow

Created on 2025-01-22 with reprex v2.1.1

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 22.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2025-01-22
#>  pandoc   3.4 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package      * version    date (UTC) lib source
#>    backports      1.5.0      2024-05-23 [1] CRAN (R 4.4.1)
#>  P broom        * 1.0.7      2024-09-26 [?] CRAN (R 4.4.1)
#>  P class          7.3-23     2025-01-01 [?] CRAN (R 4.4.1)
#>  P cli            3.6.3      2024-06-21 [?] CRAN (R 4.4.1)
#>  P codetools      0.2-20     2024-03-31 [3] CRAN (R 4.4.1)
#>    colorspace     2.1-1      2024-07-26 [1] CRAN (R 4.4.1)
#>    data.table     1.16.4     2024-12-06 [1] CRAN (R 4.4.1)
#>  P dials        * 1.3.0      2024-07-30 [?] CRAN (R 4.4.1)
#>  P DiceDesign     1.10       2023-12-07 [?] CRAN (R 4.4.1)
#>  P digest         0.6.37     2024-08-19 [?] CRAN (R 4.4.1)
#>  P dplyr        * 1.1.4      2023-11-17 [?] CRAN (R 4.4.1)
#>  P evaluate       1.0.3      2025-01-10 [?] CRAN (R 4.4.1)
#>  P fastmap        1.2.0      2024-05-15 [?] CRAN (R 4.4.1)
#>    foreach        1.5.2      2022-02-02 [1] CRAN (R 4.4.1)
#>  P fs             1.6.5      2024-10-30 [?] CRAN (R 4.4.1)
#>  P furrr          0.3.1      2022-08-15 [?] CRAN (R 4.4.1)
#>  P future         1.34.0     2024-07-29 [?] CRAN (R 4.4.1)
#>  P future.apply   1.11.3     2024-10-27 [?] CRAN (R 4.4.1)
#>  P generics       0.1.3      2022-07-05 [?] CRAN (R 4.4.1)
#>    ggplot2      * 3.5.1      2024-04-23 [1] CRAN (R 4.4.1)
#>    glmnet         4.1-8      2023-08-22 [1] CRAN (R 4.4.1)
#>  P globals        0.16.3     2024-03-08 [?] CRAN (R 4.4.1)
#>    glue           1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  P gower          1.0.2      2024-12-17 [?] CRAN (R 4.4.1)
#>  P GPfit          1.0-8      2019-02-08 [?] CRAN (R 4.4.1)
#>    gtable         0.3.6      2024-10-25 [1] CRAN (R 4.4.1)
#>  P hardhat        1.4.0      2024-06-02 [?] CRAN (R 4.4.1)
#>  P htmltools      0.5.8.1    2024-04-04 [?] CRAN (R 4.4.1)
#>  P infer        * 1.0.7      2024-03-25 [?] CRAN (R 4.4.1)
#>  P ipred          0.9-15     2024-07-18 [?] CRAN (R 4.4.1)
#>    iterators      1.0.14     2022-02-05 [1] CRAN (R 4.4.1)
#>  P knitr          1.49       2024-11-08 [?] CRAN (R 4.4.1)
#>  P lattice        0.22-6     2024-03-20 [3] CRAN (R 4.4.1)
#>  P lava           1.8.1      2025-01-12 [?] CRAN (R 4.4.1)
#>  P lhs            1.2.0      2024-06-30 [?] CRAN (R 4.4.1)
#>  P lifecycle      1.0.4      2023-11-07 [?] CRAN (R 4.4.1)
#>  P listenv        0.9.1      2024-01-29 [?] CRAN (R 4.4.1)
#>  P lubridate      1.9.4      2024-12-08 [?] CRAN (R 4.4.1)
#>  P magrittr       2.0.3      2022-03-30 [?] CRAN (R 4.4.1)
#>  P MASS           7.3-64     2025-01-04 [?] CRAN (R 4.4.1)
#>  P Matrix         1.7-1      2024-10-18 [?] CRAN (R 4.4.1)
#>  P modeldata    * 1.4.0      2024-06-19 [?] CRAN (R 4.4.1)
#>    munsell        0.5.1      2024-04-01 [1] CRAN (R 4.4.1)
#>  P nnet           7.3-20     2025-01-01 [?] CRAN (R 4.4.1)
#>  P parallelly     1.41.0     2024-12-18 [?] CRAN (R 4.4.1)
#>  P parsnip      * 1.2.1      2024-03-22 [?] CRAN (R 4.4.1)
#>    pillar         1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
#>  P pkgconfig      2.0.3      2019-09-22 [?] CRAN (R 4.4.1)
#>  P prodlim        2024.06.25 2024-06-24 [?] CRAN (R 4.4.1)
#>  P purrr        * 1.0.2      2023-08-10 [?] CRAN (R 4.4.1)
#>  P R6             2.5.1      2021-08-19 [?] CRAN (R 4.4.1)
#>    Rcpp           1.0.14     2025-01-12 [1] CRAN (R 4.4.1)
#>  P recipes      * 1.1.0      2024-07-04 [?] CRAN (R 4.4.1)
#>  P reprex         2.1.1      2024-07-06 [?] CRAN (R 4.4.1)
#>    rlang          1.1.5      2025-01-17 [1] CRAN (R 4.4.1)
#>  P rmarkdown      2.29       2024-11-04 [?] CRAN (R 4.4.1)
#>  P rpart          4.1.24     2025-01-07 [?] CRAN (R 4.4.1)
#>  P rsample      * 1.2.1      2024-03-25 [?] CRAN (R 4.4.1)
#>  P rstudioapi     0.17.1     2024-10-22 [?] CRAN (R 4.4.1)
#>  P scales       * 1.3.0      2023-11-28 [?] CRAN (R 4.4.1)
#>  P sessioninfo    1.2.2      2021-12-06 [?] CRAN (R 4.4.1)
#>    shape          1.4.6.1    2024-02-23 [1] CRAN (R 4.4.1)
#>  P survival       3.8-3      2024-12-17 [?] CRAN (R 4.4.1)
#>  P tibble       * 3.2.1      2023-03-20 [?] CRAN (R 4.4.1)
#>  P tidymodels   * 1.2.0      2024-03-25 [?] CRAN (R 4.4.1)
#>  P tidyr        * 1.3.1      2024-01-24 [?] CRAN (R 4.4.1)
#>  P tidyselect     1.2.1      2024-03-11 [?] CRAN (R 4.4.1)
#>  P timechange     0.3.0      2024-01-18 [?] CRAN (R 4.4.1)
#>  P timeDate       4041.110   2024-09-22 [?] CRAN (R 4.4.1)
#>  P tune         * 1.2.1      2024-04-18 [?] CRAN (R 4.4.1)
#>  P vctrs          0.6.5      2023-12-01 [?] CRAN (R 4.4.1)
#>    withr          3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  P workflows    * 1.1.4      2024-02-19 [?] CRAN (R 4.4.1)
#>  P workflowsets * 1.1.0      2024-03-21 [?] CRAN (R 4.4.1)
#>  P xfun           0.50       2025-01-07 [?] CRAN (R 4.4.1)
#>  P yaml           2.3.10     2024-07-26 [?] CRAN (R 4.4.1)
#>  P yardstick    * 1.3.1      2024-03-21 [?] CRAN (R 4.4.1)
#> 
#>  [1] /root/.cache/R/renv/library/rabaki-e6dba559/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu
#>  [2] /root/.cache/R/renv/sandbox/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu/25ebdc09
#>  [3] /usr/local/lib/R/library
#> 
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

topepo (Member) commented Jan 28, 2025

Surprisingly, this appears to be an R problem:

library(tidymodels)

set.seed(19)

n <- 10
p <- 20000

X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste("V", 1:p)

df <- as_tibble(X)
df$response <- rnorm(n)

mod_mat <- model.matrix(response ~ . + 1, df)
#> Error: protect(): protection stack overflow

Created on 2025-01-28 with reprex v2.1.0
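As a small illustration of why skipping the formula is safe here: when every predictor is numeric, the matrix that model.matrix() would build is just the predictor columns plus an intercept column. The sketch below checks that on a narrow stand-in data set (p = 5 rather than 20000, and paste0() names, both of which are assumptions made to keep it quick). It does not reproduce the overflow itself.

```r
# Narrow stand-in for the wide data above (p = 5 instead of 20000).
set.seed(19)
n <- 10
p <- 5
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("V", seq_len(p))
d <- as.data.frame(X)
d$response <- rnorm(n)

# Formula route: model.matrix() expands the dot and adds the intercept.
mm_formula <- model.matrix(response ~ . + 1, d)

# Direct route: pick the predictor columns and bind an intercept column.
predictors <- setdiff(names(d), "response")
mm_direct <- cbind(`(Intercept)` = 1, as.matrix(d[predictors]))

# Same values and same column names; only bookkeeping attributes differ.
same_values <- isTRUE(all.equal(mm_formula, mm_direct, check.attributes = FALSE))
same_names <- identical(colnames(mm_formula), colnames(mm_direct))
```

With factor predictors or interactions the two routes would no longer match, so this equivalence should only be relied on for plain numeric columns.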

You can get around it by using the xy interface, fit_xy(), which takes the predictors and the outcome separately instead of a formula:

library(tidymodels)

set.seed(19)

n <- 10
p <- 20000

X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste("V", 1:p)

df <- as_tibble(X)
response <- rnorm(n)

glmnet_spec <- linear_reg(penalty = 0.5) %>%
  set_engine("glmnet")

fit_xy(glmnet_spec, df, response)
#> parsnip model object
#> 
#> 
#> Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian") 
#> 
#>    Df  %Dev  Lambda
#> 1   0  0.00 1.03300
#> 2   2  8.42 0.98600
#> 3   2 16.27 0.94120
#> 4   2 23.42 0.89840
#> 5   3 29.99 0.85760
#> 6   3 36.00 0.81860
#> 7   3 41.48 0.78140
#> 8   3 46.47 0.74590
#> 9   3 51.02 0.71200
#> 10  3 55.17 0.67960
#> 11  4 58.98 0.64870
#> 12  4 62.54 0.61930
#> 13  4 65.78 0.59110
#> 14  5 68.74 0.56420
#> 15  5 71.46 0.53860
#> 16  5 73.93 0.51410
#> 17  5 76.18 0.49080
#> 18  5 78.21 0.46840
#> 19  5 80.11 0.44720
#> 20  5 81.79 0.42680
#> 21  5 83.36 0.40740
#> 22  5 84.76 0.38890
#> 23  6 86.09 0.37120
#> 24  6 87.32 0.35440
#> 25  6 88.43 0.33830
#> 26  6 89.45 0.32290
#> 27  6 90.38 0.30820
#> 28  6 91.22 0.29420
#> 29  6 91.99 0.28080
#> 30  6 92.69 0.26810
#> 31  6 93.33 0.25590
#> 32  6 93.91 0.24420
#> 33  6 94.44 0.23310
#> 34  6 94.93 0.22260
#> 35  6 95.37 0.21240
#> 36  6 95.77 0.20280
#> 37  6 96.13 0.19360
#> 38  7 96.47 0.18480
#> 39  6 96.78 0.17640
#> 40  7 97.06 0.16840
#> 41  7 97.31 0.16070
#> 42  7 97.54 0.15340
#> 43  7 97.76 0.14640
#> 44  7 97.95 0.13980
#> 45  7 98.13 0.13340
#> 46  7 98.29 0.12740
#> 47  8 98.44 0.12160
#> 48  8 98.58 0.11600
#> 49  8 98.70 0.11080
#> 50  8 98.81 0.10570
#> 51  9 98.92 0.10090
#> 52  9 99.01 0.09634
#> 53  9 99.10 0.09196
#> 54  9 99.18 0.08778
#> 55  9 99.25 0.08379
#> 56  9 99.32 0.07998
#> 57  8 99.38 0.07635
#> 58  8 99.43 0.07288
#> 59  8 99.48 0.06956
#> 60  8 99.53 0.06640
#> 61  8 99.57 0.06338
#> 62  8 99.61 0.06050
#> 63  8 99.64 0.05775
#> 64  8 99.67 0.05513
#> 65  8 99.70 0.05262
#> 66  8 99.73 0.05023
#> 67  9 99.75 0.04795
#> 68  9 99.77 0.04577
#> 69  9 99.79 0.04369
#> 70 10 99.81 0.04170
#> 71 10 99.83 0.03981
#> 72 10 99.84 0.03800
#> 73 10 99.86 0.03627
#> 74 11 99.87 0.03462
#> 75 12 99.88 0.03305
#> 76 12 99.89 0.03155
#> 77 12 99.90 0.03011

Created on 2025-01-28 with reprex v2.1.0
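If the outcome already lives in the same data frame as the predictors, as in the original report, it has to be split off before calling fit_xy(). A minimal sketch of that step follows, using base R only. The helper name split_xy() is hypothetical and not part of parsnip, and the column names use paste0() for simplicity.

```r
# Hypothetical helper (not part of parsnip): split a data frame into
# predictors and outcome for use with an x/y interface.
split_xy <- function(data, outcome) {
  stopifnot(
    is.data.frame(data),
    length(outcome) == 1,
    outcome %in% names(data)
  )
  list(
    x = data[, setdiff(names(data), outcome), drop = FALSE],
    y = data[[outcome]]
  )
}

# Same shape as the data in this issue: 10 rows, 20000 predictors.
set.seed(19)
n <- 10
p <- 20000
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("V", seq_len(p))
wide <- as.data.frame(X)
wide$response <- rnorm(n)

parts <- split_xy(wide, "response")
dim(parts$x)     # predictors only
length(parts$y)  # one outcome value per row

# With the model spec from the example above, the fit would then be:
# fit_xy(glmnet_spec, parts$x, parts$y)
```

No formula is built at any point, so the column count is not a concern for this step.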

topepo closed this as completed Jan 28, 2025

EmilHvitfeldt (Member) commented

Yes, this is sadly a case where R's formula machinery doesn't like long formulas.

related:
tidymodels/recipes#467
tidymodels/recipes#548
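To make "long formula" concrete: the dot in response ~ . is expanded into one term per remaining column, so the expanded formula grows with the number of predictors. The sketch below counts the expanded terms for a few small widths using base R's terms(). The helper name count_terms() is made up for illustration, and the sketch only shows the growth, not the exact point at which the overflow occurs.

```r
# Hypothetical helper: how many terms does `response ~ . + 1` expand to
# for a data frame with p predictor columns?
count_terms <- function(p) {
  d <- as.data.frame(matrix(0, nrow = 2, ncol = p))  # columns V1..Vp
  d$response <- 0
  tt <- terms(response ~ . + 1, data = d)
  length(attr(tt, "term.labels"))
}

# One term per predictor column.
term_counts <- sapply(c(5, 50, 500), count_terms)

# The expanded labels for a tiny example.
d3 <- data.frame(V1 = 0, V2 = 0, V3 = 0, response = 0)
labels3 <- attr(terms(response ~ . + 1, data = d3), "term.labels")
```

At p = 20000 the same expansion yields 20000 terms, which is the scale at which the error in this issue was reported.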

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Feb 12, 2025