Protection Stack Overflow for high dimensional data frames #1233
Surprisingly, this appears to be an R problem:

```r
library(tidymodels)
set.seed(19)
n <- 10
p <- 20000
X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste("V", 1:p)
df <- as_tibble(X)
df$response <- rnorm(n)
mod_mat <- model.matrix(response ~ . + 1, df)
#> Error: protect(): protection stack overflow
```

Created on 2025-01-28 with reprex v2.1.0

You can get around it using the xy method:

```r
library(tidymodels)
set.seed(19)
n <- 10
p <- 20000
X <- matrix(rnorm(n * p), nrow=n)
colnames(X) <- paste("V", 1:p)
df <- as_tibble(X)
response <- rnorm(n)
glmnet_spec <- linear_reg(penalty = 0.5) %>%
  set_engine("glmnet")
fit_xy(glmnet_spec, df, response)
#> parsnip model object
#>
#>
#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian")
#>
#> Df %Dev Lambda
#> 1 0 0.00 1.03300
#> 2 2 8.42 0.98600
#> 3 2 16.27 0.94120
#> 4 2 23.42 0.89840
#> 5 3 29.99 0.85760
#> 6 3 36.00 0.81860
#> 7 3 41.48 0.78140
#> 8 3 46.47 0.74590
#> 9 3 51.02 0.71200
#> 10 3 55.17 0.67960
#> 11 4 58.98 0.64870
#> 12 4 62.54 0.61930
#> 13 4 65.78 0.59110
#> 14 5 68.74 0.56420
#> 15 5 71.46 0.53860
#> 16 5 73.93 0.51410
#> 17 5 76.18 0.49080
#> 18 5 78.21 0.46840
#> 19 5 80.11 0.44720
#> 20 5 81.79 0.42680
#> 21 5 83.36 0.40740
#> 22 5 84.76 0.38890
#> 23 6 86.09 0.37120
#> 24 6 87.32 0.35440
#> 25 6 88.43 0.33830
#> 26 6 89.45 0.32290
#> 27 6 90.38 0.30820
#> 28 6 91.22 0.29420
#> 29 6 91.99 0.28080
#> 30 6 92.69 0.26810
#> 31 6 93.33 0.25590
#> 32 6 93.91 0.24420
#> 33 6 94.44 0.23310
#> 34 6 94.93 0.22260
#> 35 6 95.37 0.21240
#> 36 6 95.77 0.20280
#> 37 6 96.13 0.19360
#> 38 7 96.47 0.18480
#> 39 6 96.78 0.17640
#> 40 7 97.06 0.16840
#> 41 7 97.31 0.16070
#> 42 7 97.54 0.15340
#> 43 7 97.76 0.14640
#> 44 7 97.95 0.13980
#> 45 7 98.13 0.13340
#> 46 7 98.29 0.12740
#> 47 8 98.44 0.12160
#> 48 8 98.58 0.11600
#> 49 8 98.70 0.11080
#> 50 8 98.81 0.10570
#> 51 9 98.92 0.10090
#> 52 9 99.01 0.09634
#> 53 9 99.10 0.09196
#> 54 9 99.18 0.08778
#> 55 9 99.25 0.08379
#> 56 9 99.32 0.07998
#> 57 8 99.38 0.07635
#> 58 8 99.43 0.07288
#> 59 8 99.48 0.06956
#> 60 8 99.53 0.06640
#> 61 8 99.57 0.06338
#> 62 8 99.61 0.06050
#> 63 8 99.64 0.05775
#> 64 8 99.67 0.05513
#> 65 8 99.70 0.05262
#> 66 8 99.73 0.05023
#> 67 9 99.75 0.04795
#> 68 9 99.77 0.04577
#> 69 9 99.79 0.04369
#> 70 10 99.81 0.04170
#> 71 10 99.83 0.03981
#> 72 10 99.84 0.03800
#> 73 10 99.86 0.03627
#> 74 11 99.87 0.03462
#> 75 12 99.88 0.03305
#> 76 12 99.89 0.03155
#> 77 12 99.90 0.03011
```

Created on 2025-01-28 with reprex v2.1.0
Yes, this is sadly a case where R's formula machinery doesn't cope with very long formulas.
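Since the formula is the bottleneck, the failing `model.matrix()` call from the first reprex can also be sidestepped outside parsnip by assembling the design matrix without a formula. This is a minimal sketch, assuming every predictor column is numeric (no factors, interactions or transformations for `model.matrix()` to expand); `design_matrix_no_formula()` is a hypothetical helper, not part of base R or tidymodels:

```r
# Hypothetical helper (not from any package): builds the matrix that
# model.matrix(response ~ ., data) would return for all-numeric predictors,
# without parsing a formula. Assumes no factor/character predictors.
design_matrix_no_formula <- function(data, outcome) {
  predictors <- data[setdiff(names(data), outcome)]
  # Guard the assumption: factor expansion is not handled here
  stopifnot(all(vapply(predictors, is.numeric, logical(1))))
  # Intercept column first, then the predictors in their original order
  cbind("(Intercept)" = 1, as.matrix(predictors))
}

set.seed(19)
n <- 10
p <- 20000
X <- matrix(rnorm(n * p), nrow = n)
colnames(X) <- paste0("V", 1:p)
df <- as.data.frame(X)
df$response <- rnorm(n)

mod_mat <- design_matrix_no_formula(df, "response")
dim(mod_mat)  # 10 rows, 20001 columns (intercept + 20000 predictors)
```

For numeric predictors the values should match `model.matrix(response ~ ., df)`, minus formula-specific attributes such as `assign`; anything that needs factor contrasts would need extra handling.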
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
The problem

For large, high-dimensional data frames, `model.frame(formula, data)`, and therefore the fit itself, fails with a protection stack overflow.

Reproducible example

Created on 2025-01-22 with reprex v2.1.1
Session info