ordinal regression model type & polr engine #6

Merged: 14 commits, Apr 21, 2025

@corybrunson corybrunson commented Nov 4, 2024

This PR addresses #4 by introducing a single model type for ordinal regression and a single deployable engine. My thinking is that we should complete the implementation of one engine before beginning another.

Model type

The model type is ordinal_reg(), per this suggestion. However, as noted in the NEWS, this could be replaced with separate ordinal_*() types for different model structures, per this suggestion.

Engine

The model type comes with one engine, 'polr', which invokes MASS::polr(). The engine has one tuning parameter, ordinal_link, which mimics survival_link and is passed to the method argument of polr(). The engine also provides class and prob prediction types; confidence intervals for predictions do not appear to be implemented in {MASS}. The engine is registered on load.
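For orientation, here is a minimal sketch of the underlying call that the engine wraps, written directly against {MASS} rather than through the {parsnip} interface. It assumes only the built-in MASS::housing data and the documented method, weights, and type arguments; the object names (fit_logit, fit_probit, pred_class, pred_prob) are illustrative and not part of the PR.

```r
library(MASS)

# Fit proportional-odds models on the aggregated data, using Freq as
# frequency weights, under two of the links that `method` accepts.
fit_logit <- polr(
  Sat ~ Infl + Type + Cont,
  data = housing, weights = Freq, method = "logistic"
)
fit_probit <- polr(
  Sat ~ Infl + Type + Cont,
  data = housing, weights = Freq, method = "probit"
)

# The two prediction formats that the engine exposes as "class" and "prob".
pred_class <- predict(fit_logit, newdata = housing, type = "class")
pred_prob <- predict(fit_logit, newdata = housing, type = "probs")

head(pred_class)
head(pred_prob)
```

Note that predict() for polr objects spells the probability type "probs", whereas {parsnip} uses "prob"; an engine has to translate between the two.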

The ordinal_reg branch of {ordered} is coordinated with cognominal branches of {parsnip} and of {dials}. In {parsnip}, the model type is registered on load, a basic update() method is provided, and several other brief files or code chunks analogous to those for other model types are included. In {dials}, the ordinal_link parameter tuner is defined.
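As a rough illustration of what a qualitative link parameter looks like in {dials}, the sketch below builds one with new_qual_param(), following the pattern of survival_link. The names ordinal_link_sketch and ordinal_link_values are hypothetical stand-ins chosen to avoid clashing with any real ordinal_link; the actual definition in the {dials} branch may differ.

```r
library(dials)

# Candidate links: the values accepted by the `method` argument of MASS::polr().
ordinal_link_values <- c("logistic", "probit", "loglog", "cloglog", "cauchit")

# Hypothetical parameter constructor, modelled on dials::survival_link().
ordinal_link_sketch <- function(values = ordinal_link_values) {
  new_qual_param(
    type = "character",
    values = values,
    label = c(ordinal_link_sketch = "Ordinal Link")
  )
}

link_param <- ordinal_link_sketch()

# A regular grid over a qualitative parameter enumerates its values.
link_grid <- grid_regular(link_param, levels = 5)
link_grid
```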

NB: I am not sure I successfully synchronized ordinal_link to method; in particular, the polr_engine_args tibble is a bit mysterious to me. A unit test with hyperparameter optimization needs to be written. Edit: See the example in a comment below.

Documentation

Package documentation was added to 'ordered-package.R' so that illustrative examples, including of {ordinalForest}, could be included there.

NB: I was unable to install the necessary dependencies to knit 'aaa.Rmd', so I manually wrote 'ordinal_reg_polr.md'.

@corybrunson
Owner Author

Here is a complete analysis using the housing data from {MASS}. Note that all three fork branches must be installed, not just {ordered}. The data are disaggregated for this illustration but are a good use case for frequency-informed sampling/partitioning (without having to disaggregate).

library(tidymodels)
library(ordered)

# disaggregated data & partition
house_data <-
  MASS::housing[rep(seq(nrow(MASS::housing)), MASS::housing$Freq), -5]
house_split <- initial_split(house_data, prop = .8)
house_train <- training(house_split)
house_test <- testing(house_split)

# tunable model & analysis specification
house_rec <- recipe(Sat ~ Infl + Type + Cont, data = house_train)
house_spec <- ordinal_reg() |>
  set_engine("polr") |>
  set_args(method = tune())
house_tune <- extract_parameter_set_dials(house_spec)
(house_grid <- grid_regular(house_tune, levels = Inf))
#> # A tibble: 5 × 1
#>   method  
#>   <chr>   
#> 1 logistic
#> 2 probit  
#> 3 loglog  
#> 4 cloglog 
#> 5 cauchit

# hyperparameter (link function) optimization
house_res <- tune_grid(
  house_spec,
  preprocessor = house_rec,
  resamples = vfold_cv(house_train),
  grid = house_grid,
  metrics = metric_set(accuracy, roc_auc)
)
(house_link <- select_best(house_res, metric = "accuracy"))
#> # A tibble: 1 × 2
#>   method   .config             
#>   <chr>    <chr>               
#> 1 logistic Preprocessor1_Model1

# final fit
house_prep <- prep(house_rec)
house_final <- finalize_model(house_spec, house_link)
(house_fit <- fit(house_final, formula(house_prep), data = house_train))
#> parsnip model object
#> 
#> Call:
#> MASS::polr(formula = Sat ~ Infl + Type + Cont, data = data, method = ~"logistic")
#> 
#> Coefficients:
#>    InflMedium      InflHigh TypeApartment    TypeAtrium   TypeTerrace 
#>     0.5103368     1.2315652    -0.4973120    -0.2740917    -0.9533085 
#>      ContHigh 
#>     0.3576051 
#> 
#> Intercepts:
#>  Low|Medium Medium|High 
#>  -0.4677984   0.7202062 
#> 
#> Residual Deviance: 2803.47 
#> AIC: 2819.47

# evaluation
house_pred_class <- predict(house_fit, new_data = house_test, type = "class")
bind_cols(house_test, house_pred_class) |>
  accuracy(truth = Sat, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy multiclass     0.528
house_pred_prob <- predict(house_fit, new_data = house_test, type = "prob")
bind_cols(house_test, house_pred_prob) |>
  roc_auc(truth = Sat, starts_with(".pred_"))
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc hand_till      0.652

Created on 2024-11-04 with reprex v2.1.1

@topepo
Collaborator

topepo commented Nov 6, 2024

I'll try to review this later today. My first thought is that the bare skeleton of ordinal_reg() should live in parsnip so that they can use our "enhanced" engine documentation.

@corybrunson
Owner Author

@topepo could this be resumed for a minimal CRAN submission in the next several months? I will join a project in June and hope to make use of this package. : )

@corybrunson corybrunson changed the base branch from main to ordinal_reg April 21, 2025 18:56
@mattwarkentin mattwarkentin merged commit 8fc08fc into corybrunson:ordinal_reg Apr 21, 2025
0 of 12 checks passed
@corybrunson corybrunson deleted the ordinal_reg branch April 21, 2025 19:19
3 participants