Adding Multinomial and Nested Logit Models for Consumer Choice #1654
base: main
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main    #1654      +/-   ##
==========================================
+ Coverage   93.39%   93.40%   +0.01%
==========================================
  Files          56       58       +2
  Lines        6329     6873     +544
==========================================
+ Hits         5911     6420     +509
- Misses        418      453      +35
Can this be done with Bambi?
For some aspects of the multinomial logit, I believe so, but not for the separate utility equations with fixed covariates, and the nested logit cannot be done with Bambi.
Maybe to make that a little clearer, @williambdean: the multinomial logit in this discrete choice implementation is related to the more standard multinomial regression you will find in Bambi, but it differs in an important way that is rooted in the utility theory behind the modelling enterprise. The model is conceptualised as involving "drivers" of the utility of each product on a market, so within the model we have N linear models, each representing the utility of one good, where each model takes attributes of that product alternative as features (rather than shared attributes). By allowing distinct "alternative-specific" and "individual-specific" covariates, we attempt to model a choice scenario, and the covariates have a specific interpretation under that scenario. Standard multinomial regression models don't make that distinction and so can't be interpreted in the same way. Do you think it would help the PR if I put more of this background in the notebooks?
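A minimal NumPy sketch of the distinction drawn above, not the PR's implementation: price is an alternative-specific attribute that enters every alternative's utility with one shared coefficient, while income is fixed for each individual and so needs its own coefficient per alternative, with a reference alternative pinned at zero because only utility differences are identified. The helper name `conditional_logit_probs` and all numbers are illustrative assumptions.

```python
import numpy as np


def softmax_rows(v: np.ndarray) -> np.ndarray:
    """Row-wise softmax, stabilised by subtracting each row's maximum."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def conditional_logit_probs(price, income, alpha, beta_price, gamma_income):
    """Choice probabilities P[i, j] for individual i and alternative j (hypothetical helper).

    price        : (n_individuals, n_alternatives) alternative-specific attribute
    income       : (n_individuals,) individual-specific (fixed) attribute
    alpha        : (n_alternatives,) alternative-specific intercepts
    beta_price   : scalar, a single generic price coefficient
    gamma_income : (n_alternatives,) one income coefficient per alternative
    """
    # One linear "utility equation" per alternative, evaluated for every individual.
    utility = alpha[None, :] + beta_price * price + income[:, None] * gamma_income[None, :]
    return softmax_rows(utility)


price = np.array([[1.0, 2.0, 3.0],
                  [1.5, 1.0, 2.5]])
income = np.array([30.0, 60.0])
alpha = np.array([0.0, 0.5, 1.0])           # first alternative is the reference
gamma_income = np.array([0.0, 0.01, 0.02])  # reference alternative pinned at 0
probs = conditional_logit_probs(price, income, alpha, -1.0, gamma_income)
print(probs.round(3))
```

Adding the same constant to every entry of `gamma_income` adds the same amount to all of an individual's utilities, which leaves the softmax unchanged; that is why the income coefficients are only identified relative to a reference alternative.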
Description
I'm adding two new model classes for discrete choice style models that I intend to be part of the consumer choice module.
As it stands, I'm opening the PR as a draft to discuss the implementation choices and API design for these models.
Related to issue #1653. Discrete choice models have a lot of potential for Bayesian modelling in particular: the state-of-the-art models in this domain involve a mixed logit parameterisation, for which "vanilla" implementations are fairly straightforward using Bayesian hierarchical parameterisations.
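As a rough illustration of why the mixed logit fits naturally into a hierarchical model, the NumPy sketch below (an assumption-laden toy, not part of the PR) draws a price coefficient for each simulated consumer from a population-level normal distribution and averages the resulting logit probabilities. In a Bayesian version, `mu_beta` and `sigma_beta` would receive priors and the individual coefficients would be latent parameters; here they are fixed numbers, and the function name `mixed_logit_shares` is hypothetical.

```python
import numpy as np


def mixed_logit_shares(price, alpha, mu_beta, sigma_beta, n_draws=20_000, seed=0):
    """Simulated market shares under a mixed logit with a random price coefficient.

    price, alpha        : (n_alternatives,) prices and alternative-specific intercepts
    mu_beta, sigma_beta : population mean and sd of the individual price coefficients
    """
    rng = np.random.default_rng(seed)
    # Population-level distribution of individual price sensitivities.
    beta = rng.normal(mu_beta, sigma_beta, size=n_draws)       # (n_draws,)
    utility = alpha[None, :] + beta[:, None] * price[None, :]   # (n_draws, n_alt)
    utility -= utility.max(axis=1, keepdims=True)               # numerical stability
    probs = np.exp(utility)
    probs /= probs.sum(axis=1, keepdims=True)
    # Average the individual-level logit probabilities over the simulated population.
    return probs.mean(axis=0)


price = np.array([1.0, 2.0, 3.0])
alpha = np.array([0.0, 0.5, 1.0])
print(mixed_logit_shares(price, alpha, mu_beta=-1.0, sigma_beta=0.5).round(3))
```

With `sigma_beta=0` every simulated consumer shares the same coefficient, and the shares collapse to an ordinary multinomial logit.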
Two New Models
Main things to flag: there are now two new model files in the consumer choice folder, the simple Multinomial Logit and the Nested Logit. As outlined in the issue, I've restricted the nested logit to no more than two layers of nesting. I believe this will take us beyond parity with packages like mlogit in R and pylogit in Python, which allow only one-level-deep nesting structures.
API Discussion
The API I'm suggesting for these models differs from the typical X, y inputs used by models elsewhere in pymc-marketing. This is mostly because I feel Wilkinson-style formula notation is important here. For instance, this is how you currently specify the Nested Logit model:
We assume a wide-data input as well:
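A minimal, hypothetical sketch of the kind of formula parsing this API implies, assuming one formula per alternative written as `alternative ~ alternative-specific covariates | individual-specific covariates` over wide data (one row per choice occasion, with per-alternative columns such as `price_car`), and a nesting structure given as a dict whose leaves are lists of alternatives. The grammar, the names `parse_utility_formula`, `UtilityFormula` and `nesting_depth`, and the column names are illustrative assumptions, not the PR's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class UtilityFormula:
    alternative: str
    alt_specific: list[str] = field(default_factory=list)  # vary by alternative, e.g. price_car
    fixed: list[str] = field(default_factory=list)         # fixed per individual, e.g. income


def _terms(expr: str) -> list[str]:
    """Split 'a + b + c' into ['a', 'b', 'c'], ignoring empty pieces."""
    return [t.strip() for t in expr.split("+") if t.strip()]


def parse_utility_formula(formula: str) -> UtilityFormula:
    """Parse 'alt ~ x1 + x2 | z1' (hypothetical grammar) into its parts."""
    lhs, sep, rhs = formula.partition("~")
    if not sep or not lhs.strip():
        raise ValueError(f"expected 'alternative ~ ...', got {formula!r}")
    alt_part, _, fixed_part = rhs.partition("|")
    return UtilityFormula(lhs.strip(), _terms(alt_part), _terms(fixed_part))


def nesting_depth(structure) -> int:
    """Number of nest layers in a dict whose leaves are lists of alternatives."""
    if isinstance(structure, dict):
        if not structure:
            raise ValueError("empty nest")
        return 1 + max(nesting_depth(child) for child in structure.values())
    return 0


formulas = [
    "car ~ price_car + time_car | income",
    "bus ~ price_bus + time_bus | income",
    "train ~ price_train + time_train | income",
]
parsed = [parse_utility_formula(f) for f in formulas]
# Two layers: "public" is itself split into sub-nests.
nests = {"private": ["car"], "public": {"road": ["bus"], "rail": ["train"]}}
assert nesting_depth(nests) <= 2, "only up to two layers of nesting are supported"
print(parsed[0], nesting_depth(nests))
```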
Causal Inference and Counterfactuals
The value these models bring is their focus on causal inference. The history of discrete choice modelling stems largely from the observation that multinomial logit models cannot support plausible counterfactuals about market interventions, because of the independence of irrelevant alternatives (IIA) property, and that more sophisticated discrete choice models such as the nested logit can solve this. See, for instance, how a pricing intervention on a multinomial logit results in a proportional re-allocation of market share to the rest of the market.
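The proportional re-allocation follows directly from the multinomial logit formula: when only product A's utility changes, every other product's exponentiated utility is unchanged while the common denominator changes, so all their shares scale by the same factor. A small NumPy sketch with made-up intercepts and prices (illustrative values only):

```python
import numpy as np


def mnl_shares(utility: np.ndarray) -> np.ndarray:
    """Multinomial logit (softmax) market shares."""
    e = np.exp(utility - utility.max())
    return e / e.sum()


alpha = np.array([1.0, 0.5, 0.0])   # intercepts for products A, B, C
price = np.array([2.0, 1.5, 1.0])
beta_price = -1.0

before = mnl_shares(alpha + beta_price * price)
new_price = price.copy()
new_price[0] += 1.0                  # pricing intervention on product A only
after = mnl_shares(alpha + beta_price * new_price)

# B and C gain share by the same multiplicative factor (IIA), so the
# share A loses is split between them in proportion to their original shares.
growth = after[1:] / before[1:]
print(before.round(3), after.round(3), growth.round(3))
```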
We demonstrate this problem and its solution by adding two new notebooks to the gallery.
In the second notebook, on the nested logit, we show how the IIA problem is solved by this extra nesting structure:
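For comparison, a minimal standard-library sketch of the textbook one-layer nested logit (the PR supports up to two layers; this covers only one), written as P(j) = P(j | nest k) · P(nest k) with nest dissimilarity parameters λ_k in (0, 1]. It is not the PR's implementation, and the alternatives, nests and λ values are illustrative assumptions. With λ_public = 1 the model reduces to the multinomial logit; with λ_public < 1, raising the bus price pushes proportionally more share to the train, its nest-mate, than to the car.

```python
import math


def nested_logit_shares(utility, nests, lam):
    """One-layer nested logit choice probabilities.

    utility : dict alternative -> deterministic utility V_j
    nests   : dict nest -> list of alternatives in that nest
    lam     : dict nest -> dissimilarity parameter lambda_k in (0, 1]
    """
    # Within-nest sums of exp(V_j / lambda_k); their logs are the inclusive values.
    within = {k: sum(math.exp(utility[a] / lam[k]) for a in alts) for k, alts in nests.items()}
    nest_weight = {k: math.exp(lam[k] * math.log(within[k])) for k in nests}
    total = sum(nest_weight.values())
    shares = {}
    for k, alts in nests.items():
        p_nest = nest_weight[k] / total                           # P(nest k)
        for a in alts:
            p_within = math.exp(utility[a] / lam[k]) / within[k]  # P(j | nest k)
            shares[a] = p_nest * p_within
    return shares


nests = {"private": ["car"], "public": ["bus", "train"]}
base = {"car": 0.0, "bus": 0.0, "train": 0.0}
bus_price_rise = {**base, "bus": -1.0}   # lower bus utility after a price increase


def growth(lam):
    """Relative change in car and train shares after the bus price rise."""
    before = nested_logit_shares(base, nests, lam)
    after = nested_logit_shares(bus_price_rise, nests, lam)
    return after["car"] / before["car"], after["train"] / before["train"]


print(growth({"private": 1.0, "public": 0.5}))   # train grows by more than car
print(growth({"private": 1.0, "public": 1.0}))   # same factor: plain multinomial logit
```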
Fixed Attributes and Alternative Specific Attributes
One thing I've done is to ensure that the models can identify parameters for both the alternative-specific attributes (e.g. price) and the individually fixed attributes (e.g. income). I've done my best to benchmark the parameter identification and recovery against R's mlogit package:
How to Proceed?
I have not done an extensive write-up of the math behind these types of models, and some of the functions need more documentation and tests. But I wanted to share what I have so far to generate discussion and decide how to proceed. One immediate improvement I can think of would be to remove the duplication between the nested logit and multinomial logit model classes, making them subclasses of a more general "DISCRETE CHOICE" class where we could reuse, e.g., the formula-parsing functions. Additionally, I'd like to benchmark the parameter identification with a second dataset and example.
Longer term, I think there is room to add a vanilla mixed logit example too.
Anyway, I'm open to feedback. I'm opening this as a draft PR now to check which linting and testing failures I have.
Related Issue
Checklist
📚 Documentation preview 📚: https://pymc-marketing--1654.org.readthedocs.build/en/1654/