Adding Multinomial and Nested Logit Models for Consumer Choice #1654

Open: NathanielF wants to merge 39 commits into main

NathanielF commented Apr 27, 2025

Description

I'm adding two new model classes for discrete choice style models that I intend to be part of the customer choice module.
As it stands, I'm opening the PR as a draft for discussion of the implementation choices and API design I have for these models.

Related to this issue: #1653. There is a lot of potential in discrete choice style models for Bayesian modelling in particular. The state-of-the-art models in this domain involve a mixed logit parameterisation, for which "vanilla" implementations are pretty straightforward using Bayesian hierarchical parameterisations.

Two New Models

Main things to flag: there are now two new model files in the customer choice folder, the simple Multinomial Logit and the Nested Logit. As outlined in the issue, I've restricted the nested logit to no more than two layers of nesting. I believe this brings us beyond parity with packages like mlogit in R and pylogit in Python, which allow only one level of nesting.

API Discussion

The API I'm suggesting for these models differs from the typical X, y inputs used by models in pymc-marketing in general. Mostly this is because I feel the use of Wilkinson-style notation is important here. For instance, this is how you currently specify the Nested Logit model:

image

We assume a wide-data input as well:

image
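
As a rough illustration of the two ideas above, here is a minimal sketch of a wide-format choice dataset and one Wilkinson-style utility equation per alternative. The column names, the `|` separator between alternative-specific and individual-specific terms, and the `split_equation` helper are all assumptions made for this example; they are not taken from the PR's code.

```python
# One row per decision maker. Each alternative-specific attribute gets its own
# column per alternative (hypothetical names and values, illustration only).
wide_rows = [
    {"choice": "bus", "price_bus": 2.0, "price_car": 6.5, "price_train": 4.0, "income": 30},
    {"choice": "car", "price_bus": 2.5, "price_car": 5.0, "price_train": 4.5, "income": 55},
]

# One utility equation per alternative. The "|" separating alternative-specific
# covariates from individual-specific ones is an assumed convention.
utility_equations = [
    "bus ~ price_bus | income",
    "car ~ price_car | income",
    "train ~ price_train | income",
]


def split_equation(equation):
    """Split 'alt ~ a + b | c + d' into (alt, [a, b], [c, d]). Hypothetical helper."""
    target, rhs = (part.strip() for part in equation.split("~", 1))
    alt_part, _, fixed_part = rhs.partition("|")
    alt_covs = [term.strip() for term in alt_part.split("+") if term.strip()]
    fixed_covs = [term.strip() for term in fixed_part.split("+") if term.strip()]
    return target, alt_covs, fixed_covs


parsed = [split_equation(eq) for eq in utility_equations]
```

Every covariate named in an equation must exist as a column in the wide data, which is the main constraint this layout imposes on the caller.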

Causal Inference and Counterfactuals

The value that these models bring is their focus on causal inference. The entire history of discrete choice models stems effectively from the observation that multinomial logit models cannot support plausible counterfactuals around market interventions, due to the independence of irrelevant alternatives (IIA) property, and that more sophisticated discrete choice models like the nested logit are able to address this. See for instance here how a pricing intervention on a multinomial logit results in proportional re-allocation of market share to the rest of the market.

image
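
The proportional re-allocation can be reproduced with a few lines of plain Python. This is a minimal sketch with made-up utilities, not output from the PR's models: it applies a softmax to three hypothetical product utilities, lowers one of them to mimic a price rise, and checks that the ratio of shares between the two untouched products does not move.

```python
import math


def mnl_shares(utilities):
    """Multinomial logit choice probabilities: a softmax over alternatives."""
    peak = max(utilities)
    exp_u = [math.exp(u - peak) for u in utilities]  # shift for numerical stability
    total = sum(exp_u)
    return [v / total for v in exp_u]


# Hypothetical utilities for three products (illustrative values only).
base_utilities = [1.0, 0.5, 0.2]
before = mnl_shares(base_utilities)

# A price rise on product 0 lowers its utility; the other two are unchanged.
after = mnl_shares([base_utilities[0] - 0.8, base_utilities[1], base_utilities[2]])

# IIA: the share lost by product 0 is split in proportion to the existing
# shares, so the ratio between products 1 and 2 is the same before and after.
ratio_before = before[1] / before[2]
ratio_after = after[1] / after[2]
```

Because the ratio depends only on the difference between the two utilities involved, no intervention on product 0 can ever change it under this model.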

We demonstrate this problem and solution by adding 2 new notebooks to the gallery.

image

In the second notebook, for the nested logit, we show how the IIA problem is addressed by this extra nesting structure:

image
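
For intuition, the standard two-level nested logit can be sketched in plain Python. The utilities, nest names, and nest scale parameters below are hypothetical, and this is the textbook formula rather than the PR's implementation: each nest gets an inclusive value, the nest is chosen by a softmax over scaled inclusive values, and the product is chosen by a softmax within the nest.

```python
import math


def nested_logit_shares(utilities, nests, lambdas):
    """Two-level nested logit: P(j) = P(nest) * P(j | nest)."""
    # Inclusive value of each nest: log-sum-exp of the scaled member utilities.
    inclusive = {
        nest: math.log(sum(math.exp(utilities[j] / lambdas[nest]) for j in members))
        for nest, members in nests.items()
    }
    denom = sum(math.exp(lambdas[n] * inclusive[n]) for n in nests)
    shares = {}
    for nest, members in nests.items():
        p_nest = math.exp(lambdas[nest] * inclusive[nest]) / denom
        for j in members:
            p_within = math.exp(utilities[j] / lambdas[nest] - inclusive[nest])
            shares[j] = p_nest * p_within
    return shares


# Hypothetical market: bus and train share a nest, car sits alone.
nests = {"public": ["bus", "train"], "private": ["car"]}
lambdas = {"public": 0.5, "private": 1.0}  # assumed nest scale parameters
utilities = {"bus": 1.0, "train": 0.8, "car": 1.2}

before = nested_logit_shares(utilities, nests, lambdas)
# A price rise on the bus lowers its utility only.
after = nested_logit_shares({**utilities, "bus": 0.2}, nests, lambdas)

# Train, the close substitute, gains relative to car: substitution is no
# longer proportional across the whole market.
ratio_before = before["train"] / before["car"]
ratio_after = after["train"] / after["car"]
```

Setting every scale parameter to 1 collapses this to the multinomial logit, and the train-to-car ratio stops moving again.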

Fixed Attributes and Alternative Specific Attributes

One thing I've done is to ensure that the models can identify parameters for both the alternative-specific attributes (e.g. price) and the individually fixed attributes (e.g. income). I've done my best to benchmark the parameter identification and recovery against R's mlogit package:

How to Proceed?

I have not done an extensive write-up of the math behind these types of models, and some of the functions need more documentation and tests. But I wanted to share what I have so far to generate discussion and maybe decide on how to proceed. One immediate improvement I can think of would be to remove duplication between the nested logit and multinomial logit model classes, making them instances of a more general "DISCRETE CHOICE" class where we could re-use e.g. the formula parsing functions. Additionally, I'd like to benchmark the parameter identification with a second data set and example.

Longer term, I think there is room for adding a vanilla mixed-logit example too.

Anyway, open to feedback. Adding a draft PR now to check which linting and testing failures I have.

Related Issue

Checklist


📚 Documentation preview 📚: https://pymc-marketing--1654.org.readthedocs.build/en/1654/

Signed-off-by: Nathaniel <[email protected]>
@github-actions bot added the docs, tests, and customer choice labels Apr 27, 2025

codecov bot commented Apr 27, 2025

Codecov Report

Attention: Patch coverage is 93.56618% with 35 lines in your changes missing coverage. Please review.

Project coverage is 93.40%. Comparing base (d41e74e) to head (f96a856).
Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pymc_marketing/customer_choice/nested_logit.py | 92.72% | 24 Missing ⚠️ |
| pymc_marketing/customer_choice/mnl_logit.py | 94.85% | 11 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1654      +/-   ##
==========================================
+ Coverage   93.39%   93.40%   +0.01%     
==========================================
  Files          56       58       +2     
  Lines        6329     6873     +544     
==========================================
+ Hits         5911     6420     +509     
- Misses        418      453      +35     

@NathanielF NathanielF self-assigned this Apr 28, 2025
@NathanielF NathanielF marked this pull request as ready for review April 28, 2025 22:04
@williambdean
Contributor

Can this be done with bambi?

@NathanielF
Author

I believe some aspects of the multinomial logit can be, but the separate utility equations with fixed covariates cannot, and neither can the nested logit.

@NathanielF
Author

NathanielF commented Apr 29, 2025

Maybe to make that a little clearer, @williambdean: the multinomial logit in this discrete choice implementation is related to the more standard multinomial regression you will find in Bambi, but it differs in an important way that is rooted in the utility theory behind the modelling enterprise. The model is conceptualised as involving "drivers" of the utility for each of the products on a market, so within the model we have N linear models, one per product, each representing the utility of that good, where each model takes attributes of that product alternative as features (rather than shared attributes). By allowing these distinct "alternative specific" and "individual specific" covariates, we attempt to model a choice scenario, and the covariates have a specific interpretation under that scenario.

Standard multinomial regression models don't make that distinction and so can't be interpreted in the same way.
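
As a sketch of that distinction, in notation assumed here for illustration rather than taken from the PR code, the utility of alternative j for individual i can be written with alternative-specific covariates x (e.g. price) and individual-specific covariates z (e.g. income):

$$
V_{ij} = \alpha_j + \beta^\top x_{ij} + \gamma_j^\top z_i,
\qquad
U_{ij} = V_{ij} + \varepsilon_{ij},
\qquad
P(y_i = j) = \frac{\exp(V_{ij})}{\sum_{k=1}^{N} \exp(V_{ik})}
$$

The logit form of the probability follows when the errors are independent type I extreme value draws. The individual-specific coefficients carry an alternative index because any term that is identical across alternatives cancels out of the ratio, so in the usual setup one alternative serves as the reference with its intercept and individual-specific coefficients fixed to zero.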

Do you think it would help the PR if I put more of this background in the notebooks?
