Allow user to track specific varnames #846

Closed
wants to merge 5 commits into from

penelopeysm
Member

@penelopeysm penelopeysm commented Mar 18, 2025

Summary

This implements the suggestion in #845 by:

  • adding a field on DynamicPPL.Model, called tracked_varnames;
  • adding an exported method, DynamicPPL.set_tracked_varnames, which allows the user to set this field and thus specify which varnames are to be tracked;
  • modifying values_as_in_model to use tracked_varnames appropriately.

It also separates the tests for values_as_in_model into their own file (previously they were scattered across test/model.jl and test/compiler.jl).
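
As a rough illustration of the three changes listed above, here is a minimal toy sketch in plain Julia. The names ToyModel, toy_set_tracked, and toy_values are hypothetical stand-ins and are not DynamicPPL's actual types or functions; the sketch only assumes, as in the example in this description, that variables defined with := are always kept.

```julia
# Toy sketch only: `ToyModel`, `toy_set_tracked`, and `toy_values` are
# hypothetical stand-ins, not DynamicPPL's actual types or functions.
struct ToyModel
    # `nothing` means "track everything"; otherwise a list of names to keep.
    tracked::Union{Nothing,Vector{Symbol}}
end

# Functional setter: returns a new model rather than mutating the old one.
toy_set_tracked(::ToyModel, names) = ToyModel(names)

# Keep a sampled value if tracking is unrestricted or its name is listed.
# Values defined with `:=` are assumed to always be kept.
function toy_values(
    model::ToyModel,
    sampled::Dict{Symbol,Float64},
    deterministic::Dict{Symbol,Float64},
)
    keep(name) = model.tracked === nothing || name in model.tracked
    out = Dict{Symbol,Float64}(k => v for (k, v) in sampled if keep(k))
    return merge(out, deterministic)
end

sampled = Dict(:x => 0.3, :y => 1.1)
deterministic = Dict(:z => 1.4)

m = ToyModel(nothing)
all_vals = toy_values(m, sampled, deterministic)   # keeps x, y, and z

m_y = toy_set_tracked(m, [:y])
only_y = toy_values(m_y, sampled, deterministic)   # keeps y and z only
```

Because the setter returns a new value, the original toy model is left untouched, which mirrors the `model = set_tracked_varnames(model, ...)` usage in this description.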

I tested this PR together with the version of Turing.jl in TuringLang/Turing.jl#2487 and can confirm that this works as intended:

using Turing
@model function mymodel()
    x ~ Normal()
    y ~ Normal(x, 1)
    z := x + y
end

model = mymodel()
chn1 = sample(model, NUTS(), 100)
# ^ this chain will have x, y, and z

model = set_tracked_varnames(model, [@varname(y)])

chn2 = sample(model, NUTS(), 100)
# ^ this chain will only have y and z

model = set_tracked_varnames(model, nothing)
chn3 = sample(model, NUTS(), 100)
# ^ this chain will have x, y, and z again 

Possible alternatives: dispatch

In #845 I suggested using a different dispatch-based method to control which varnames were tracked:

@model mymodel() = ...

DynamicPPL.tracked_varnames(::Model{typeof(mymodel)}) = [@varname(x), @varname(y), ...]

The thing I don't like about this is that it makes it impossible (or at best awkward) to run the same model twice while collecting different variables, since the tracked set would depend on which method happens to be defined at the time. Having it as a field on the model is much cleaner.
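
To make the difference concrete, here is a toy comparison in plain Julia. DispatchModel, FieldModel, toy_tracked, and demo are hypothetical names used purely for illustration; none of them is part of DynamicPPL.

```julia
# Hypothetical toy types for illustration; not DynamicPPL's API.
struct DispatchModel{F}
    f::F
end

demo() = nothing   # stands in for a model function

# Default: track everything.
toy_tracked(::DispatchModel) = nothing
# Dispatch-based override: applies to every model built from `demo`.
toy_tracked(::DispatchModel{typeof(demo)}) = [:x, :y]

a = DispatchModel(demo)
b = DispatchModel(demo)
# Both instances necessarily share the same setting.
same_setting = toy_tracked(a) == toy_tracked(b)

# Field-based: each instance carries its own choice.
struct FieldModel{F}
    f::F
    tracked::Union{Nothing,Vector{Symbol}}
end

c = FieldModel(demo, [:y])     # track only y
d = FieldModel(demo, nothing)  # track everything
```

With the dispatch version, changing what is collected between two runs means redefining a method; with the field version it is just a matter of constructing a second value.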

Possible alternatives: kwarg to sample

The cleanest user interface would probably be to add a keyword argument to sample. This is, of course, very complicated, since sample lives in AbstractMCMC and it's not obvious that varnames are general enough to merit appearing there. In my opinion this will have to wait for AbstractMCMC to be refactored in a way that it can take a generic type for 'parameter names', and then DynamicPPL can set that type to be VarName. See also TuringLang/Turing.jl#2511.

However, the fact that values_as_in_model now also takes the tracked varnames as an argument would allow us to fairly easily implement this on the DynamicPPL end when the time comes, because the keyword argument to sample can just be passed through the chain of calls.
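
A sketch of what "passed through the chain of calls" could look like, using plain Julia. The toy_* functions and the tracked keyword are assumptions made for illustration only; they are not the AbstractMCMC or DynamicPPL API, and the real call chain may well have more layers.

```julia
# Hypothetical call chain; the `toy_*` names and the `tracked` keyword are
# illustrative assumptions, not the AbstractMCMC or DynamicPPL API.

# Innermost step: filter one draw's values by the requested names.
function toy_values_as_in_model(values::Dict{Symbol,Float64}, tracked)
    tracked === nothing && return values
    return filter(p -> first(p) in tracked, values)
end

# Middle step: forwards the argument unchanged.
toy_getparams(values, tracked) = toy_values_as_in_model(values, tracked)

# Outermost step: the user-facing keyword argument.
function toy_sample(draws::Vector{Dict{Symbol,Float64}}; tracked=nothing)
    return [toy_getparams(d, tracked) for d in draws]
end

draws = [Dict(:x => 0.1, :y => 0.2), Dict(:x => 0.3, :y => 0.4)]
full = toy_sample(draws)                  # every draw keeps x and y
just_y = toy_sample(draws; tracked=[:y])  # every draw keeps only y
```

Only the innermost function needs to know how to interpret the argument; the layers above it just forward it.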

TODOs

  • Make code changes
  • Add tests
  • Get an opinion on whether this approach makes sense
  • Update HISTORY.md
  • Update docs

Closes #845

@penelopeysm penelopeysm changed the base branch from main to breaking March 18, 2025 01:01
@penelopeysm penelopeysm force-pushed the py/track-specific-varnames branch from a677b07 to bb8adb7 Compare March 18, 2025 01:05

codecov bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 76.19048% with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.56%. Comparing base (061acbe) to head (67b0e3f).
Report is 17 commits behind head on breaking.

Files with missing lines | Patch % | Lines
src/values_as_in_model.jl | 66.66% | 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           breaking     #846      +/-   ##
============================================
- Coverage     84.60%   84.56%   -0.05%     
============================================
  Files            34       34              
  Lines          3832     3841       +9     
============================================
+ Hits           3242     3248       +6     
- Misses          590      593       +3     


@penelopeysm penelopeysm force-pushed the py/track-specific-varnames branch 2 times, most recently from 4206707 to 62e4ed5 Compare March 18, 2025 10:46
@penelopeysm penelopeysm requested review from mhauru and torfjelde March 18, 2025 10:50
@penelopeysm
Member Author

penelopeysm commented Mar 18, 2025

@torfjelde @mhauru I primarily wanted to get your thoughts on whether this makes sense, but if you think it does, then the PR is also ready for proper code review :)

(Separate question: should we also allow users to specify symbols (which we can convert to VarNames easily)?)

@penelopeysm penelopeysm marked this pull request as ready for review March 18, 2025 10:53
@penelopeysm penelopeysm force-pushed the py/track-specific-varnames branch from f8114d6 to 06d6e4e Compare March 18, 2025 10:54
@penelopeysm penelopeysm self-assigned this Mar 18, 2025
@mhauru
Member

mhauru commented Mar 19, 2025

It does feel like which variables to collect into a chain should rightfully be the concern of sample and not the Model itself, but I do see that it is much easier to implement it this way. I'm a bit torn on what to do, but I'm not terribly offended by having it in the Model for convenience's sake.

I'll hold back from reviewing the implementation until others have commented on the design and interface.

@torfjelde
Member

I'm personally a bit worried about opening the can of worms that is "more fields in Model" 😬

Is there a reason why this isn't just part of the values_as_in_model method itself? And in that case, you could ofc pass it down from sample.

@penelopeysm
Member Author

Is there a reason why this isn't just part of the values_as_in_model method itself

I did already add it as part of the method signature. However, if it's restricted to just that, then there's no clear way for the user to control it, since the user doesn't have an easy hook into values_as_in_model. They would basically have to duplicate our definition of getparams(model, vi) and change the call to values_as_in_model.

The main alternative I see would be to change the signature of getparams to also take desired_params (which would be passed down from sample).

@penelopeysm
Member Author

The main alternative I see would be to change the signature of getparams to also take desired_params (which would be passed down from sample).

But even this doesn't give the user an easy way to hook into it, unless we also go the full distance and implement the keyword argument to sample, which is going to take a long time.

Basically the question is how can we give the user a way to specify the varnames to keep, that doesn't involve adding a kwarg to sample. Maybe the answer is that we shouldn't, and we should only do it the right way by adding the kwarg. That's for your consideration, I suppose. 😄

@yebai
Member

yebai commented Mar 21, 2025

Can we encode these in the model's embedded context instead? We already use that to implement condition/fix.

@penelopeysm
Member Author

penelopeysm commented Mar 22, 2025

encode these in the context

We could, but what benefit do you see in doing that compared to having a separate field? It might seem tidier when looking at the definition of the Model struct, but under the hood there's the same amount of complexity. Also imo it doesn't really belong to the context (for example the tilde pipeline doesn't need that information when evaluating the model).
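
For comparison, a context-based encoding might look roughly like the toy sketch below. All of the Toy* types and toy_tracked are hypothetical and only loosely modelled on the idea of wrapper contexts; they are not DynamicPPL's actual context types.

```julia
# Hypothetical toy contexts; the names are illustrative, not DynamicPPL's.
abstract type ToyContext end
struct ToyDefaultContext <: ToyContext end

# A wrapper context carrying the tracked names plus the context it wraps.
struct ToyTrackedContext{C<:ToyContext} <: ToyContext
    tracked::Vector{Symbol}
    child::C
end

# Another wrapper, standing in for something like conditioning.
struct ToyConditionContext{C<:ToyContext} <: ToyContext
    values::Dict{Symbol,Float64}
    child::C
end

# Retrieving the setting means walking the stack of wrappers.
toy_tracked(::ToyDefaultContext) = nothing
toy_tracked(ctx::ToyTrackedContext) = ctx.tracked
toy_tracked(ctx::ToyConditionContext) = toy_tracked(ctx.child)

inner = ToyTrackedContext([:y], ToyDefaultContext())
ctx = ToyConditionContext(Dict(:x => 1.0), inner)
found = toy_tracked(ctx)   # looks through the conditioning wrapper
```

In this toy version the model struct gains no new field, but the same information still has to be stored and looked up, just in a different place.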

@torfjelde
Member

Basically the question is how can we give the user a way to specify the varnames to keep, that doesn't involve adding a kwarg to sample. Maybe the answer is that we shouldn't, and we should only do it the right way by adding the kwarg. That's for your consideration, I suppose. 😄

Haha, yeah basically 😬
I think I find it non-intuitive to add this to the Model, given that it's only something that is relevant when you call sample 🤔

@penelopeysm
Member Author

ok, let's do it properly then, it'll probably take a couple of months 😅

Development

Successfully merging this pull request may close these issues.

Restrict values_as_in_model to specific varnames?
4 participants