Compatibility with torch.distributed.rpc? #568

Open
dttsou opened this issue Mar 13, 2025 · 3 comments

@dttsou

dttsou commented Mar 13, 2025

Hi,

I have a multiprocess program with an agent and multiple observers. The processes are spawned using mp.spawn(run_worker, args, nproces.....).

In run_worker, I use torch.distributed.rpc.init_rpc() to create an RPC node for the agent and for each observer process. RPC manages a pool of threads within each process to handle the RPC communication.

The agent uses RPC to start jobs on the observers, and the observers send their results back to the agent, also over RPC. Some of these calls are async RPC, others are sync RPC.

That's my setup.

My question is: will VizTracer be able to support this setup?

Regards,
Dean

@gaogaotiantian
Owner

I guess the easiest way is to just try it out. I don't think RPC matters in this case; it's all about whether you can spawn your processes with viztracer. If multiprocessing.spawn is the only thing used, I don't see any reason why viztracer wouldn't work.

@dttsou
Author

dttsou commented Mar 14, 2025

Thanks for the response!

I tried it today with my toy program, which uses multiprocessing and RPC.

It generates a result.json, but when I try to open it with vizviewer, Perfetto seems to get stuck and never loads the trace. My result.json file is 138MB, if that is of any help.
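
While the viewer is stuck, it can still help to check from Python that a large result.json parses and that every agent and observer process actually contributed events. The sketch below is a hypothetical helper, not part of VizTracer. It assumes the file follows the Chrome Trace Event Format, either as a JSON object with a "traceEvents" list or as a bare array of events, and that process names, if recorded, appear as "process_name" metadata events (ph == "M").

```python
import json
from collections import Counter


def summarize_events(data):
    """Summarize already-loaded Chrome Trace Event Format data (hypothetical helper)."""
    # Accept both the object form {"traceEvents": [...]} and a bare event list.
    events = data.get("traceEvents", []) if isinstance(data, dict) else data

    # Metadata events named "process_name" map each pid to a readable label.
    names = {
        ev.get("pid"): ev.get("args", {}).get("name")
        for ev in events
        if ev.get("ph") == "M" and ev.get("name") == "process_name"
    }
    # Count non-metadata events per pid to see which process dominates the file.
    per_pid = Counter(ev.get("pid") for ev in events if ev.get("ph") != "M")
    return {
        "total_events": len(events),
        "events_per_pid": dict(per_pid),
        "process_names": names,
    }


def summarize_trace(path):
    """Load a trace file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_events(json.load(f))
```

Usage: `print(summarize_trace("result.json"))`. If a pid you expect (for example, one of the observers) is missing from `events_per_pid`, the problem is in tracing the spawned processes rather than in loading the file.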

@gaogaotiantian
Owner

138MB is not huge. The HTTP server of vizviewer is not the most stable software :) Try refreshing the webpage a few times; that normally works for me. There are a few alternatives. vizviewer --use_external_processor uses a native processor, which speeds up loading (but you'll lose the source code in the webpage). Or, when your trace does not load, you can use the options in the top left corner to load the trace manually (load trace from file); that should get you everything. A web browser cannot load a file from the file system implicitly because of security concerns.
