mjWarp

MuJoCo implemented in Warp.

Installing for development

git clone https://github.com/erikfrey/mjx_warp.git
cd mjx_warp
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -e .

During early development mjWarp tracks the bleeding edge, so you should install the Warp nightly build:

pip install warp-lang --pre --upgrade -f https://pypi.nvidia.com/warp-lang/

Now make sure everything is working:

pytest

This should print something like XX passed in XX.XXs at the end!

Benchmarking

Benchmark as follows:

mjx-testspeed --function=forward --is_sparse=True --mjcf=humanoid/humanoid.xml --batch_size=8192

Some relevant benchmarks on an NVIDIA GeForce RTX 4090:

forward steps / sec (smooth dynamics only)

27 dofs per humanoid, 8k batch size.

Num Humanoids	MJX	mjWarp dense	mjWarp sparse
1	7.9M	15.6M	13.7M
2	2.6M	7.4M	7.8M
3	2.2M	4.6M	5.3M
4	1.5M	3.3M	4.1M
5	1.1M	❌	3.2M
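To read the table as relative speedups rather than raw throughput, the numbers can be divided by the MJX column. The snippet below is only a convenience sketch: the dictionary keys and the `speedup` helper are made up for illustration, and the values are copied from the table above (in millions of steps/sec).

```python
# Throughput in millions of steps/sec, copied from the table above.
# None stands for the entry shown as ❌.
results = {
    1: {"mjx": 7.9, "dense": 15.6, "sparse": 13.7},
    2: {"mjx": 2.6, "dense": 7.4, "sparse": 7.8},
    3: {"mjx": 2.2, "dense": 4.6, "sparse": 5.3},
    4: {"mjx": 1.5, "dense": 3.3, "sparse": 4.1},
    5: {"mjx": 1.1, "dense": None, "sparse": 3.2},
}


def speedup(row, variant):
    """Return throughput of `variant` relative to MJX, or None if missing."""
    value = row[variant]
    return None if value is None else value / row["mjx"]


for num_humanoids, row in results.items():
    dense = speedup(row, "dense")
    sparse = speedup(row, "sparse")
    dense_txt = "n/a" if dense is None else f"{dense:.1f}x"
    sparse_txt = "n/a" if sparse is None else f"{sparse:.1f}x"
    print(f"{num_humanoids} humanoid(s): dense {dense_txt}, sparse {sparse_txt}")
```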

Ideas for what to try next

1. Unroll steps

In the pure JAX benchmark, we can tell JAX to unroll some number of FK (forward kinematics) steps (in the benchmarks above, unroll=4). This has a big impact on performance: if we change unroll from 4 to 1, pure JAX performance at 8k batch drops from 50M to 33M steps/sec.

Is there some way that we can improve Warp performance in the same way? If I know ahead of time that I am going to call FK in a loop 1000 times, can I somehow inject unroll primitives?
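For reference, this is roughly what unrolling looks like on the JAX side. It is a minimal sketch, not the benchmark code: `step` is a hypothetical stand-in for one FK step, and the only real knob shown is the `unroll` argument of `jax.lax.scan`, which controls how many loop iterations are emitted per iteration of the compiled loop.

```python
import jax
import jax.numpy as jnp


def step(x, _):
    # Hypothetical stand-in for one FK step: any function of the carry works.
    x = 0.5 * x + jnp.sin(x)
    return x, None


def rollout(x0, nstep, unroll):
    # unroll=k asks the compiler to emit k copies of `step` per loop iteration,
    # trading larger compiled code for less per-iteration loop overhead.
    xf, _ = jax.lax.scan(step, x0, None, length=nstep, unroll=unroll)
    return xf


# nstep and unroll change the compiled program, so they must be static.
rollout_jit = jax.jit(rollout, static_argnames=("nstep", "unroll"))

x0 = jnp.linspace(0.0, 1.0, 8)
result_unroll_1 = rollout_jit(x0, nstep=1000, unroll=1)
result_unroll_4 = rollout_jit(x0, nstep=1000, unroll=4)
```

Both calls compute the same rollout; only the generated loop structure differs, which is where any speed difference would come from.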

2. Different levels of parallelism

The current approach parallelizes over body kinematic tree depth. We could go either direction: remove body parallelism (fewer kernel launches), or parallelize over joints instead (more launches, more parallelism).
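To make "parallelize over tree depth" concrete, here is a small sketch of how bodies could be grouped into depth levels, with one kernel launch per level. It is an illustration only, not the code in this repo: `bodies_by_depth` is a hypothetical helper, and the parent array is assumed to follow MuJoCo's `body_parentid` convention (body 0 is the world and is its own parent, and parents always come before their children).

```python
from collections import defaultdict


def bodies_by_depth(parent):
    """Group body ids by their depth in the kinematic tree.

    parent[i] is the parent body of body i. Body 0 is assumed to be the
    root, and every parent is assumed to have a smaller id than its children.
    """
    depth = [0] * len(parent)
    for body in range(1, len(parent)):
        # The parent's depth is already known because parents come first.
        depth[body] = depth[parent[body]] + 1

    levels = defaultdict(list)
    for body, d in enumerate(depth):
        levels[d].append(body)
    return [levels[d] for d in sorted(levels)]


# A small tree: 0 -> 1 -> {2, 3}, 2 -> 4, 3 -> 5
parent = [0, 0, 1, 1, 2, 3]
levels = bodies_by_depth(parent)
for d, bodies in enumerate(levels):
    # Bodies within a level do not depend on each other, so each level
    # could be processed by a single launch over its bodies.
    print(f"depth {d}: bodies {bodies}")
```

Under this scheme the number of launches equals the tree depth, and the parallelism inside each launch equals the number of bodies at that depth (times the batch size).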

3. Tiling

It looks like a thing! Should we use it? Will it help?

4. Quaternions

Why oh why did Warp make quaternions x,y,z,w? In order to be obstinate I wrote my own quaternion math. Is this slower than using the Warp quaternion primitives?
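For context, MuJoCo stores quaternions as w,x,y,z, so using Warp's built-in quaternion type means reordering components at the boundary. The sketch below shows that reordering plus a hand-written Hamilton product in the w,x,y,z layout. The function names are hypothetical and this is plain Python for illustration, not the kernel code used in this repo.

```python
def wxyz_to_xyzw(q):
    """Reorder a (w, x, y, z) quaternion into (x, y, z, w)."""
    w, x, y, z = q
    return (x, y, z, w)


def xyzw_to_wxyz(q):
    """Reorder an (x, y, z, w) quaternion into (w, x, y, z)."""
    x, y, z, w = q
    return (w, x, y, z)


def mul_wxyz(a, b):
    """Hamilton product a * b for quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


# Unit quaternions i and j in (w, x, y, z) layout; i * j should equal k.
i_quat = (0, 1, 0, 0)
j_quat = (0, 0, 1, 0)
k_quat = mul_wxyz(i_quat, j_quat)
k_quat_xyzw = wxyz_to_xyzw(k_quat)
```

Whether converting at the boundary and using the built-in primitives beats custom math is exactly the open question above; the sketch only shows what the conversion involves.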

5. `wp.static`

Haven't tried this at all - curious to see if it helps.

6. Other stuff?

Should I be playing with `block_dim`? Is my method for timing OK or did I misunderstand how `wp.synchronize` works? Is there something about allocating that I should be aware of? What am I not thinking of?
