Stav/remove test compare to python vm #2086


Open · wants to merge 1 commit into base: starkware-development

Conversation

@Stavbe (Collaborator) commented May 4, 2025

Initially, we planned to fill holes only in the memory used for prover_input_info, but we realized it would be better for Stone to receive a memory file without holes as well, since the hole values are already computed by the VM.
Therefore, we no longer intend to compare the memory against the Python VM’s memory, as the two will not be identical.
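The incompatibility can be made concrete: an unfilled memory dump is a strict subset of the filled one, so a byte-for-byte comparison fails even when both VMs agree on every cell they actually wrote. A minimal sketch, assuming memory dumps are modeled as address→value dicts (the representation and function name are illustrative, not the VM's actual data structures):

```python
# Illustrative only: memory dumps modeled as {address: value} dicts.
# A filled dump agrees with an unfilled one on every shared address,
# but also assigns values to the holes, so the dumps are not identical.
def agrees_on_written_cells(unfilled, filled):
    """True if `filled` matches `unfilled` wherever `unfilled` is defined."""
    return all(filled.get(addr) == val for addr, val in unfilled.items())

unfilled = {0: 7, 2: 9}          # address 1 is a hole
filled = {0: 7, 1: 0, 2: 9}      # hole filled by the VM

assert agrees_on_written_cells(unfilled, filled)
assert unfilled != filled        # hence a byte-for-byte comparison fails
```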



@Stavbe Stavbe changed the base branch from main to starkware-development May 4, 2025 11:53

github-actions bot commented May 4, 2025

**Hyper Threading Benchmark results**




```
hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     26.330 s ±  0.006 s    [User: 25.528 s, System: 0.798 s]
  Range (min … max):   26.325 s … 26.335 s    2 runs

Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     25.640 s ±  0.033 s    [User: 24.768 s, System: 0.868 s]
  Range (min … max):   25.616 s … 25.663 s    2 runs

Summary
  hyper_threading_pr threads: 1 ran
    1.03 ± 0.00 times faster than hyper_threading_main threads: 1
```

```
hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     14.610 s ±  0.084 s    [User: 25.537 s, System: 0.835 s]
  Range (min … max):   14.551 s … 14.669 s    2 runs

Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     13.935 s ±  0.014 s    [User: 24.843 s, System: 0.913 s]
  Range (min … max):   13.925 s … 13.945 s    2 runs

Summary
  hyper_threading_pr threads: 2 ran
    1.05 ± 0.01 times faster than hyper_threading_main threads: 2
```

```
hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):     10.845 s ±  0.480 s    [User: 38.708 s, System: 0.984 s]
  Range (min … max):   10.505 s … 11.184 s    2 runs

Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):     10.431 s ±  0.528 s    [User: 36.853 s, System: 1.077 s]
  Range (min … max):   10.057 s … 10.804 s    2 runs

Summary
  hyper_threading_pr threads: 4 ran
    1.04 ± 0.07 times faster than hyper_threading_main threads: 4
```

```
hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):     10.772 s ±  0.052 s    [User: 38.666 s, System: 0.992 s]
  Range (min … max):   10.735 s … 10.809 s    2 runs

Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):     10.070 s ±  0.212 s    [User: 37.750 s, System: 1.062 s]
  Range (min … max):    9.920 s … 10.219 s    2 runs

Summary
  hyper_threading_pr threads: 6 ran
    1.07 ± 0.02 times faster than hyper_threading_main threads: 6
```

```
hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):     10.467 s ±  0.079 s    [User: 39.127 s, System: 1.005 s]
  Range (min … max):   10.412 s … 10.523 s    2 runs

Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):     10.194 s ±  0.066 s    [User: 37.636 s, System: 1.086 s]
  Range (min … max):   10.147 s … 10.241 s    2 runs

Summary
  hyper_threading_pr threads: 8 ran
    1.03 ± 0.01 times faster than hyper_threading_main threads: 8
```

```
hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):     10.417 s ±  0.114 s    [User: 39.511 s, System: 1.097 s]
  Range (min … max):   10.337 s … 10.498 s    2 runs

Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):     10.216 s ±  0.243 s    [User: 37.826 s, System: 1.167 s]
  Range (min … max):   10.044 s … 10.388 s    2 runs

Summary
  hyper_threading_pr threads: 16 ran
    1.02 ± 0.03 times faster than hyper_threading_main threads: 16
```



github-actions bot commented May 4, 2025

Benchmark Results for unmodified programs 🚀

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base big_factorial | 2.175 ± 0.021 | 2.157 | 2.215 | 1.00 ± 0.01 |
| head big_factorial | 2.169 ± 0.016 | 2.150 | 2.206 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base big_fibonacci | 2.110 ± 0.025 | 2.091 | 2.175 | 1.00 ± 0.02 |
| head big_fibonacci | 2.105 ± 0.025 | 2.076 | 2.158 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base blake2s_integration_benchmark | 7.684 ± 0.100 | 7.592 | 7.952 | 1.00 |
| head blake2s_integration_benchmark | 7.944 ± 0.060 | 7.883 | 8.036 | 1.03 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base compare_arrays_200000 | 2.210 ± 0.013 | 2.201 | 2.242 | 1.00 ± 0.01 |
| head compare_arrays_200000 | 2.201 ± 0.007 | 2.190 | 2.216 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base dict_integration_benchmark | 1.447 ± 0.011 | 1.434 | 1.470 | 1.01 ± 0.01 |
| head dict_integration_benchmark | 1.436 ± 0.015 | 1.423 | 1.477 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base field_arithmetic_get_square_benchmark | 1.236 ± 0.006 | 1.223 | 1.246 | 1.00 |
| head field_arithmetic_get_square_benchmark | 1.240 ± 0.005 | 1.233 | 1.250 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base integration_builtins | 7.705 ± 0.041 | 7.639 | 7.755 | 1.00 |
| head integration_builtins | 7.972 ± 0.041 | 7.906 | 8.033 | 1.03 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base keccak_integration_benchmark | 7.927 ± 0.085 | 7.853 | 8.149 | 1.00 |
| head keccak_integration_benchmark | 8.337 ± 0.071 | 8.282 | 8.504 | 1.05 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base linear_search | 2.212 ± 0.014 | 2.199 | 2.244 | 1.00 |
| head linear_search | 2.227 ± 0.029 | 2.206 | 2.301 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base math_cmp_and_pow_integration_benchmark | 1.538 ± 0.009 | 1.523 | 1.550 | 1.00 |
| head math_cmp_and_pow_integration_benchmark | 1.551 ± 0.014 | 1.538 | 1.587 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base math_integration_benchmark | 1.473 ± 0.005 | 1.464 | 1.479 | 1.00 |
| head math_integration_benchmark | 1.476 ± 0.009 | 1.470 | 1.498 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base memory_integration_benchmark | 1.234 ± 0.004 | 1.229 | 1.241 | 1.00 |
| head memory_integration_benchmark | 1.240 ± 0.016 | 1.228 | 1.274 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base operations_with_data_structures_benchmarks | 1.552 ± 0.007 | 1.544 | 1.568 | 1.00 |
| head operations_with_data_structures_benchmarks | 1.580 ± 0.003 | 1.577 | 1.585 | 1.02 ± 0.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|:---|
| base pedersen | 537.5 ± 0.8 | 536.5 | 539.0 | 1.00 |
| head pedersen | 538.3 ± 3.2 | 534.7 | 545.9 | 1.00 ± 0.01 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|:---|
| base poseidon_integration_benchmark | 640.7 ± 7.0 | 634.4 | 658.7 | 1.00 |
| head poseidon_integration_benchmark | 658.5 ± 4.7 | 651.8 | 666.7 | 1.03 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base secp_integration_benchmark | 1.853 ± 0.005 | 1.848 | 1.864 | 1.00 |
| head secp_integration_benchmark | 1.878 ± 0.006 | 1.866 | 1.888 | 1.01 ± 0.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|:---|
| base set_integration_benchmark | 635.0 ± 2.0 | 632.4 | 639.1 | 1.00 |
| head set_integration_benchmark | 635.7 ± 7.8 | 629.6 | 655.0 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|:---|:---|:---|:---|
| base uint256_integration_benchmark | 4.336 ± 0.036 | 4.301 | 4.429 | 1.00 ± 0.01 |
| head uint256_integration_benchmark | 4.325 ± 0.015 | 4.304 | 4.348 | 1.00 |


codecov bot commented May 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.55%. Comparing base (33d75ca) to head (62792f5).

Additional details and impacted files
```diff
@@                    Coverage Diff                    @@
##           starkware-development    #2086      +/-   ##
=========================================================
- Coverage                  96.62%   96.55%   -0.08%
=========================================================
  Files                        102      102
  Lines                      44388    43250    -1138
=========================================================
- Hits                       42889    41759    -1130
+ Misses                      1499     1491       -8
```


@Stavbe Stavbe marked this pull request as ready for review May 4, 2025 12:26
@Stavbe Stavbe requested a review from DavidLevitGurevich May 4, 2025 12:26
@Stavbe Stavbe self-assigned this May 4, 2025
@JulianGCalderon (Contributor) commented:

Hi @Stavbe!

Is there a way to keep the memory comparison anyway? Comparing with cairo-lang is a great way to test the Cairo VM and ensure that we don't break compatibility.

We came up with two solutions:

  • Add support for both unfilled and filled memory (maybe with a flag --unfilled-memory). That way, we can keep the memory comparisons. The flag --memory would output the filled memory (incompatible with cairo-lang).
  • Sync these changes with cairo-lang, so that cairo-lang also fills memory. That way we can update the VM behaviour, but still compare the results to cairo-lang.
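A usage sketch of the first option. Hedged: only the `--unfilled-memory` spelling comes from the comment above; the binary name and everything else here are illustrative, not an implemented interface.

```shell
# Hypothetical CLI sketch only -- not an implemented interface.
# Hole-preserving dump, byte-comparable with cairo-lang's output:
cairo-vm-cli program.json --unfilled-memory unfilled.memory
# Filled dump for Stone (incompatible with cairo-lang):
cairo-vm-cli program.json --memory_file filled.memory
```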

@JulianGCalderon (Contributor) commented:

Another alternative could be to add a script that fills the memory holes when necessary, leaving the hole-filling out of the VM output itself. Would this work?
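A sketch of such a post-processing script. Hedged: the entry layout assumed here (an 8-byte little-endian address followed by a 32-byte little-endian field element) and the zero filler are assumptions about the relocated memory file format, not taken from this PR.

```python
# Assumed format: each entry is an 8-byte LE address + a 32-byte LE value.
import struct

ENTRY = 8 + 32

def read_memory(path):
    """Parse a memory dump into an {address: raw_value_bytes} dict."""
    mem = {}
    with open(path, "rb") as f:
        data = f.read()
    for off in range(0, len(data), ENTRY):
        addr = struct.unpack_from("<Q", data, off)[0]
        mem[addr] = data[off + 8 : off + ENTRY]
    return mem

def fill_holes(mem, filler=b"\x00" * 32):
    """Insert `filler` at every missing address between min and max."""
    lo, hi = min(mem), max(mem)
    return {a: mem.get(a, filler) for a in range(lo, hi + 1)}

def write_memory(mem, path):
    """Write the dict back in the same assumed binary layout."""
    with open(path, "wb") as f:
        for addr in sorted(mem):
            f.write(struct.pack("<Q", addr) + mem[addr])
```

This would let the VM keep emitting cairo-lang-compatible output, with hole-filling deferred to the point where Stone needs it.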

@DavidLevitGurevich (Collaborator) left a comment:


:lgtm:

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Stavbe)

@Stavbe Stavbe force-pushed the stav/remove_test_compare_to_python_vm branch from 62792f5 to 87648df Compare May 15, 2025 14:25
@Stavbe Stavbe force-pushed the stav/remove_test_compare_to_python_vm branch from 87648df to 69efbc5 Compare May 15, 2025 14:35
@Stavbe (Collaborator, Author) left a comment:


Hi @JulianGCalderon,
We moved the hole-filling logic to run only in proof mode, so I updated this PR to perform memory comparison checks only in the other cases.

Reviewable status: 1 of 7 files reviewed, 2 unresolved discussions (waiting on @DavidLevitGurevich and @Stavbe)


vm/src/tests/compare_outputs_dynamic_layouts.sh line 197 at r3 (raw file):

    echo "Running cairo-lang with case: $case"
    cairo-run --program "$full_program" \
        --layout "dynamic" --cairo_layout_params_file "$full_layout" --proof_mode \

proof mode


vm/src/tests/compare_factorial_outputs_all_layouts.sh line 14 at r3 (raw file):

    # Run cairo_lang
    echo "Running cairo_lang with layout $layout"
    cairo-run --layout $layout --proof_mode  --program $factorial_compiled --trace_file factorial_py.trace --memory_file factorial_py.memory --air_public_input factorial_py.air_public_input --air_private_input factorial_py.air_private_input

proof mode
