test(primitives): Extract commonalities from market and storage-provider pallet benchmarks, and use real proof data #818

Jinxit · 2025-04-04T09:38:45Z

Description

A problem arising from using real proofs for benchmarking the storage-provider pallet is that the runtime must use the real proof verifier, which means the market pallet benchmarks break. While the setup created for storage-provider could be copied across, it feels a bit more maintainable to:

Extract commonalities and reuse them.
Create a reproducible (and deterministic) way to generate the data.

This PR attempts to move test setup code, mostly for benchmarks, which is currently duplicated between the market and storage-provider pallets, into primitives. I didn't try to extract everything; I was mostly aiming to get it working without making an already large PR any larger. There's a new storage-provider-client CLI command called benchmark-data that generates code, keys and proofs, which are then embedded in the benchmark binary. I've also created traits and type aliases for things that are common to the respective pallets' configs, such as CurrencyProvider and BalanceOf, as well as traits for each pallet's config itself, since that is kind of part of the (internal) interfaces of the pallets.
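To illustrate the shared-config idea described above, here is a minimal sketch of a `CurrencyProvider` trait with a `BalanceOf` alias. It assumes simplified stand-ins (`SystemConfig`, `Currency`) instead of the real `frame_system::Config` and `frame_support` traits, and `balance_of`, `MockRuntime` and `MockCurrency` are hypothetical names used only for this example.

```rust
// Simplified stand-ins for the FRAME traits; these are NOT the real
// `frame_system::Config` or `frame_support::traits::Currency` definitions.
pub trait SystemConfig {
    type AccountId;
}

pub trait Currency<AccountId> {
    type Balance: Copy + core::fmt::Debug + PartialEq;
    fn free_balance(who: &AccountId) -> Self::Balance;
}

/// Implemented by every pallet config that needs a currency, so shared
/// helpers only have to name this one bound.
pub trait CurrencyProvider: SystemConfig {
    type Currency: Currency<Self::AccountId>;
}

/// Same shape as the alias in the PR, but over the stand-in traits.
pub type BalanceOf<T> =
    <<T as CurrencyProvider>::Currency as Currency<<T as SystemConfig>::AccountId>>::Balance;

/// Hypothetical helper, written once against the shared trait.
pub fn balance_of<T: CurrencyProvider>(who: &T::AccountId) -> BalanceOf<T> {
    T::Currency::free_balance(who)
}

// Hypothetical mock runtime, only here to exercise the helper.
pub struct MockRuntime;
pub struct MockCurrency;

impl SystemConfig for MockRuntime {
    type AccountId = u64;
}

impl Currency<u64> for MockCurrency {
    type Balance = u128;
    fn free_balance(who: &u64) -> u128 {
        (*who as u128) * 100
    }
}

impl CurrencyProvider for MockRuntime {
    type Currency = MockCurrency;
}
```

A benchmark helper bounded on `T: CurrencyProvider` can then be called from either pallet's setup code, for example `balance_of::<MockRuntime>(&3)`.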

This is the big one, so I'm very much open to comments for improvements. Individual commits might not compile on their own, but the changes are grouped on a best-effort basis to make reviewing them easier. be6fb5d is a bit of a mess though, I'll admit.

There is some overlap with what this is doing and the downloading of parameters (just download-params), but I haven't really made any attempt to unify the two approaches.

Important points for reviewers

As a side effect of creating MarketProvider and StorageProviderProvider (what a name, it's like I'm back in Java-land), the #[pallet::constant] tags have disappeared from the constants which are used by the other pallet, respectively. This means the constants are no longer a part of the pallet metadata, which I think is fine for now. The end goal should probably be to merge the two pallets anyway and move stuff back out of primitives.

Checklist

th7nder

Good stuff! In this first pass I just focused on the high-level functionality and concepts, not on the trait choices or code design.

th7nder · 2025-04-04T09:58:51Z

storage-provider/client/src/commands/proofs.rs

+    let sector_size = seal_proof.sector_size();
+    let porep_params_path = params_root.join(format!("{sector_size}.{POREP_PARAMS_EXT}"));
+    let porep_params_vk_path = params_root.join(format!("{sector_size}.{POREP_VK_EXT_SCALE}"));
+    if !tokio::fs::try_exists(&porep_params_vk_path).await? {


This should fail instead of generating. We should always use the shared params that are (or should be) stored in blob storage.

This was written before the downloading of params so it's generating a completely separate set - I'll switch it over.

Actually, thinking about this again, shouldn't it generate if they are missing? If we're using this as a command to run and then commit to the repo (or to the blob store), then this should be how that data is generated in the first place. This command shouldn't be run on developer machines, really, outside of developing the command itself.

I'd agree about the generation, but to generate them properly you need to have a huge-ass machine and then update them in lots of places. Like the repo, chain, genesis, etc.
I think this is not a trivial step and needs to be done explicitly.
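The "fail instead of generate" behaviour discussed in this thread could look roughly like the sketch below. It is a synchronous, std-only illustration (the PR code uses `tokio::fs::try_exists`); `ParamsError` and `require_verifying_key` are hypothetical names, and the `porep.vk.scale` extension is assumed from the `include_bytes!` path quoted elsewhere in this review.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical error type for this sketch; the PR uses `CliError`.
#[derive(Debug, PartialEq)]
pub enum ParamsError {
    /// The shared verifying key was not found; the caller should download
    /// it instead of silently generating a new, incompatible one.
    Missing(PathBuf),
    /// The existence check itself failed (e.g. a permissions problem).
    Io(String),
}

/// Returns the verifying key path if the file exists, otherwise an error.
/// Nothing is ever generated here.
pub fn require_verifying_key(
    params_root: &Path,
    sector_size: &str,
) -> Result<PathBuf, ParamsError> {
    // File name layout assumed from the quoted `2KiB.porep.vk.scale` path.
    let vk_path = params_root.join(format!("{sector_size}.porep.vk.scale"));
    match vk_path.try_exists() {
        Ok(true) => Ok(vk_path),
        Ok(false) => Err(ParamsError::Missing(vk_path)),
        Err(err) => Err(ParamsError::Io(err.to_string())),
    }
}
```

Generation would then live behind a separate, explicit step, which matches the point that regenerating params is not trivial and has to be done deliberately.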

th7nder · 2025-04-04T10:02:57Z

primitives/src/test_data.rs

+            // This code has been generated by `target/release/polka-storage-provider-client proofs benchmark-data --sector-size 2KiB examples/test-data-big.car`
+            SectorSize::_2KiB => BenchmarkData {
+                storage_provider_name: "//StorageProvider",
+                verifying_key: include_bytes!("../../target/bench/params/2KiB.porep.vk.scale"),


We need to store the verifying key as well as the generated proofs in the repo, somewhere committed. If we don't, CI will fail.
The params cannot be stored in the repo, as they're too big.

I'd suggest the workflow looks like this:

We generate the bench data for 1GiB sectors (proofs, verifying keys and the rest) on the beefy machine by running the benchmark-data command.

Upload this as a backup to our blob storage (I'm doing it now).

Commit the generated benchmark data to the repo (as well as the benchmarks).

Sounds good, I'll adjust the paths accordingly and add it to the download script.

th7nder · 2025-04-04T10:17:06Z

storage-provider/client/src/commands/proofs.rs

+        .join(sector_size.to_string());
+    tokio::fs::create_dir_all(&proofs_root).await?;
+
+    for sector_number in 0..MAX_SECTORS_PER_CALL {


MAX_SECTORS_PER_CALL probably needs to be adjusted. I think that with 1GiB sectors we'll only be able to ProveCommit 1 sector per call. PreCommit is quite different since it's not that compute heavy, but still.

Generating 20 separate proofs for 1GiB will take something like 5 hours (1 proof -> ~20 minutes).
Do we want to do that if we'll accept only one proof for prove commit anyway?

If we're planning on lowering the max sectors (even to 1), then yeah I agree this is a good time to do it.
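The time cost raised in this thread can be checked with simple arithmetic. This is a rough sketch, assuming proofs are generated one after another and taking the proof count (20) and the ~20 minutes per proof from the comment above; the function names are hypothetical and not part of the PR.

```rust
/// Rough wall-clock estimate in minutes, assuming strictly sequential
/// proof generation with a constant time per proof.
pub fn estimated_minutes(sector_count: u32, minutes_per_proof: u32) -> u32 {
    sector_count * minutes_per_proof
}

/// Formats a minute count as hours and minutes for easier comparison.
pub fn format_duration(total_minutes: u32) -> String {
    format!("{}h {:02}m", total_minutes / 60, total_minutes % 60)
}
```

Under these assumptions, `format_duration(estimated_minutes(20, 20))` gives `6h 40m`, while a single proof gives `0h 20m`. A faster machine or parallel generation would shorten this, but lowering the sector count is the larger lever.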

th7nder · 2025-04-04T10:23:32Z

primitives/src/test_data.rs

+
+    pub fn load(sector_size: SectorSize) -> BenchmarkData<T> {
+        match sector_size {
+            // DO NOT MODIFY


Btw. shouldn't we require only one sector_size here to be loaded?
Like if we need benchmarks only for 1GiB/8MiB to not pollute the binary?

You mean with features or something? I mostly started like this because, when going back and forth between generating data and running it, it was convenient to easily swap them out.

I guess it was bothering me that we'll only use SectorSize 1GiB for benchmarks and have multiple sector sizes in this switch which are not necessary.

Yeah, it's a good point. I think I'll switch it out (ha) to just returning the 1GiB data as soon as I get that working.
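The outcome of this thread, a loader that returns only the 1GiB data, might look like the sketch below. It drops the `BenchmarkData<T>` type parameter for brevity, and the key bytes are a placeholder; the real code embeds the generated file with `include_bytes!`. `BENCHMARK_SECTOR_SIZE` and `PLACEHOLDER_VK_1GIB` are hypothetical names.

```rust
/// Reduced version of the struct quoted above; only two fields are shown.
pub struct BenchmarkData {
    pub storage_provider_name: &'static str,
    pub verifying_key: &'static [u8],
}

/// Placeholder bytes standing in for the embedded 1GiB verifying key.
const PLACEHOLDER_VK_1GIB: &[u8] = &[0xAA, 0xBB, 0xCC];

/// The single sector size the benchmarks are generated for (assumption
/// taken from the discussion, not from the final code).
pub const BENCHMARK_SECTOR_SIZE: &str = "1GiB";

/// No `match` over sector sizes: only one data set ends up in the binary.
pub fn load() -> BenchmarkData {
    BenchmarkData {
        storage_provider_name: "//StorageProvider",
        verifying_key: PLACEHOLDER_VK_1GIB,
    }
}
```

The trade-off is that swapping sector sizes during development means regenerating and replacing the embedded data rather than changing a match arm.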

cernicc

This benchmarking looks like rocket science :D Left some comments. Good job figuring this out.

cernicc · 2025-04-04T11:32:39Z

storage-provider/client/src/commands/proofs.rs

 }

 fn generate_porep_params(
-    output_path: Option<PathBuf>,
+    output_path: Option<impl AsRef<Path>>,
    seal_proof: RegisteredSealProof,
 ) -> Result<(), CliError> {
    let output_path = if let Some(output_path) = output_path {


I think it is better to accept an owned type if the ownership is needed by the function logic. The caller can then decide if another allocation is needed or if the ownership can just be transferred to the function.

In this example the owned value is immediately created from the reference. This is a good indication of the point above.

It was less about not cloning and more about developer experience in an earlier commit where I was using a TempDir directly which implements AsRef<Path>. That changed along the way, and now it's just a PathBuf on both call sites.

I'm up for changing it, but I don't think performance is the right reason. The command takes minutes to run after all. But writing simpler code I can agree to.
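For comparison, the two signatures under discussion can be sketched side by side. This assumes a hypothetical default directory (`DEFAULT_DIR`) for the `None` case, since the real fallback is not shown in the diff; the function names are also illustrative.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical fallback used when no output path is given.
const DEFAULT_DIR: &str = "params";

/// Generic version: accepts anything path-like, but always allocates a new
/// `PathBuf`, and a bare `None` needs a type annotation at the call site.
pub fn output_dir_generic(output_path: Option<impl AsRef<Path>>) -> PathBuf {
    match output_path {
        Some(path) => path.as_ref().to_path_buf(),
        None => PathBuf::from(DEFAULT_DIR),
    }
}

/// Owned version: the caller hands over the `PathBuf`, so no extra copy is
/// made and the signature states that ownership is needed.
pub fn output_dir_owned(output_path: Option<PathBuf>) -> PathBuf {
    output_path.unwrap_or_else(|| PathBuf::from(DEFAULT_DIR))
}
```

With the generic version, a call without a path has to be written as `output_dir_generic(None::<PathBuf>)`, whereas `output_dir_owned(None)` infers the type. That supports the "simpler code" argument independently of performance.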

cernicc · 2025-04-04T11:36:00Z

storage-provider/client/src/commands/proofs.rs

@@ -130,6 +144,59 @@ pub enum ProofsCommand {
        /// CID - CommR of a replica (output of `porep` command)
        comm_r: String,
    },
+    /// Generates PoRep params, PoSt params, proofs and commitments used for benchmarking.


I have a feeling that we should hide everything required for benchmarks behind some feature flag. What do you think?

I agree. At first I considered runtime-benchmarks but I'm not sure whether that's already too overloaded with the actual benchmarking code.
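One way to gate the benchmark-only command is a dedicated Cargo feature. The sketch below is an assumption, not the PR's code: the feature name `benchmark-data` is hypothetical (the thread leaves open whether to reuse `runtime-benchmarks`), and the enum is reduced to plain variants without the CLI derive attributes.

```rust
/// Reduced stand-in for the CLI enum; the real one carries arguments.
#[derive(Debug, PartialEq)]
pub enum ProofsCommand {
    Porep,
    /// Only compiled in when the (hypothetical) feature is enabled.
    #[cfg(feature = "benchmark-data")]
    BenchmarkData,
}

/// Lists the command names available in this build.
pub fn available_commands() -> Vec<&'static str> {
    let mut names = vec!["porep"];
    if cfg!(feature = "benchmark-data") {
        names.push("benchmark-data");
    }
    names
}
```

A separate feature keeps the data-generation tooling out of default builds without tying it to the runtime benchmarking code paths.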

cernicc · 2025-04-04T11:46:21Z

primitives/src/configs.rs

+pub type BalanceOf<T> = <<T as CurrencyProvider>::Currency as Currency<
+    <T as frame_system::Config>::AccountId,
+>>::Balance;
+pub trait CurrencyProvider: frame_system::Config {


Missing blank line before the trait.

Jinxit added 6 commits April 4, 2025 11:03

feat(storage-provider-client): Store multiple proofs as a simple array instead of scale encoding a vector

122f625

Signed-off-by: Lucas Åström <[email protected]>

feat(storage-provider-client): Add a command for generating proof data for benchmarks

8dcc570

Signed-off-by: Lucas Åström <[email protected]>

feat(primitives): Extract and implement traits for market and storage-provider configs

8ac9607

Signed-off-by: Lucas Åström <[email protected]>

feat(primitives): Move deal structs from market to primitives

fb2dfa5

Signed-off-by: Lucas Åström <[email protected]>

test(primitives): Extract common functions used in both market and storage-provider benchmarks, and use the same real proof data in both

be6fb5d

Signed-off-by: Lucas Åström <[email protected]>

test(primitives): Embed benchmark proof data

4241b1c

Signed-off-by: Lucas Åström <[email protected]>

Jinxit requested review from cernicc, th7nder, jmg-duarte and aidan46 April 4, 2025 09:38

Jinxit added the ready for review Review is needed label Apr 4, 2025

th7nder requested changes Apr 4, 2025

cernicc reviewed Apr 4, 2025

th7nder mentioned this pull request Apr 7, 2025

fix: change polkastore address and change keys #819

Merged
