feat: Rust codec implementation #46

greged93 · 2025-03-03T11:42:57Z

This issue tracks the design and discussion for the Rust implementation of the da-codec (currently focusing on the decoding).

greged93 · 2025-03-04T07:40:54Z

I thought about what we would like the codec interface to look like. My main concern when designing the interface was to avoid leaking implementation details for each codec through the interface. With this in mind, I was looking for an API along these lines:

let codec = Codec::from_spec(spec);
let decoded = codec.decode(input);

I think I can achieve this goal with a very simple implementation, without the need for traits. The implementation also stays easy to update, as extending the codec would just result in extending the enums:

pub enum Codec {
    V0,
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
}

impl Codec {
    fn from_spec(spec: SpecId) -> Self {
        todo!()
    }

    fn decode(&self, input: CommitDataSource) -> Result<CommitData> {
        match self {
            Codec::V0 => Ok(CommitData::Chunks(decode_calldata_to_chunks(
                input
                    .as_raw_chunks()
                    .ok_or(DecodingError::IncorrectDataSource)?,
            )?)),
            ...
        }
    }
}

pub enum CommitDataSource {
    RawChunks(Vec<Vec<u8>>),
    RawChunksAndBlob((Vec<Vec<u8>>, Blob)),
    Blob(Blob),
}

pub enum CommitData {
    Chunks(Vec<Chunk>),
    ChunksWithL1MessageInfo(ChunksWithL1MessageInfo),
}

pub struct Chunks(Vec<Chunk>);

pub struct Chunk {
    transactions: Vec<Transaction>,
    l2_block: Block,
}

pub struct ChunksWithL1MessageInfo {
    chunk: Chunks,
    prev_l1_message_hash: B256,
    post_l1_message_hash: B256,
}

Let me know what you think @jonastheis

frisitano · 2025-03-04T10:29:29Z

This looks reasonable to me. If you want to modularise a little more, you could have

pub enum Codec {
    V0(CodecV0),
    V1(CodecV1),
    V2(CodecV2),
..
}

and then implement decode on each CodecV*, but that is not a hard requirement.
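
A minimal sketch of that modularised layout, assuming each version gets its own unit struct and that `Codec::decode` only dispatches. The `Decoded` type and the placeholder bodies are hypothetical and stand in for real decoding logic:

```rust
/// Hypothetical decode result, used only to make the dispatch visible.
#[derive(Debug, PartialEq, Eq)]
pub struct Decoded {
    pub version: u8,
    pub len: usize,
}

pub struct CodecV0;
pub struct CodecV1;

impl CodecV0 {
    /// Placeholder body: a real implementation would parse the v0 layout.
    pub fn decode(&self, input: &[u8]) -> Decoded {
        Decoded { version: 0, len: input.len() }
    }
}

impl CodecV1 {
    /// Placeholder body: a real implementation would parse the v1 layout.
    pub fn decode(&self, input: &[u8]) -> Decoded {
        Decoded { version: 1, len: input.len() }
    }
}

pub enum Codec {
    V0(CodecV0),
    V1(CodecV1),
}

impl Codec {
    /// The enum only forwards to the version-specific type.
    pub fn decode(&self, input: &[u8]) -> Decoded {
        match self {
            Codec::V0(codec) => codec.decode(input),
            Codec::V1(codec) => codec.decode(input),
        }
    }
}
```

With this shape, adding a version means adding one struct and one match arm, and the per-version logic stays out of the shared enum.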

frisitano · 2025-03-04T11:26:37Z

You may also want to consider adding #[repr(u8)] to the Codec enum, so that its u8 representation matches the version.
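
For illustration, a small sketch of the `#[repr(u8)]` idea. The explicit discriminant values and the `version` helper are assumptions, not something fixed in this thread:

```rust
/// Codec versions with the discriminant pinned to a single byte.
/// The numeric values below are assumed for illustration.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    V0 = 0,
    V1 = 1,
    V2 = 2,
    V3 = 3,
    V4 = 4,
    V5 = 5,
    V6 = 6,
    V7 = 7,
}

impl Codec {
    /// Returns the byte that identifies this codec version.
    /// `version` is a hypothetical helper name.
    pub fn version(self) -> u8 {
        self as u8
    }
}
```

Note that `repr(u8)` only fixes the layout and makes the `as u8` cast well defined. Going from a byte back to a `Codec` still needs an explicit conversion.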

jonastheis · 2025-03-06T02:02:56Z

My understanding of Rust is still bare minimum so I'm not sure I understand the full implications of the proposed interface.

> My main concern when designing the interface was to avoid leaking implementation details for each codec through the interface.

Agreed. I think this is what we should aim for as much as possible.

Some questions:

1. Are CommitDataSource and CommitData the same types for all codec versions? If so, this leaks information about the underlying implementation, as chunks are imo an implementation detail. In V7 there is no notion of chunks on the decoding side anymore.
2. Is from_spec == from_version?

greged93 · 2025-03-06T11:56:56Z

1. I see. In that case I think we can rename the variants of CommitDataSource to "calldata", "calldata and blob", and "blob"? And similarly for CommitData.
2. Almost: from_spec would be given the current hardfork (which is equivalent, I guess).

jonastheis · 2025-03-07T00:21:04Z

1. This might work for old batch types. For V7, a batch (and its corresponding batch hash) can only be computed with the context of all other batches submitted in the same transaction. Here the calldata is only needed for the first batch committed in a transaction. For the rest you need different input from the batches before it (in the same transaction). See here for implementation reference in l2geth.
2. I don't think this is equivalent. For decoding we need a from_version, since the only thing we receive is a bunch of bytes and we need the first byte (the version byte) to know how to decode the whole thing. For encoding you could give it the hardfork, yes.
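
A sketch of what a version-driven constructor could look like, assuming the version byte is the first byte of the input as described above. The `from_bytes` helper, the error variants, and the byte-to-variant mapping are hypothetical:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    V0,
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
}

/// Hypothetical error variants for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodingError {
    EmptyInput,
    UnknownVersion(u8),
}

impl Codec {
    /// Maps a version byte to a codec (mapping assumed for illustration).
    pub fn from_version(version: u8) -> Result<Self, DecodingError> {
        match version {
            0 => Ok(Codec::V0),
            1 => Ok(Codec::V1),
            2 => Ok(Codec::V2),
            3 => Ok(Codec::V3),
            4 => Ok(Codec::V4),
            5 => Ok(Codec::V5),
            6 => Ok(Codec::V6),
            7 => Ok(Codec::V7),
            other => Err(DecodingError::UnknownVersion(other)),
        }
    }

    /// Picks the codec by looking at the first byte of the raw input.
    pub fn from_bytes(input: &[u8]) -> Result<Self, DecodingError> {
        let version = *input.first().ok_or(DecodingError::EmptyInput)?;
        Self::from_version(version)
    }
}
```

A hardfork-driven `from_spec` could still exist next to this for encoding, where the caller knows the fork rather than holding raw bytes.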

greged93 · 2025-03-12T16:32:48Z

I made some changes to the proposed layout, trying to hide the implementation details as much as possible. Let me know what you think @jonastheis @frisitano.

// Note: we don't need this structure to be an enum for decoding, but let's keep it this way
// in case we ever implement encoding as well.
pub enum Codec {
    V0,
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
}

impl Codec {
    fn decode<T: CommitDataSource>(input: T) -> Result<Box<dyn DaCommitPayload>, ()> {
        todo!()
    }
}

/// Values that implement the trait can provide data from a transaction's calldata or blob.
pub trait CommitDataSource {
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

pub struct L2Block {
    transactions: Vec<Transaction>,
    context: BlockContext,
}

pub struct BlockContext {
    number: u64,
    timestamp: u64,
    base_fee: U256,
    gas_limit: u64,
    num_transactions: u16,
    num_l1_messages: u16,
}

pub trait CommitPayload {
    fn l2_blocks(&self) -> Vec<L2Block>;
}

pub trait MaybeCommitPayload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
}

pub trait DaCommitPayload: CommitPayload + MaybeCommitPayload {}
impl<T> DaCommitPayload for T where T: MaybeCommitPayload + CommitPayload {}

Here was my process:

CommitDataSource abstracts data that can be committed via calldata and/or blob(s). Depending on the version, data can be pulled in through the trait, and an error can be returned if the data is missing. Updates to the commitment (more blobs, data shifted between calldata and blobs) shouldn't require changing the trait (unless a whole new method of posting DA is introduced).
L2Block: as far as I can see from the DABlock trait, a decoded block should always have transactions and some context. I have left this as an explicit structure for now, but abstracting it behind a trait might provide more modularity in case the context is extended later on. Let me know what you think about this.
The decoded value is a trait object that implements CommitPayload and MaybeCommitPayload. CommitPayload exposes the DA payload that should always be part of the committed data. MaybeCommitPayload exposes the DA payload that may be part of the committed data (currently only prev_l1_message_queue_hash and post_l1_message_queue_hash). If we further update the way data is committed on the L1, we can extend the trait with default implementations (returning None) for all decoded data and return Some where appropriate.
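
To make the two trait ideas above concrete, here is a hedged sketch. `TxData`, `require_blob`, `LegacyPayload`, `QueueHashPayload`, the `MissingBlob` error variant, and the `B256` alias are all hypothetical stand-ins. Only the trait signatures come from the proposal:

```rust
/// Stand-in for the `B256` hash type used in the proposal.
pub type B256 = [u8; 32];

pub trait CommitDataSource {
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

/// Hypothetical source that owns its calldata and blobs.
pub struct TxData {
    pub calldata: Vec<u8>,
    pub blobs: Vec<Vec<u8>>,
}

impl CommitDataSource for TxData {
    fn calldata(&self) -> &[u8] {
        &self.calldata
    }
    fn blob(&self, index: u8) -> Option<&[u8]> {
        self.blobs.get(usize::from(index)).map(|b| b.as_slice())
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodingError {
    MissingBlob(u8),
}

/// A version that needs a blob turns a missing one into an error.
pub fn require_blob<T: CommitDataSource>(source: &T, index: u8) -> Result<&[u8], DecodingError> {
    source.blob(index).ok_or(DecodingError::MissingBlob(index))
}

/// Optional payload: every accessor defaults to `None`.
pub trait MaybeCommitPayload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
}

/// Hypothetical payload of an older version: keeps the defaults.
pub struct LegacyPayload;
impl MaybeCommitPayload for LegacyPayload {}

/// Hypothetical payload of a version that carries the queue hashes.
pub struct QueueHashPayload {
    pub prev: B256,
    pub post: B256,
}

impl MaybeCommitPayload for QueueHashPayload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        Some(self.prev)
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        Some(self.post)
    }
}
```

Older payload types need an empty `impl` only, which is the extension property described above: new optional fields do not force changes in existing versions.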

frisitano · 2025-03-13T06:43:03Z

I think this is a good solution. What's the benefit of using the trait object instead of having a concrete struct type or an enum with different variants? Doesn't the trait object limit the usage of the type downstream to the trait methods? Do we want this? Would a concrete type with the DaCommitPayload trait implemented on it be better?

greged93 · 2025-03-13T08:28:20Z

> What's the benefit of using the trait object instead of having a concrete struct type or an enum with different variants?

An enum with different variants which exposes the same methods as the trait would be equivalent. That would be easier indeed and clearer as well.

roynalnaruto · 2025-03-14T16:54:01Z

I somehow find it a bit contradictory that DaCommitPayload must be CommitPayload and MaybeCommitPayload. If something is already a CommitPayload, there is no need for it to also "maybe" be a CommitPayload, i.e. MaybeCommitPayload. I think OptionalCommitPayload makes more sense, since the prev/post message queue hashes were added in a later codec version and hence offer an optional/additional commit payload that is not always available from the CommitPayload (in previous codec versions). Just a nit-pick :)

roynalnaruto · 2025-03-14T17:00:23Z

We have already some codec implementations in https://github.com/scroll-tech/zkvm-prover/tree/dc20b0536805159c3557eceeca0f5782159839be/crates/circuits/types/src/batch/payload (could be useful as reference to better design da-codec's structure)

greged93 · 2025-03-14T17:04:06Z

> I somehow find it a bit contradictory that DaCommitPayload must be CommitPayload and MaybeCommitPayload

Yes, agreed. I'll follow the comment from @frisitano: I think it makes more sense to just use an enum for this instead of the trait object. I'll expose the prev/post message queue hashes as methods on the enum.
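
A rough sketch of the enum-with-methods shape, under the assumption that older versions decode to blocks only and newer ones also carry the two queue hashes. The variant names, the simplified `L2Block`, and the `B256` alias are placeholders:

```rust
/// Stand-in for the `B256` hash type.
pub type B256 = [u8; 32];

/// Simplified stand-in for the `L2Block` struct from the proposal.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct L2Block {
    pub number: u64,
}

/// Decoded payload as a closed set of shapes (variant names are hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CommitPayload {
    /// Only L2 blocks.
    Base(Vec<L2Block>),
    /// L2 blocks plus the L1 message queue hashes.
    WithQueueHashes {
        blocks: Vec<L2Block>,
        prev: B256,
        post: B256,
    },
}

impl CommitPayload {
    /// Always available, whatever the variant.
    pub fn l2_blocks(&self) -> &[L2Block] {
        match self {
            CommitPayload::Base(blocks) => blocks.as_slice(),
            CommitPayload::WithQueueHashes { blocks, .. } => blocks.as_slice(),
        }
    }

    /// `None` for variants that do not carry the hash.
    pub fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        match self {
            CommitPayload::Base(_) => None,
            CommitPayload::WithQueueHashes { prev, .. } => Some(*prev),
        }
    }

    /// `None` for variants that do not carry the hash.
    pub fn post_l1_message_queue_hash(&self) -> Option<B256> {
        match self {
            CommitPayload::Base(_) => None,
            CommitPayload::WithQueueHashes { post, .. } => Some(*post),
        }
    }
}
```

Unlike a `Box<dyn ...>` return value, callers can also match on the variants directly when they need more than the shared accessors.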

roynalnaruto · 2025-03-14T17:09:18Z

I think there will be future codecs where there are fields also in addition to prev/post message queue hash. How would we handle such scenarios (there will be an additional scenario the moment there is another codec)? I see some difficulties in scaling the original approach as we have more and more codecs.

Either use an enum, or give each codec its own commit payload type. This could be interesting as well:

impl Codec {
    fn decode<T: CommitDataSource>(input: T) -> Result<T::CommitPayload, ()> {
        todo!()
    }
}

pub trait CommitDataSource {
    type CommitPayload: CommitPayload;
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

EDIT:
You can define BaseCommitPayload as the minimum-required payload, i.e. Vec<L2Block> and so you will have:

pub trait CommitPayload {
    fn base_payload(&self) -> BaseCommitPayload;
    /* ... */
}

Does this sound interesting to you?
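
One way this associated-type variant could be wired up end to end. The thread does not say how `decode` would construct `T::CommitPayload`, so the `from_calldata` constructor is an assumption, as are `ToySource`, `ToyPayload`, the simplified `L2Block`, and the unit-struct `Codec`. The "decoding" is a placeholder that turns each calldata byte into a block number:

```rust
/// Simplified stand-in for the `L2Block` struct from the proposal.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct L2Block {
    pub number: u64,
}

/// Minimum payload every codec version provides.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BaseCommitPayload {
    pub l2_blocks: Vec<L2Block>,
}

pub trait CommitPayload: Sized {
    fn base_payload(&self) -> BaseCommitPayload;
    /// Hypothetical constructor (not in the thread) so `decode` can build the payload.
    fn from_calldata(calldata: &[u8]) -> Result<Self, ()>;
}

pub trait CommitDataSource {
    type CommitPayload: CommitPayload;
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

/// Unit struct for brevity; the thread uses an enum here.
pub struct Codec;

impl Codec {
    /// The source type decides which payload type comes back.
    pub fn decode<T: CommitDataSource>(input: T) -> Result<T::CommitPayload, ()> {
        <T::CommitPayload as CommitPayload>::from_calldata(input.calldata())
    }
}

/// Toy payload: placeholder for a real per-version payload type.
#[derive(Debug)]
pub struct ToyPayload {
    blocks: Vec<L2Block>,
}

impl CommitPayload for ToyPayload {
    fn base_payload(&self) -> BaseCommitPayload {
        BaseCommitPayload { l2_blocks: self.blocks.clone() }
    }
    fn from_calldata(calldata: &[u8]) -> Result<Self, ()> {
        // Placeholder logic: one block per byte, not a real codec format.
        let blocks = calldata
            .iter()
            .map(|&b| L2Block { number: u64::from(b) })
            .collect();
        Ok(ToyPayload { blocks })
    }
}

/// Toy source that is statically tied to `ToyPayload`.
pub struct ToySource {
    pub calldata: Vec<u8>,
}

impl CommitDataSource for ToySource {
    type CommitPayload = ToyPayload;
    fn calldata(&self) -> &[u8] {
        &self.calldata
    }
    fn blob(&self, _index: u8) -> Option<&[u8]> {
        None
    }
}
```

The sketch makes one consequence visible: the payload type is fixed by the source type at compile time, so the caller has to know which source type to build before the version byte has been inspected.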

greged93 · 2025-03-14T17:36:17Z

> I think there will be future codecs where there are fields also in addition to prev/post message queue hash. How would we handle such scenarios (there will be an additional scenario the moment there is another codec)?

We would need to extend the enum and add the corresponding method on the enum.

> Does this sound interesting to you?

I'm just wondering if this helps when extending the codec. Let's say we have codec v8, which now also returns a random_hash:

With the enum solution, we would extend it with CommitPayloadV8 and add fn random_hash(&self) -> Option<B256>.
With the associated type, we would add the same method fn random_hash(&self) -> Option<B256> and would need some structure CommitPayloadV8 that implements the CommitPayload trait.

I feel like both are quite similar in the amount of code required, but I might be missing something?
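
The v8 comparison can be sketched side by side. Everything here is hypothetical, including the module names, the fields of `CommitPayloadV8`, and the simplified `L2Block`; `random_hash` is the made-up field from the example above:

```rust
/// Stand-in for the `B256` hash type.
pub type B256 = [u8; 32];

/// Simplified stand-in for the `L2Block` struct from the proposal.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct L2Block {
    pub number: u64,
}

/// Hypothetical v8 payload carrying the extra `random_hash`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CommitPayloadV8 {
    pub l2_blocks: Vec<L2Block>,
    pub random_hash: B256,
}

pub mod enum_style {
    use super::{B256, CommitPayloadV8, L2Block};

    pub enum CommitPayload {
        Base(Vec<L2Block>),
        /// New variant added for v8.
        V8(CommitPayloadV8),
    }

    impl CommitPayload {
        /// New accessor; earlier variants answer `None`.
        pub fn random_hash(&self) -> Option<B256> {
            match self {
                CommitPayload::Base(_) => None,
                CommitPayload::V8(payload) => Some(payload.random_hash),
            }
        }
    }
}

pub mod trait_style {
    use super::{B256, CommitPayloadV8, L2Block};

    pub trait CommitPayload {
        fn l2_blocks(&self) -> &[L2Block];
        /// New accessor with a default, so earlier impls stay untouched.
        fn random_hash(&self) -> Option<B256> {
            None
        }
    }

    impl CommitPayload for CommitPayloadV8 {
        fn l2_blocks(&self) -> &[L2Block] {
            &self.l2_blocks
        }
        fn random_hash(&self) -> Option<B256> {
            Some(self.random_hash)
        }
    }
}
```

In both modules the v8 change amounts to one new type plus one accessor, which matches the observation that the amount of code is similar. The practical difference is where the `None` cases live: in match arms for the enum, in the trait default for the trait.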

roynalnaruto · 2025-03-19T14:46:32Z

Both are indeed quite similar. I personally prefer Self::CommitPayload over dyn DaCommitPayload.

greged93 · 2025-03-19T14:56:54Z

> Both are indeed quite similar. I personally prefer Self::CommitPayload over dyn DaCommitPayload.

Oh sorry, I don't mean using dyn DaCommitPayload, I mean over using something like

pub enum CommitPayload {
    Base(Vec<L2Block>),
    ...
}

which was proposed by @frisitano
