
feat: Rust codec implementation #46


Open
greged93 opened this issue Mar 3, 2025 · 16 comments

Comments

@greged93

greged93 commented Mar 3, 2025

This issue tracks the design and discussion for the Rust implementation of the da-codec (currently focusing on the decoding).

@greged93
Author

greged93 commented Mar 4, 2025

I thought about what the codec interface should look like. My main concern when designing the interface was to avoid leaking implementation details for each codec through the interface. With this in mind, I was aiming for an API like:

let codec = Codec::from_spec(spec);
let decoded = codec.decode(input);

I think I can achieve this goal with a very simple implementation, without the need for traits. The implementation also stays easy to update, as extending the codec would just mean extending the enums:

pub enum Codec {
    V0,
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
}

impl Codec {
    fn from_spec(spec: SpecId) -> Self {
        todo!()
    }

    fn decode(&self, input: CommitDataSource) -> Result<CommitData> {
        match self {
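            Codec::V0 => {
                // V0 decodes from raw chunks; any other data source is rejected.
                let raw_chunks = input
                    .as_raw_chunks()
                    .ok_or(DecodingError::IncorrectDataSource)?;
                Ok(CommitData::Chunks(decode_calldata_to_chunks(raw_chunks)?))
            }
            // ... remaining versions elided in this proposal
            _ => todo!(),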
        }
    }
}

pub enum CommitDataSource {
    RawChunks(Vec<Vec<u8>>),
    RawChunksAndBlob((Vec<Vec<u8>>, Blob)),
    Blob(Blob),
}

pub enum CommitData {
    Chunks(Vec<Chunk>),
    ChunksWithL1MessageInfo(ChunksWithL1MessageInfo),
}

pub struct Chunks(Vec<Chunk>);

pub struct Chunk {
    transactions: Vec<Transaction>,
    l2_block: Block,
}

pub struct ChunksWithL1MessageInfo {
    chunks: Chunks,
    prev_l1_message_hash: B256,
    post_l1_message_hash: B256,
}

Let me know what you think @jonastheis

@frisitano

frisitano commented Mar 4, 2025

This looks reasonable to me. If you want to modularise a little more, you could have

pub enum Codec {
    V0(CodecV0),
    V1(CodecV1),
    V2(CodecV2),
    // ...
}

and then implement decode on each CodecV* type, but that's not a hard requirement.

@frisitano

You may also want to consider adding the #[repr(u8)] attribute to the Codec enum, so that its u8 representation encodes the version.
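A minimal sketch of the `#[repr(u8)]` suggestion, assuming the version byte values 0 to 7 map one-to-one onto `V0` to `V7` (the thread does not spell out the byte values). The `TryFrom<u8>` impl and its error type (the unrecognised byte) are illustrative choices, not part of the proposal.

```rust
use std::convert::TryFrom;

/// Codec versions with explicit discriminants, so `codec as u8` yields
/// the version byte (assumed mapping: Vn <-> n).
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    V0 = 0,
    V1 = 1,
    V2 = 2,
    V3 = 3,
    V4 = 4,
    V5 = 5,
    V6 = 6,
    V7 = 7,
}

impl TryFrom<u8> for Codec {
    /// On failure, hand back the unrecognised version byte.
    type Error = u8;

    fn try_from(version: u8) -> Result<Self, Self::Error> {
        Ok(match version {
            0 => Codec::V0,
            1 => Codec::V1,
            2 => Codec::V2,
            3 => Codec::V3,
            4 => Codec::V4,
            5 => Codec::V5,
            6 => Codec::V6,
            7 => Codec::V7,
            other => return Err(other),
        })
    }
}
```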

@jonastheis
Contributor

My understanding of Rust is still minimal, so I'm not sure I understand the full implications of the proposed interface.

My main concern when designing the interface was to avoid leaking implementation details for each codec through the interface.

Agreed. I think this is what we should aim for as much as possible.

Some questions:

  1. Are CommitDataSource and CommitData the same types for all codec versions? If so, this leaks information about the underlying implementation, as chunks are imo an implementation detail. In V7, on the decoding side, there is no notion of chunks anymore.
  2. Is from_spec == from_version?

@greged93
Author

greged93 commented Mar 6, 2025

  1. I see. In that case, could we rename the variants of CommitDataSource to Calldata, CalldataAndBlob and Blob? And similarly for CommitData.
  2. Almost: from_spec would be given the current hardfork (which I guess is equivalent).
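A possible shape for the renamed variants, as a sketch only. `Blob` is treated as plain bytes and the `calldata`/`blob` accessors are hypothetical helpers; a real implementation would use whatever blob type the crate settles on.

```rust
/// Simplifying stand-in for a blob (hypothetical; just raw bytes here).
pub type Blob = Vec<u8>;

/// Where a commit's data lives on L1.
pub enum CommitDataSource {
    Calldata(Vec<u8>),
    CalldataAndBlob(Vec<u8>, Blob),
    Blob(Blob),
}

impl CommitDataSource {
    /// Returns the calldata, if this source carries any.
    pub fn calldata(&self) -> Option<&[u8]> {
        match self {
            CommitDataSource::Calldata(data) | CommitDataSource::CalldataAndBlob(data, _) => {
                Some(data.as_slice())
            }
            CommitDataSource::Blob(_) => None,
        }
    }

    /// Returns the blob, if this source carries one.
    pub fn blob(&self) -> Option<&[u8]> {
        match self {
            CommitDataSource::Blob(blob) | CommitDataSource::CalldataAndBlob(_, blob) => {
                Some(blob.as_slice())
            }
            CommitDataSource::Calldata(_) => None,
        }
    }
}
```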

@jonastheis
Contributor

  1. This might work for old batch types. For V7, a batch (and its corresponding batch hash) can only be computed with the context of all other batches submitted in the same transaction. The calldata is only needed for the first batch committed in a transaction; for the rest, you need different input from the batches before it (in the same transaction). See here for implementation reference in l2geth.
  2. I don't think this is equivalent. For decoding we need a from_version, since the only thing we receive is a bunch of bytes and we need to parse the first byte (the version byte) to know how to decode the whole thing. For encoding you could give it the hardfork, yes.
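To illustrate the `from_version` point, a sketch of an entry point that reads the leading version byte and returns the selected codec along with the remaining bytes. `split_version`, `DecodeError` and the byte-to-variant mapping (byte n selects Vn) are hypothetical; the actual per-version decoding is out of scope here.

```rust
/// Hypothetical errors for the version-dispatch step.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// No bytes at all, so there is no version byte.
    Empty,
    /// The version byte does not match any known codec.
    UnknownVersion(u8),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    V0, V1, V2, V3, V4, V5, V6, V7,
}

impl Codec {
    /// Selects the codec from the version byte (assumed mapping: n -> Vn).
    pub fn from_version(version: u8) -> Result<Self, DecodeError> {
        const ALL: [Codec; 8] = [
            Codec::V0, Codec::V1, Codec::V2, Codec::V3,
            Codec::V4, Codec::V5, Codec::V6, Codec::V7,
        ];
        ALL.get(version as usize)
            .copied()
            .ok_or(DecodeError::UnknownVersion(version))
    }
}

/// Splits off the version byte and returns the codec plus the remaining payload.
pub fn split_version(data: &[u8]) -> Result<(Codec, &[u8]), DecodeError> {
    let (&version, rest) = data.split_first().ok_or(DecodeError::Empty)?;
    Ok((Codec::from_version(version)?, rest))
}
```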

@greged93
Copy link
Author

I made some changes to the proposed layout, trying to hide the implementation details as much as possible. Let me know what you think @jonastheis @frisitano.

// Note: we don't need this structure to be an enum for decoding, but let's keep it this way
// in case we ever implement encoding as well.
pub enum Codec {
    V0,
    V1,
    V2,
    V3,
    V4,
    V5,
    V6,
    V7,
}

impl Codec {
    fn decode<T: CommitDataSource>(input: T) -> Result<Box<dyn DaCommitPayload>, ()> {
        todo!()
    }
}

/// Values that implement the trait can provide data from a transaction's calldata or blob.
pub trait CommitDataSource {
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

pub struct L2Block {
    transactions: Vec<Transaction>,
    context: BlockContext,
}

pub struct BlockContext {
    number: u64,
    timestamp: u64,
    base_fee: U256,
    gas_limit: u64,
    num_transactions: u16,
    num_l1_messages: u16,
}

pub trait CommitPayload {
    fn l2_blocks(&self) -> Vec<L2Block>;
}

pub trait MaybeCommitPayload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
}

pub trait DaCommitPayload: CommitPayload + MaybeCommitPayload {}
impl<T> DaCommitPayload for T where T: MaybeCommitPayload + CommitPayload {}

Here was my process:

  • CommitDataSource correctly abstracts data that can be committed via calldata and/or blob(s). Depending on the version, data can be pulled in using the trait, and an error can be returned if the data is missing. Updates to the commitment (more blobs, data shifted between calldata and blobs) shouldn't modify the trait (unless a whole new method of posting DA is introduced).
  • L2Block: as far as I can see from the DABlock trait, a decoded block should always have transactions and some context. I have left this as an explicit structure for now, but abstracting it with a trait might provide more modularity in case the context is extended later on. Let me know what you think about this.
  • The decoded value is a trait object that implements CommitPayload and MaybeCommitPayload. CommitPayload exposes DA payload that is always part of the committed data. MaybeCommitPayload exposes DA payload that may be part of the committed data (currently only prev_l1_message_queue_hash and post_l1_message_queue_hash). If we further update the way data is committed on L1, we can extend the trait with default implementations (returning None) for the new decoded data, and return Some where appropriate.
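A small sketch of how these traits fit together, with two hypothetical payload types: one relies on the `None` defaults of `MaybeCommitPayload`, the other (standing in for V7) overrides them. `B256` is replaced by a plain `[u8; 32]` alias and `L2Block` is trimmed down so the example compiles on its own.

```rust
/// Stand-in for a 32-byte hash; the thread uses `B256`.
pub type B256 = [u8; 32];

/// Simplified block with only the fields needed here (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct L2Block {
    pub number: u64,
    pub transactions: Vec<Vec<u8>>,
}

/// Data every codec version commits.
pub trait CommitPayload {
    fn l2_blocks(&self) -> Vec<L2Block>;
}

/// Data only some codec versions commit; defaults to `None`.
pub trait MaybeCommitPayload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        None
    }
}

pub trait DaCommitPayload: CommitPayload + MaybeCommitPayload {}
impl<T> DaCommitPayload for T where T: MaybeCommitPayload + CommitPayload {}

/// Hypothetical pre-V7 payload: keeps the `None` defaults.
pub struct LegacyPayload {
    pub blocks: Vec<L2Block>,
}

impl CommitPayload for LegacyPayload {
    fn l2_blocks(&self) -> Vec<L2Block> {
        self.blocks.clone()
    }
}

impl MaybeCommitPayload for LegacyPayload {}

/// Hypothetical V7 payload: overrides the defaults with `Some`.
pub struct V7Payload {
    pub blocks: Vec<L2Block>,
    pub prev_hash: B256,
    pub post_hash: B256,
}

impl CommitPayload for V7Payload {
    fn l2_blocks(&self) -> Vec<L2Block> {
        self.blocks.clone()
    }
}

impl MaybeCommitPayload for V7Payload {
    fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        Some(self.prev_hash)
    }
    fn post_l1_message_queue_hash(&self) -> Option<B256> {
        Some(self.post_hash)
    }
}
```

Both payloads can be returned as `Box<dyn DaCommitPayload>`, and callers only see the trait methods.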

@frisitano

frisitano commented Mar 13, 2025

I think this is a good solution. What's the benefit of using the trait object instead of having a concrete struct type or an enum with different variants? Doesn't the trait object limit the usage of the type downstream to the trait methods? Do we want this? Would a concrete type with the DaCommitPayload trait implemented on it be better?

@greged93
Author

What's the benefit of using the trait object instead of having a concrete struct type or an enum with different variants?

An enum with different variants that exposes the same methods as the trait would be equivalent. It would indeed be simpler and clearer as well.
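For comparison, a sketch of the enum alternative. The variant names and the `L2Block`/`B256` stand-ins are assumptions; the accessors return `None` for variants that do not carry the L1 message queue hashes, mirroring the trait defaults.

```rust
/// Stand-in for a 32-byte hash; the thread uses `B256`.
pub type B256 = [u8; 32];

/// Simplified block (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct L2Block {
    pub number: u64,
    pub transactions: Vec<Vec<u8>>,
}

/// Hypothetical enum-based payload: one variant per distinct payload shape.
#[derive(Debug, Clone, PartialEq)]
pub enum CommitPayload {
    /// Versions that only commit L2 blocks.
    Base(Vec<L2Block>),
    /// Versions that also commit the L1 message queue hashes.
    WithL1MessageQueueHashes {
        blocks: Vec<L2Block>,
        prev: B256,
        post: B256,
    },
}

impl CommitPayload {
    /// Available for every variant.
    pub fn l2_blocks(&self) -> &[L2Block] {
        match self {
            CommitPayload::Base(blocks)
            | CommitPayload::WithL1MessageQueueHashes { blocks, .. } => blocks.as_slice(),
        }
    }

    /// `None` for variants that do not commit this hash.
    pub fn prev_l1_message_queue_hash(&self) -> Option<B256> {
        match self {
            CommitPayload::WithL1MessageQueueHashes { prev, .. } => Some(*prev),
            _ => None,
        }
    }

    /// `None` for variants that do not commit this hash.
    pub fn post_l1_message_queue_hash(&self) -> Option<B256> {
        match self {
            CommitPayload::WithL1MessageQueueHashes { post, .. } => Some(*post),
            _ => None,
        }
    }
}
```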

@roynalnaruto
Copy link
Contributor

roynalnaruto commented Mar 14, 2025

I somehow find it a bit contradictory that DaCommitPayload must be both CommitPayload and MaybeCommitPayload. I imagine that if something is already a CommitPayload, there is no need for it to also "maybe" be a CommitPayload, i.e. MaybeCommitPayload. OptionalCommitPayload makes more sense to me, since the prev/post message queue hashes were added in a later codec version and hence offer an optional/additional commit payload that is not always available from the CommitPayload (of previous codec versions). Just a nit-pick :)

@roynalnaruto
Copy link
Contributor

roynalnaruto commented Mar 14, 2025

We have already some codec implementations in https://github.com/scroll-tech/zkvm-prover/tree/dc20b0536805159c3557eceeca0f5782159839be/crates/circuits/types/src/batch/payload (could be useful as reference to better design da-codec's structure)

@greged93
Copy link
Author

I somehow find it a bit contradictory that DaCommitPayload must be CommitPayload and MaybeCommitPayload

Yes, agreed. I'll follow the comment from @frisitano; I think it makes more sense to just use an enum for this instead of the trait object. I'll expose the prev/post message queue hashes as methods on the enum.

@roynalnaruto
Contributor

roynalnaruto commented Mar 14, 2025

I think there will be future codecs with fields in addition to the prev/post message queue hashes. How would we handle such scenarios (there will be an additional scenario the moment there is another codec)? I see some difficulties in scaling the original approach as we add more and more codecs.

Either an enum, or each codec has its own commit payload type. This could be interesting as well:

impl Codec {
    fn decode<T: CommitDataSource>(input: T) -> Result<T::CommitPayload, ()> {
        todo!()
    }
}

pub trait CommitDataSource {
    type CommitPayload: CommitPayload;
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

EDIT:
You can define BaseCommitPayload as the minimum required payload, i.e. Vec<L2Block>, and so you will have:

pub trait CommitPayload {
    fn base_payload(&self) -> BaseCommitPayload;
    /* ... */
}

Does this sound interesting to you?
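A compilable sketch of the associated-type approach with a `BaseCommitPayload`. To let `Codec::decode` actually build `T::CommitPayload`, it assumes a hypothetical `decode_from` constructor on `CommitPayload`; `CalldataOnly` and `ToyPayload` use a toy format (one block number per calldata byte) purely for illustration, not a real codec layout.

```rust
/// Simplified block (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct L2Block {
    pub number: u64,
}

/// Minimum payload every codec returns: the decoded L2 blocks.
#[derive(Debug, Clone, PartialEq)]
pub struct BaseCommitPayload(pub Vec<L2Block>);

pub trait CommitPayload: Sized {
    fn base_payload(&self) -> BaseCommitPayload;
    /// Hypothetical constructor so that `Codec::decode` can build the payload.
    fn decode_from(calldata: &[u8], blob: Option<&[u8]>) -> Result<Self, ()>;
}

pub trait CommitDataSource {
    type CommitPayload: CommitPayload;
    fn calldata(&self) -> &[u8];
    fn blob(&self, index: u8) -> Option<&[u8]>;
}

/// Unit struct standing in for the codec; version dispatch is omitted.
pub struct Codec;

impl Codec {
    pub fn decode<T: CommitDataSource>(input: T) -> Result<T::CommitPayload, ()> {
        // The source's associated type decides which payload type is produced.
        <T::CommitPayload as CommitPayload>::decode_from(input.calldata(), input.blob(0))
    }
}

/// Toy source that only carries calldata.
pub struct CalldataOnly(pub Vec<u8>);

/// Toy payload paired with `CalldataOnly`.
pub struct ToyPayload {
    blocks: Vec<L2Block>,
}

impl CommitPayload for ToyPayload {
    fn base_payload(&self) -> BaseCommitPayload {
        BaseCommitPayload(self.blocks.clone())
    }

    fn decode_from(calldata: &[u8], _blob: Option<&[u8]>) -> Result<Self, ()> {
        // Toy format: each calldata byte is one block number.
        let blocks = calldata
            .iter()
            .map(|&n| L2Block { number: u64::from(n) })
            .collect();
        Ok(ToyPayload { blocks })
    }
}

impl CommitDataSource for CalldataOnly {
    type CommitPayload = ToyPayload;

    fn calldata(&self) -> &[u8] {
        &self.0
    }

    fn blob(&self, _index: u8) -> Option<&[u8]> {
        None
    }
}
```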

@greged93
Author

I think there will be future codecs where there are fields also in addition to prev/post message queue hash. How would we handle such scenarios (there will be an additional scenario the moment there is another codec)?

We would need to extend the enum and add the corresponding method on the enum.

Does this sound interesting to you?

I'm just wondering if this helps when extending the codec. Let's say we have codec v8, which now also returns a random_hash:

  • With the enum solution, we would extend it with CommitPayloadV8 and add fn random_hash(&self) -> Option<B256>.
  • With the associated type, we would add the same method fn random_hash(&self) -> Option<B256> and would need some structure CommitPayloadV8 that implements the CommitPayload trait.

I feel like both are quite similar on the amount of code required, but I might be missing something?

@roynalnaruto
Contributor

roynalnaruto commented Mar 19, 2025

Both are indeed quite similar. I personally prefer Self::CommitPayload over dyn DaCommitPayload.

@greged93
Author

Both are indeed quite similar. I personally prefer Self::CommitPayload over dyn DaCommitPayload.

Oh sorry, I didn't mean over dyn DaCommitPayload; I meant over using something like

pub enum CommitPayload {
    Base(Vec<L2Block>),
    // ...
}

which was proposed by @frisitano

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants