Refine and Stabilize EOF Support #15310

Open
@ekpyron

I'm creating this issue to keep a list of tasks that will need to be done after #15294 is merged before we can consider EOF support "non-experimental".
The list is non-exhaustive so far and not ordered by importance; please edit in additional tasks (or comment) for anything still missing.

General/Miscellaneous

  • Topological ordering of code blocks (see #15633)
  • Proper errors due to exceeding EOF limits (too many containers, too long functions, too many arguments, too many immutables, too long jumps, stack overflow/underflow, unreachable code, etc.).
    • With test coverage both via Yul and via high-level Solidity code.
  • loadimmutable() on EOF.
  • msize() on EOF.
  • Unoptimized compilation on EOF.
  • EOF opcode support in gas meter.
  • Proper workarounds for cases of "out of reach" rjumps
    (e.g. using intermediate jumps, outlining or splitting out control flow using JUMPF)
  • Refactor parts of Assembly::assemble to better split into EOF/non-EOF parts (see e.g. #15294 (comment) and #15294 (comment))
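
To make the "out of reach" rjump item concrete: RJUMP carries a signed 16-bit relative offset that is counted from the end of the instruction (EIP-4200). The following is only a minimal sketch of the reach check an assembler would need before falling back to one of the workarounds; the function names are hypothetical and are not taken from libevmasm.

```python
# Sketch only: helper names are hypothetical, not the compiler's API.
INT16_MIN = -(1 << 15)
INT16_MAX = (1 << 15) - 1
RJUMP_OPCODE = 0xE0  # RJUMP opcode as assigned in EIP-4200
RJUMP_SIZE = 3       # one opcode byte plus a two-byte immediate


def rjump_offset(instr_pos: int, target_pos: int) -> int:
    """Relative offset, measured from the byte right after the RJUMP."""
    return target_pos - (instr_pos + RJUMP_SIZE)


def in_reach(instr_pos: int, target_pos: int) -> bool:
    """True if the target can be encoded in the signed 16-bit immediate."""
    return INT16_MIN <= rjump_offset(instr_pos, target_pos) <= INT16_MAX


def encode_rjump(instr_pos: int, target_pos: int) -> bytes:
    """Encode an RJUMP, or raise so the caller can apply a workaround."""
    if not in_reach(instr_pos, target_pos):
        raise ValueError("rjump target out of reach; needs a workaround")
    offset = rjump_offset(instr_pos, target_pos)
    return bytes([RJUMP_OPCODE]) + offset.to_bytes(2, "big", signed=True)
```

A caller that gets the `ValueError` would then pick one of the strategies listed above (intermediate jumps, outlining, or splitting control flow via JUMPF).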

Language Level Changes

Solidity:

  • Properly reject non-salted creation during Analysis (when targeting EOF)
  • Ensure proper analysis treatment of various high-level constructs with EOF, including:
    • Disallow gas in call options (#15638).
    • Disallow type(C).creationCode, type(C).runtimeCode, <address>.code, <address>.codehash, <address>.transfer(), <address>.send(), gasleft() and selfdestruct().
    • Low-level calls: the original plan was to disallow them and reintroduce them as EOF-style low-level calls (considering either a unified version that works for both targets or implementing the EOF-style version for legacy as well, if at all possible). As implemented now (#15559), low-level calls work for both targets.
    • Disallow abicoder v1 pragma on EOF.
  • Builtin address calculation for salted creation (compatible with both EOF and legacy) (rejected/impossible)
  • Revert on calls to uninitialized external function pointers (unless we get EXTCODETYPE and can keep the more general check against calls to accounts without code).
  • In-language construct to check for EOF (like a global bool isEOF)? Option to define both legacy and EOF specific assembly blocks? (Ideally we're fine without this, but will need to check demand from library authors)
  • Allow deploying custom subcontainers (like https://gist.github.com/frangio/940f21778f411154dc58ff818c712929)
  • Related to the above: consider whether we need to disallow verbatim when targeting EOF (since we cannot perform stack height analysis) and can/need to replace it with full custom subcontainers. To be retained on EOF, verbatim would need to declare its maximum stack effects.
  • Decision: should auxdataloadn() be exposed in inline assembly?
  • Decision: should returncontract() be exposed in inline assembly?

Yul:

  • Expose a more complete set of Yul builtins for accessing the data section (only auxdataload() available now).
    • We have no builtin for accessing Yul data objects by name.
  • Decision: should the names of non-EOF instructions and builtins remain reserved in Yul on EOF?
  • Decision: should the names of EOF instructions and builtins become reserved in Yul on legacy?

EVM assembly:

  • Support for remaining EOF opcodes: DATALOAD, DATASIZE, DATACOPY, RETURNDATALOAD, RJUMPV, DUPN, SWAPN, EXCHANGE.
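
For orientation, here is a rough sketch of how the one-byte immediates of DUPN, SWAPN and EXCHANGE map to stack positions. It follows my reading of the EOF version of EIP-663 and should be treated as an assumption to verify against the spec; the helper names are hypothetical.

```python
# Sketch under assumptions: encodings as read from the EOF version of
# EIP-663; verify against the spec before relying on them.
from typing import List, Tuple


def decode_dupn(imm: int) -> int:
    """DUPN duplicates the n-th stack item, with n = imm + 1."""
    return imm + 1


def decode_swapn(imm: int) -> int:
    """SWAPN swaps the top with the (n + 1)-th item, with n = imm + 1."""
    return imm + 1


def decode_exchange(imm: int) -> Tuple[int, int]:
    """EXCHANGE swaps the (n + 1)-th and (n + m + 1)-th stack items."""
    n = (imm >> 4) + 1
    m = (imm & 0x0F) + 1
    return n, m


def apply_exchange(stack: List[int], imm: int) -> List[int]:
    """Apply EXCHANGE to a copy of the stack; stack[-1] is the top."""
    n, m = decode_exchange(imm)
    if len(stack) < n + m + 1:
        raise ValueError("stack underflow")
    result = list(stack)
    i, j = -(n + 1), -(n + m + 1)
    result[i], result[j] = result[j], result[i]
    return result
```

With a one-byte immediate, DUPN and SWAPN can reach far beyond the 16 slots available to DUP1..DUP16 and SWAP1..SWAP16, which is what the stack-depth item under Optimizations relies on.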

Noteworthy Documentation Entries

  • Compile a list of breaking changes, similar to what we did for IR codegen.
    • Difference in address calculation in salted creation.
    • Constructor arguments no longer affecting salted-create address.
    • EXTCODESIZE/EXTCODETYPE check and behavior of uninitialized external function pointers.
    • gas no longer allowed in call options.
    • Builtins no longer available in Solidity.
    • Opcodes/builtins no longer available in inline assembly and Yul.
    • Existence of implicit limits on some language constructs resulting from EOF size constraints.
    • msg.data in constructor no longer empty; now contains arguments.
  • Document the EOF-Yul builtins.
    • Including new restrictions in how nested objects and data can be referenced.
  • Document the EOF opcodes.
  • Check for required documentation changes on all changing output artifacts.
  • Check general introductions for legacy/EOF specific parts.

Compiler Output (and Input) Artifacts

  • Settle the details on CBOR metadata. (Where? Likely at the beginning of data. Should we tweak the format?) See sourcify#1523 (List of metadata improvements).
    • May involve documenting that e.g. --no-cbor-metadata induces code changes (in particular dataload offsets), similar to prerelease vs. release builds, which differ due to the length of the CBOR-encoded version string.
  • Generate proper source mappings.
  • Properly define and output assembly text and assembly json for EOF.
  • After the above, allow importing EOF assembly json.
  • (Minor) print JUMPDEST as NOP for EOF assembly/opcode.
  • Make sure the "immutable references" output remains accurate, and decide whether it can/should be reused for information within the data section or whether we should have a new artifact for that instead.
    • Also, restore validation against reading a non-existent immutable.

Optimizations

  • libevmasm optimizer steps: most importantly BlockDeduplicator, but also JumpdestRemover, Inliner, PeepholeOptimiser, CommonSubexpressionEliminator, ConstantOptimiser.
  • Use RJUMPV for dense Yul switches; investigate trying to turn sparse switches into dense switches (especially for the external dispatch)
  • CALLF RETF -> JUMPF peephole optimizer rule; tail-call optimization in codegen
  • Consider potentially new peephole optimizer rules (e.g. certain chains of swaps to exchange)
  • Adjust Yul inlining heuristics
  • Allow stack depth larger than 16 with SWAPN/DUPN during Yul->EVM code transform and remove all stack-to-memory logic (see #15844)
  • Exploit EXCHANGE for stack shuffling
  • Consider creating a version of the libevmasm low-level inliner that can inline EOF function calls accordingly. (Very low priority: most of this can be expected to be done on the Yul level. It would be more relevant if we backported EOF to legacy codegen.)
  • Relax RETURNDATACOPY restrictions in the optimizer.
  • Exploit RETURNDATALOAD
  • Sub-assembly deduplication on EOF (#13804). Cannot be done during assembling like in legacy, but should be doable in the optimizer.
  • Removal of unreferenced data objects.
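
The two peephole ideas above (CALLF RETF -> JUMPF, and chains of swaps -> EXCHANGE) can be sketched on a toy instruction list. This is not the libevmasm PeepholeOptimiser: the tuple representation and function names are hypothetical, instructions are assumed not to be jump targets, and EOF validation constraints (stack heights, section outputs, EXCHANGE immediate ranges) are ignored.

```python
# Toy sketch: instructions are tuples such as ("CALLF", 3), ("RETF",),
# ("SWAP", 2). Depths are 0-indexed from the top, so ("SWAP", a) swaps
# positions 0 and a, and ("EXCHANGE", a, b) swaps positions a and b.
from typing import List, Tuple

Instr = Tuple


def peephole(code: List[Instr]) -> List[Instr]:
    out: List[Instr] = []
    i = 0
    while i < len(code):
        # Rule 1: a call immediately followed by a return is a tail call.
        if (i + 1 < len(code) and code[i][0] == "CALLF"
                and code[i + 1] == ("RETF",)):
            out.append(("JUMPF", code[i][1]))
            i += 2
            continue
        # Rule 2: SWAP a; SWAP b; SWAP a exchanges positions a and b
        # and leaves the top untouched.
        if (i + 2 < len(code) and code[i][0] == "SWAP"
                and code[i + 1][0] == "SWAP" and code[i + 2] == code[i]
                and code[i][1] != code[i + 1][1]):
            out.append(("EXCHANGE", code[i][1], code[i + 1][1]))
            i += 3
            continue
        out.append(code[i])
        i += 1
    return out


def run_stack_ops(code: List[Instr], stack: List[int]) -> List[int]:
    """Simulate SWAP/EXCHANGE only; stack[-1] is the top of the stack."""
    s = list(stack)
    for ins in code:
        if ins[0] == "SWAP":
            a, b = 0, ins[1]
        elif ins[0] == "EXCHANGE":
            a, b = ins[1], ins[2]
        else:
            raise ValueError("unsupported instruction in simulator")
        s[-1 - a], s[-1 - b] = s[-1 - b], s[-1 - a]
    return s
```

The small simulator is only there to check that rule 2 preserves the stack layout, for example by comparing `run_stack_ops` on a swap chain before and after `peephole`.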

Testing

Note: Obviously, changes under all other points need proper testing. Beyond that, in general:

  • Set up fuzzing pipelines for EOF vs legacy (needs robustness against minor differences in e.g. salted creation addresses)
  • Try to unify tests (e.g. if not already done, refactor tests that depend on address calculations (e.g. move into a testing contract)); try to reduce test copies between legacy and EOF in general, whenever possible.
  • Run external tests on EOF.
  • Gas measurements in semantic tests on EOF. Consider also adding some gas tests (disabled in #15653).
  • Review correctness of properties of EOF opcodes in SemanticInformation.cpp and add more coverage via functionSideEffects tests (disabled in #15654).
  • Correct codehash calculation for EOFCREATE in EVMHost (see #15635)
  • Enable SMTChecker tests on EOF (disabled in #15659).
  • More coverage for EOF use in inline assembly.

Breaking Changes

Note: These are non-backwards-compatible changes we should make even with legacy as target, in the breaking release that switches to EOF by default. (EOF-specific breaking changes for EOF as target can generally be made outside of breaking releases if they only affect compilation when EOF is enabled.)

  • Consider renaming the previous special datacopy, etc., Yul builtins for legacy codegen.

New Language Features

Note: we don't need these to call EOF support stable, but notably a few language features become significantly easier to support with EOF, e.g.:

  • Reference-type Immutables (resp. a data location for the data section)
  • returndata as data location
  • calldata arguments in constructors
  • high-level TXCREATE support.
  • referencing contract subcontainers from inline assembly (needed for eofcreate()).
