FMA for Cannon updates for Go 1.23 and Kona #279

127 changes: 127 additions & 0 deletions security/fma-cannon-updates-for-go-1.23-and-kona.md
@@ -0,0 +1,127 @@
# Cannon Updates for Go 1.23 and Kona: Failure Modes and Recovery Path Analysis

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Introduction](#introduction)
- [Failure Modes and Recovery Paths](#failure-modes-and-recovery-paths)
- [FM1: Toggles are incorrectly deployed or implemented causing features to be incorrectly toggled off](#fm1-toggles-are-incorrectly-deployed-or-implemented-causing-features-to-be-incorrectly-toggled-off)
- [FM2: Stack depth-related refactoring with new dclo/dclz instructions introduced a bug](#fm2-stack-depth-related-refactoring-with-new-dclodclz-instructions-introduced-a-bug)
- [FM3: New dclo/dclz instructions are incorrectly implemented](#fm3-new-dclodclz-instructions-are-incorrectly-implemented)
- [FM4: Incomplete Go 1.23 support (missing syscalls)](#fm4-incomplete-go-123-support-missing-syscalls)
- [FM5: eventfd or mprotect noop insufficient for Go 1.23 support](#fm5-eventfd-or-mprotect-noop-insufficient-for-go-123-support)
- [Generic items we need to take into account:](#generic-items-we-need-to-take-into-account)
- [Action Items](#action-items)
- [Audit Requirements](#audit-requirements)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

_Italics are used to indicate things that need to be replaced._

| | |
| ------------------ | -------------------------------------------------- |
| Author | Paul Dowman |
| Created at | 2025-05-02 |
| Initial Reviewers | Meredith Baxter |
| Need Approval From | Matt Solomon |
| Status | Implementing Actions |
Contributor

since we have no actions we can do this:

Suggested change (before, then after):
| Status | Implementing Actions |
| Status | Final |


> [!NOTE]
> 📢 Remember:
>
> - The single approver in the “Need Approval From” must be from the Security team.
> - Maintain the “Status” property accordingly. An FMA document can have the following statuses:
> - **Draft 📝:** Doc is created but not yet ready for review.
> - **In Review 🔎:** Security is reviewing, and Engineering is iterating on the design. A checklist of action items will be created during this phase.
> - **Implementing Actions 🛫:** Security has signed off on the content of the document, including the resulting action items. Engineering is responsible for implementing the action items, and updating the checklist.
> - **Final 👍:** Security will transition the status of the document to Final once all action items are completed.

> [!TIP]
> Guidelines for writing a good analysis, and what the reviewer will look for:
>
> - Show your work: Include steps and tools for each conclusion.
> - Completeness of risks considered.
> - Include both implementation and operational failure modes
> - Provide references to support the reviewer.
> - The size of the document will likely be proportional to the project's complexity.
> - The ultimate goal of this document is to identify action items to improve the security of the project. The FMA review process can be accelerated by proactively identifying action items during the writing process.

## Introduction

This document covers updates to Cannon (both the Solidity and Go implementations) to support Go 1.23 and to support running Kona.

Below are references for this project:

- [Go 1.23 PR](https://github.com/ethereum-optimism/optimism/pull/14692)
- [New instructions for Kona PR](https://github.com/ethereum-optimism/optimism/pull/15601)
- [Add feature toggling to MIPS VM contracts PR](https://github.com/ethereum-optimism/optimism/pull/15487)

## Failure Modes and Recovery Paths

**_Use one sub-header per failure mode, so the full set of failure modes is easily scannable from the table of contents._**

### FM1: Toggles are incorrectly deployed or implemented causing features to be incorrectly toggled off

- **Description:** A [feature toggle](https://github.com/ethereum-optimism/optimism/pull/15487) was added. The contract could be deployed with the wrong version.
- **Risk Assessment:** low
- **Mitigations:**
1. The version number is checked in the constructor and is currently required to be 7 (the latest version), so we shouldn't be able to deploy MIPS64.sol with the wrong version.
2. The logic is simple: features are enabled by a check against the version number, so it is easy to reason about and has a low risk of being implemented incorrectly.
- **Detection:** We have manually reviewed for this.
- **Recovery Path(s)**: This would require a contract upgrade.

### FM2: Stack depth-related refactoring with new dclo/dclz instructions introduced a bug

- **Description:** Arguments were consolidated into a struct to avoid "stack too deep" issues.
- **Risk Assessment:** low
- **Mitigations:**
1. We have comprehensive differential testing of all VM instructions between the Go and Solidity implementations, which should catch any refactoring-related bugs. In this case the Solidity code changed while the Go code did not, so the unchanged Go implementation serves as a reference and gives us confidence that the refactor did not introduce a bug.
2. This is a trivial refactoring.
- **Detection:** We rely on our tests.
Contributor

Similar comment to the above about how this gets detected in production

Contributor Author

There's no generalized automatic way to catch a bug, but this refactoring was pretty trivial and has been audited.

Contributor

Maybe a more broad framing: for a lot of these, given that we typically don't run op-program in prod, do proofs failure modes typically surface in production in the same way you described in #279 (comment), i.e. the VM runner would catch it?

- **Recovery Path(s)**: It would require fixing the bug and upgrading the contract.
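
The idea behind the differential-testing mitigation in FM2 can be sketched as follows. This is an illustration only, not the project's test suite: the tiny three-operation ALU, `execPositional`, `execStruct`, and `ExecArgs` are hypothetical. The sketch compares a function that takes positional arguments with a refactored version that takes a struct, over pseudo-random inputs.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Op is a hypothetical opcode for a tiny ALU used only in this sketch.
type Op int

const (
	OpAdd Op = iota
	OpSub
	OpAnd
)

// execPositional models the pre-refactor shape: many positional arguments.
func execPositional(op Op, rs, rt uint64) uint64 {
	switch op {
	case OpAdd:
		return rs + rt
	case OpSub:
		return rs - rt
	case OpAnd:
		return rs & rt
	}
	return 0
}

// ExecArgs models the post-refactor shape: arguments bundled in a struct.
type ExecArgs struct {
	Op     Op
	Rs, Rt uint64
}

// execStruct must behave exactly like execPositional.
func execStruct(a ExecArgs) uint64 {
	switch a.Op {
	case OpAdd:
		return a.Rs + a.Rt
	case OpSub:
		return a.Rs - a.Rt
	case OpAnd:
		return a.Rs & a.Rt
	}
	return 0
}

// diffTest runs both implementations on n pseudo-random inputs and
// returns an error describing the first mismatch, if any.
func diffTest(n int, seed int64) error {
	rng := rand.New(rand.NewSource(seed))
	ops := []Op{OpAdd, OpSub, OpAnd}
	for i := 0; i < n; i++ {
		op := ops[rng.Intn(len(ops))]
		rs, rt := rng.Uint64(), rng.Uint64()
		want := execPositional(op, rs, rt)
		got := execStruct(ExecArgs{Op: op, Rs: rs, Rt: rt})
		if got != want {
			return fmt.Errorf("op %d rs=%#x rt=%#x: got %#x, want %#x",
				op, rs, rt, got, want)
		}
	}
	return nil
}

func main() {
	if err := diffTest(10000, 1); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("implementations agree on all sampled inputs")
}
```

A fixed seed keeps a failing case reproducible. A test like this samples the input space, so it raises confidence without proving equivalence.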

### FM3: New dclo/dclz instructions are incorrectly implemented

- **Description:** Two new instructions (dclo and dclz) were added, and their implementation could contain a bug. They aren't used by op-program, but would be used if we ever deployed Kona on Cannon.
- **Risk Assessment:** low
- **Mitigations:**
1. These instructions aren't emitted by the Go compiler, so their behavior should not affect the VM when running op-program.
2. If we ever do deploy Kona on Cannon, we will do more testing, including running it on mainnet data for weeks in VM Runner.
- **Detection:** The program would crash if it used those instructions and they were incorrectly implemented.
- **Recovery Path(s)**: It would require fixing the bug and upgrading the contract.
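
For reference, DCLZ and DCLO are the MIPS64 instructions that count leading zeros and leading ones in a 64-bit register. Their expected semantics can be sketched with Go's `math/bits` package. This is a reference model for reasoning about test cases, not the Cannon implementation, and the function names `dclz` and `dclo` are chosen for the example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// dclz returns the number of leading zero bits in a 64-bit value.
// An all-zero input yields 64.
func dclz(rs uint64) uint64 {
	return uint64(bits.LeadingZeros64(rs))
}

// dclo returns the number of leading one bits in a 64-bit value.
// Counting leading ones is the same as counting the leading zeros
// of the bitwise complement. An all-ones input yields 64.
func dclo(rs uint64) uint64 {
	return uint64(bits.LeadingZeros64(^rs))
}

func main() {
	inputs := []uint64{
		0,
		1,
		0x00FF000000000000,
		0xF000000000000000,
		^uint64(0),
	}
	for _, in := range inputs {
		fmt.Printf("rs=%#016x dclz=%d dclo=%d\n", in, dclz(in), dclo(in))
	}
}
```

The boundary inputs (all zeros and all ones) are the cases most worth covering, because the result 64 does not fit the usual 0 to 63 range of a bit index.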
Comment on lines +83 to +91
Contributor

Are there other onchain changes we'd need to ship Kona to prod? That is, why are we shipping these to production if we aren't going to use them in production? Is this to help roll out Kona incrementally in some way?

Contributor Author

I believe these are the only changes needed. They're included here because it gives us the option to use Kona with Cannon if needed.

Contributor

What exactly does it mean to use Kona with Cannon, does that require a governance proposal and new prestate? What does "if needed" mean, i.e. is this part of our runbooks somewhere?

(I thought Kona was intended to be used with Asterisc, so this is a generic question as to why we are shipping these instructions, since I'm not up to speed with what the kona/asterisc plan is)


### FM4: Incomplete Go 1.23 support (missing syscalls)

- **Description:** It's possible that the Go 1.23 compiler uses additional syscalls that we haven't noticed and they aren't implemented.
Contributor

How do we typically notice the new syscalls that we need to implement? It sounds like this is empirically based on running the challenger, as opposed to e.g. go release notes giving us this info?

Contributor Author

There are two ways:

  1. vm-compat is a tool that runs in CI and detects new syscalls referenced in the op-program binary (I added this to the text).
  2. op-challenger-runner continually runs op-program against mainnet blocks

Contributor

That's interesting that both ways are empirical, and there's no concrete docs we can reference to get a full list. That means pretty much every Go bump will have this failure mode?

- **Risk Assessment:** low
- **Mitigations:**
1. We have been running `op-challenger-runner` on production data for several weeks with the new VM.
2. We used `vm-compat`, a tool that runs in CI and detects new syscalls referenced in the op-program binary.
- **Detection:** We will continue to watch `op-challenger-runner` and will be alerted if any mainnet blocks fail.
- **Recovery Path(s)**: It would require fixing the bug and upgrading the contract.
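
The kind of check described for `vm-compat` can be illustrated with a small allowlist comparison. This sketch does not reflect how `vm-compat` is actually implemented, which this document does not describe. It assumes the list of syscalls referenced by the binary has already been extracted, and the function name `missingSyscalls` and the syscall names in `main` are hypothetical example data.

```go
package main

import (
	"fmt"
	"sort"
)

// missingSyscalls returns the referenced syscalls that the VM does not
// implement, de-duplicated and sorted for stable CI output.
func missingSyscalls(referenced []string, implemented map[string]bool) []string {
	seen := make(map[string]bool)
	var missing []string
	for _, name := range referenced {
		if implemented[name] || seen[name] {
			continue
		}
		seen[name] = true
		missing = append(missing, name)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical example data, not taken from the op-program binary.
	implemented := map[string]bool{"read": true, "write": true, "mmap": true}
	referenced := []string{"read", "write", "eventfd2", "mmap", "eventfd2"}

	missing := missingSyscalls(referenced, implemented)
	if len(missing) > 0 {
		fmt.Println("unimplemented syscalls referenced by the binary:", missing)
		return
	}
	fmt.Println("all referenced syscalls are implemented")
}
```

A static check of this shape reports syscalls that are referenced, whether or not they are ever executed, which is why it complements running the program against real blocks.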

### FM5: eventfd or mprotect noop insufficient for Go 1.23 support

- **Description:** The eventfd and mprotect syscalls were implemented as no-ops because it was determined that op-program never uses them, even though the binary references them. If that determination is wrong, a no-op would be insufficient.
- **Risk Assessment:** medium
- **Mitigations:**
1. We have been running `op-challenger-runner` on production data for several weeks with the new VM.
- **Detection:** We rely on our tests.
- **Recovery Path(s)**: It would require fixing the bug and upgrading the contract.
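
A no-op syscall handler of the kind described in FM5 might look like the sketch below. Everything here is an assumption made for illustration: the `Syscall` constants are placeholder values rather than real MIPS syscall numbers, and returning 0 for the no-op cases is a guess, since this document does not state what the VM actually returns.

```go
package main

import "fmt"

// Syscall is a hypothetical identifier. The values below are
// placeholders, not real MIPS64 syscall numbers.
type Syscall int

const (
	SysWrite Syscall = iota + 1
	SysMprotect
	SysEventfd
)

// handleSyscall dispatches a syscall. The no-op cases report success
// without touching any VM state (return value 0 is an assumption).
func handleSyscall(num Syscall) (uint64, error) {
	switch num {
	case SysMprotect, SysEventfd:
		// No-op: safe only if the program never relies on the
		// side effects these syscalls would normally have.
		return 0, nil
	case SysWrite:
		// Placeholder for a syscall with a real implementation.
		return 0, nil
	default:
		return 0, fmt.Errorf("unimplemented syscall %d", num)
	}
}

func main() {
	for _, s := range []Syscall{SysWrite, SysMprotect, SysEventfd, Syscall(999)} {
		ret, err := handleSyscall(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("syscall %d returned %d\n", s, ret)
	}
}
```

The contrast with the default branch shows the risk: an unimplemented syscall fails loudly, while a no-op succeeds silently even if the program depended on its effect.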

### Generic items we need to take into account:

See [generic hardfork failure modes](./fma-generic-hardfork.md) and [generic smart contract failure modes](./fma-generic-contracts.md).
Incorporate any applicable failure modes with FMA-specific mitigations and detections directly into this document.

- [x] Check this box to confirm that these items have been considered and updated if necessary.

## Action Items

Below is what needs to be done before launch to reduce the chances of the above failure modes occurring, and to ensure they can be detected and recovered from:

- [ ] Resolve all comments on this document and incorporate them into the document itself (Assignee: document author)
Contributor

and similarly to above, this

Suggested change (before, then after):
- [ ] Resolve all comments on this document and incorporate them into the document itself (Assignee: document author)
- [x] Resolve all comments on this document and incorporate them into the document itself (Assignee: document author)


## Audit Requirements

These changes were audited as part of [this larger Spearbit review](https://github.com/ethereum-optimism/optimism/blob/49a80f8054cf59be69624416160cad760f09c692/docs/security-reviews/2025_05-Interop-Portal-Spearbit.pdf) and [by Coinbase Protocol Security](https://github.com/ethereum-optimism/optimism/blob/49a80f8054cf59be69624416160cad760f09c692/docs/security-reviews/2025_05-Cannon-Go-Updates-Coinbase.pdf).