Where does anvil store fork's state? #3837

Closed
xphoniex opened this issue Dec 5, 2022 · 17 comments
Labels
C-anvil Command: anvil T-question Type: question

Comments

@xphoniex
Contributor

xphoniex commented Dec 5, 2022

I'm running anvil --fork-url <rpc>, and the first thing I noticed is that the logs don't show interactions with the RPC, e.g.:

$ cast balance 0x81...c7

# anvil
eth_getBalance

whereas I expect it to show something like:

# anvil
eth_getBalance
eth_getBalance (RPC https://rpc...)

Regardless, I don't know where the state is stored. It seems like anvil retrieves the bare minimum needed for the task and keeps it in memory (?), in BlockchainDb (?).

I'm also seeing a file in .foundry/cache/rpc/goerli/8077534, which is the chain/block I'm interacting with, but it's not growing in size, so I'm assuming the state has to be kept in memory.
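The cache location can be sketched as a small path builder. This is a minimal sketch, assuming the layout <foundry dir>/cache/rpc/<chain>/<block>/storage.json inferred from the paths quoted in this thread (the storage.json file name comes from a later comment); `fork_cache_file` is a hypothetical helper, not a function from anvil.

```rust
use std::path::{Path, PathBuf};

/// Builds the path of a cached fork state file.
/// Assumption: the layout is inferred from paths mentioned in this thread,
/// not taken from anvil's source code.
fn fork_cache_file(foundry_dir: &Path, chain: &str, block: u64) -> PathBuf {
    foundry_dir
        .join("cache")
        .join("rpc")
        .join(chain) // chain name as it appears on disk, e.g. "goerli"
        .join(block.to_string()) // fork block number
        .join("storage.json")
}
```

Calling `fork_cache_file(Path::new(".foundry"), "goerli", 8077534)` yields a path under the directory mentioned above.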

If it is in memory, then I suggest we add a flag so that state data can be stored on disk. This would enable offline state retrieval, which can be useful for CI (e.g. with the state committed to the repo) as well as for secure, isolated environments.

I'd be interested in implementing it myself, given some pointers on how the internals work.

P.S. Please consider creating a Discord server.

@rkrasiuk rkrasiuk added T-question Type: question C-anvil Command: anvil labels Dec 5, 2022
@xphoniex
Contributor Author

@mattsse polite reminder.

@xphoniex
Contributor Author

xphoniex commented Dec 11, 2022

Okay, I just realized that the cache exists and only gets flushed to disk when the fork's block number is specified. (Setting RUST_LOG="cache=trace" helps.)

However, if I disconnect from the internet and run anvil again, I get:

thread 'main' panicked at 'Failed to fetch network chain id: HTTPError(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("...")), port: None, path: "/", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Name or service not known" })) })', /github.com/foundry-rs/foundry/anvil/src/config.rs:746:58

Obviously, any operation on the provider, such as provider.get_chainid(), now fails. The good news is that we have the required data stored, e.g. chain_id:

storage.json

{"meta":{"cfg_env":{"chain_id":"0x5" ...
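Reading the chain id back from the cached file could look roughly like the following. It is a minimal, standard-library-only sketch that assumes the field is stored as a hex string exactly as in the excerpt above; `parse_hex_u64` and `cached_chain_id` are hypothetical helpers, and real code would deserialize the JSON properly instead of searching the text.

```rust
/// Parses a hex quantity such as "0x5" into a number.
fn parse_hex_u64(s: &str) -> Option<u64> {
    let digits = s.strip_prefix("0x")?;
    u64::from_str_radix(digits, 16).ok()
}

/// Naively pulls `chain_id` out of the raw JSON text without a JSON parser.
/// Illustration only: it relies on the exact `"chain_id":"0x.."` spelling.
fn cached_chain_id(storage_json: &str) -> Option<u64> {
    let key = "\"chain_id\":\"";
    let start = storage_json.find(key)? + key.len();
    let rest = &storage_json[start..];
    let end = rest.find('"')?; // closing quote of the hex string
    parse_hex_u64(&rest[..end])
}
```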

The next issue is returning a block here:

            let block = provider
                .get_block(BlockNumber::Number(fork_block_number.into()))
                .await
                .expect("Failed to get fork block");

Here we can check for an --offline or --fork-chain-id flag and return Default::default() instead.

The final issue is that the cache compares metadata and rejects the cached data if they differ, before giving us access to it.

I then realized that only a few of the fields differ (timestamp and basefee in block_env):

meta = BlockchainDbMeta { cfg_env: CfgEnv { chain_id: 5, spec_id: LATEST, perf_all_precompiles_have_balance: false, perf_analyse_created_bytecodes: Analyse, limit_contract_code_size: None, memory_limit: 4294967295 }, block_env: BlockEnv { number: 8117594, coinbase: 0x0000000000000000000000000000000000000000, timestamp: 0, difficulty: 0, basefee: 1000000000, gas_limit: 30000000 }, hosts: {"x.com"} }

exis = BlockchainDbMeta { cfg_env: CfgEnv { chain_id: 5, spec_id: LATEST, perf_all_precompiles_have_balance: false, perf_analyse_created_bytecodes: Analyse, limit_contract_code_size: None, memory_limit: 4294967295 }, block_env: BlockEnv { number: 8117594, coinbase: 0x0000000000000000000000000000000000000000, timestamp: 1670783616, difficulty: 0, basefee: 151409, gas_limit: 30000000 }, hosts: {"x.com"} }

If we somehow match or skip that check, our node spins up and we can interact with it. Note that I have only checked ETH balances so far.

We can read the JSON file separately and substitute its values into the block, or maybe skip the check in cache.rs altogether if an offline flag (--offline or --fork-chain-id) is set.
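The relaxed comparison can be sketched as follows. This is a minimal sketch under stated assumptions: `Meta` is a simplified, hypothetical stand-in for BlockchainDbMeta that keeps only a few of the fields printed above, and `cache_is_usable` is not a function from cache.rs.

```rust
/// Simplified stand-in for the cache metadata printed above (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Meta {
    chain_id: u64,
    block_number: u64,
    timestamp: u64,
    basefee: u64,
}

/// Strict mode compares every field. Offline mode ignores the fields that
/// cannot be known without fetching the block (timestamp, basefee).
fn cache_is_usable(requested: &Meta, cached: &Meta, offline: bool) -> bool {
    if offline {
        requested.chain_id == cached.chain_id
            && requested.block_number == cached.block_number
    } else {
        requested == cached
    }
}
```

With the two values shown above (timestamp 0 vs. 1670783616, basefee 1000000000 vs. 151409), the strict check rejects the cache while the offline check accepts it.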

Thoughts?

@mattsse
Member

mattsse commented Dec 12, 2022

thanks for writing this down, however, I'm having a hard time parsing what the actual issue(s) are here.

I looked at the corresponding PR, but having trouble understanding the exact motivation here and how this relates to this issue.

rpc storage is flushed to disk on exit

@xphoniex
Contributor Author

xphoniex commented Dec 12, 2022

> thanks for writing this down, however, I'm having a hard time parsing what the actual issue(s) are here.
>
> I looked at the corresponding PR, but having trouble understanding the exact motivation here and how this relates to this issue.
>
> rpc storage is flushed to disk on exit

I'm testing contracts inside an isolated network which provides increased security.

There is no route to public internet to begin with, so the only way to test against a fork is to move/commit state files of a certain block to a repository and test against that.

Moving the state file (storage.json) alone is not enough because of the issues mentioned in the PR, and, if merged, the PR wouldn't affect the normal operation of a fork.

Does this make sense?

@mattsse
Member

mattsse commented Dec 12, 2022

kinda but I'm still slow to understand this completely.

> There is no route to public internet to begin with, so the only way to test against a fork is to move/commit state files of a certain block to a repository and test against that.

I really don't understand this, because the matching cache file to the block-number will be picked up if it exists, no?

> Moving the state file (storage.json) alone is not enough because of the issues mentioned in the PR,

why is --chain-id not sufficient here?

@xphoniex
Contributor Author

> I really don't understand this, because the matching cache file to the block-number will be picked up if it exists, no?

Yes, but not before making two HTTP(S) requests to the remote endpoint, which panics when there is no internet connection:

thread 'main' panicked at 'Failed to fetch network chain id: HTTPError(reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("...")), port: None, path: "/", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Name or service not known" })) })', /github.com/foundry-rs/foundry/anvil/src/config.rs:746:58

> why is --chain-id not sufficient here?

--fork-chain-id is basically synonymous with "start anvil only from the offline cache, and skip the meta check", whereas --chain-id has nothing to do with a fork and could mean "start a testnet with my own 0x9 id" or whatever.

To give you a code example:

            let block = if self.fork_chain_id.is_some() {
                Some(Default::default())
            } else {
                provider
                    .get_block(BlockNumber::Number(fork_block_number.into()))
                    .await
                    .expect("Failed to get fork block")
            };

You don't want to create a default block when --chain-id is set; that's only something you want to do when --fork-chain-id is set (because in an offline fork you don't have access to the block and don't want to retrieve it from the remote).
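The same decision can be modelled without a network dependency. This is a minimal sketch, assuming a hypothetical `Block` type and a `fetch` closure that stands in for the remote get_block request; it is not the code from the PR. Note that a default block carries an all-zero hash, which matches the all-zero block hash reported later in this thread.

```rust
/// Hypothetical, simplified block type; the real one comes from the provider library.
#[derive(Debug, Default, Clone, PartialEq)]
struct Block {
    number: u64,
    hash: [u8; 32],
}

/// Returns a placeholder block when a fork chain id was supplied (offline start),
/// otherwise calls `fetch`, which stands in for the remote `get_block` request.
fn fork_block<F>(fork_chain_id: Option<u64>, fetch: F) -> Option<Block>
where
    F: FnOnce() -> Option<Block>,
{
    if fork_chain_id.is_some() {
        // Offline start: never touch the network, use an empty block instead.
        Some(Block::default())
    } else {
        fetch()
    }
}
```

Passing the fetch step as a closure keeps the offline branch from ever reaching the network code, which is the property the --fork-chain-id flag is meant to guarantee.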

@mattsse
Member

mattsse commented Dec 12, 2022

okay, thanks for clearing this up. I believe I get it now.

Supportive of an offline feature,

wdyt about adding an --offline flag instead? would this have the same effect?

@xphoniex
Contributor Author

> okay, thanks for clearing this up. I believe I get it now.
>
> Supportive of an offline feature,
>
> wdyt about adding an --offline flag instead? would this have the same effect?

I had already added a --fork-offline flag, but removed it in favor of --fork-chain-id because:

  1. With the current implementation, it's not really just "offline", it's an "offline start".
  2. You still need to pass the chain id somehow, even if you use --offline, because otherwise anvil has no way of knowing where to look for the config.json file; e.g. initially it'd look for it in /hardhat-env/config.json. It's a chicken-and-egg problem.

@mattsse
Member

mattsse commented Dec 12, 2022

I see, that makes sense if you want to use the cache file of chain x but override with y, right?

but, correct me, isn't --fork-chain-id then just --offline + --chain-id ?

@xphoniex
Contributor Author

xphoniex commented Dec 12, 2022

> I see, that makes sense if you want to use the cache file of chain x but override with y, right?

Not sure what you mean. The chain id is overridden by the cache file; we only need it in order to know which folder to look in for config.json in the first place.

> but, correct me, isn't --fork-chain-id then just --offline + --chain-id ?

Yes. Let's call --offline something like --local-start from now on. With two flags, the code would be more verbose, e.g.:

            let (fork_block_number, fork_chain_id) =
                if let Some(fork_block_number) = self.fork_block_number {
                    let chain_id = if let Some(chain_id) = self.fork_chain_id {
                        Some(chain_id)

becomes

            let (fork_block_number, fork_chain_id) =
                if let Some(fork_block_number) = self.fork_block_number {
                    let chain_id = if self.chain_id.is_some() && self.local_start() {
                        Some(self.chain_id.clone().unwrap())

@mattsse
Member

mattsse commented Dec 13, 2022

gotcha, thanks for guiding me through this.

I'll have a closer look at the PR in a bit but this should be a reasonable addon. thanks

@0xalecks

0xalecks commented Jan 3, 2023

@xphoniex Did you have to do anything special (edit the storage.json file?) to get anvil to load the state?

I can get Anvil to save the state by starting with:

anvil \
--fork-url https://whatever \
--chain-id 1337 \
--fork-block-number 16328288

I see the storage file in ~/.foundry/cache/rpc/dev/16328288/storage.json

But when I restart it with:

anvil \
--fork-url https://whatever \
--chain-id 1337 \
--fork-block-number 16328288 \
--fork-chain-id 1

... the state isn't there; it shows block hash 0x0...0

Did you do anything different to restore the state?

Thanks

@mattsse
Member

mattsse commented Jan 3, 2023

> the state isn't there, shows blockhash 0x0...0

can you please elaborate on this?

@0xalecks

0xalecks commented Jan 4, 2023

> can you please elaborate on this?

Anvil startup shows this in logs:

Fork
==================
Endpoint:       https://eth-mainnet.g.alchemy.com/v2/ID
Block number:   16328288
Block hash:     0x0000000000000000000000000000000000000000000000000000000000000000
Chain ID:       1337

The block hash shows an actual value when starting without the --fork-chain-id option.

@xphoniex
Contributor Author

xphoniex commented Jan 4, 2023

The state is there; only the block hash is missing (read the whole issue and the adjacent PR to see why).

Try running a command which retrieves actual state, e.g. cast balance <some_address> and report back if it's not there.

@0xalecks

0xalecks commented Jan 4, 2023

Yes, you're right, the state is there, at least in part. I think there's another issue going on, on top of me not understanding things properly. I just now realized that storage.json only stores the state from upstream. If I change the state in Anvil after forking, those changes are not going to be stored in storage.json. That was my first mistaken assumption.
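The distinction between upstream state and local changes can be pictured with a two-layer model. This is a conceptual sketch only, assuming a hypothetical `ForkState` type with string addresses and integer balances; it does not reflect anvil's actual data structures, just the behavior described above.

```rust
use std::collections::HashMap;

/// Conceptual model only: values fetched from the RPC are kept apart
/// from changes made locally after forking.
#[derive(Default)]
struct ForkState {
    upstream: HashMap<String, u64>, // balances fetched from the remote endpoint
    local: HashMap<String, u64>,    // balances modified on the local node
}

impl ForkState {
    /// Reads prefer local changes and fall back to the upstream cache.
    fn balance(&self, addr: &str) -> Option<u64> {
        self.local.get(addr).or_else(|| self.upstream.get(addr)).copied()
    }

    /// What would end up in the cache file in this model: upstream data only.
    fn persisted(&self) -> &HashMap<String, u64> {
        &self.upstream
    }
}
```

In this model a balance changed after forking is visible to reads but absent from `persisted()`, which is why reloading the cache file alone does not restore local modifications.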

Having said that, contract calls don't work for some reason against the loaded storage.json state. cast balance X works, but if I try to read an ERC20 balance, it fails with EVM error PrevrandaoNotSet (and yes I read the balance previously, when saving the state file). Either way, even if this worked, it doesn't help me since I need the state with changes.

I wish --dump-state and --load-state would work when using a fork :/

Anyways, thanks for the tip!

@xphoniex
Contributor Author

xphoniex commented Jan 4, 2023

I was not able to run anvil offline yet because of this issue. I'll revisit what you said once I can get it working, and hopefully fix contract calls as well.


4 participants