feat: optimize map_abi_data #3697


Merged
merged 1 commit into from May 16, 2025
Conversation

BobTheBuidler
Contributor

@BobTheBuidler BobTheBuidler commented May 14, 2025

What was wrong?

Title says it all, this is ready for review.

Related to Issue #
Closes #

How was it fixed?

Todo:

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes


Collaborator

@kclowes kclowes left a comment


Thanks for the PR! There are a couple of breaking typing changes here that I listed, and a quick benchmark of test_map_abi_data shows minimal improvement on only one test case and either equal or worse performance on the other three, so I'm not quite convinced on this one. If you can show me data that this is more performant, though, I'd be happy to revisit!

@BobTheBuidler
Contributor Author

BobTheBuidler commented May 15, 2025

How are you benchmarking? Is there a standard method that should be used?

The current code does exactly the same things as the new code, same function calls and all, but it also creates and populates two lists and creates and iterates through an intermediary chain object, which the new code does not do.

I didn't anticipate you'd see a substantial difference, since this is a micro-optimization after all, but a slowdown doesn't make sense to me when the new PR is "call all the same functions with all the same inputs, minus a few that didn't need calling". We didn't modify the instructions we're sending to the CPU, we just removed a couple.

I'd love to look into this further to figure out what's going on.
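(For what it's worth, the stdlib `timeit` module tends to give steadier numbers than hand-rolled `time()` loops; a minimal sketch, where `work` is just a placeholder for whatever call is being measured:)

```python
import timeit


def work():
    # placeholder for the call under test, e.g. map_abi_data(funcs, types, data)
    sum(range(100))


# repeat() runs several independent timing rounds; the minimum is the
# least noise-polluted measurement, so it's the usual number to compare
timings = timeit.repeat(work, number=10_000, repeat=5)
print(f"best of 5: {min(timings):.4f}s")
```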

@BobTheBuidler
Contributor Author

BobTheBuidler commented May 16, 2025

I did this benchmark with empty args so we're truly comparing apples to apples and not polluting the results with the time it takes to actually format all the data:

from time import time
import itertools
from functools import partial
from web3._utils.abi import abi_data_tree, data_tree_map, recursive_map, strip_abi_type
from cytoolz import pipe


def old(data, types, normalizers):
    pipeline = itertools.chain(
        [abi_data_tree(types)],
        map(data_tree_map, normalizers),
        [partial(recursive_map, strip_abi_type)],
    )

    return pipe(data, *pipeline)

def strip_abi_types(elements):
    # helper: strip the types back out of the tree
    return recursive_map(strip_abi_type, elements)


def new(data, types, normalizers):
    return pipe(
        data,
        # 1. Decorating the data tree with types
        abi_data_tree(types),
        # 2. Recursively mapping each of the normalizers to the data
        *map(data_tree_map, normalizers),
        # 3. Stripping the types back out of the tree
        strip_abi_types,
    )

for i in range(5):
    normalizers = (int,) * i

    start = time()
    for _ in range(100_000):
        old((), (), normalizers)
    print(f"old: {time() - start}")

    start = time()
    for _ in range(100_000):
        new((), (), normalizers)
    print(f"new: {time() - start}")
Output:
old: 1.5397992134094238
new: 1.442474603652954
old: 2.885958194732666
new: 2.764631509780884
old: 4.140100479125977
new: 4.019676208496094
old: 5.37146520614624
new: 4.730041265487671
old: 5.9497997760772705
new: 5.8652191162109375
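For reference, `pipe` just threads a value through a sequence of callables, so the only difference between the two versions above is the bookkeeping objects built before the call. A minimal stand-in for `cytoolz.pipe` (illustration only, not web3 code):

```python
from functools import reduce


def pipe(data, *funcs):
    # thread `data` through each callable in turn, like cytoolz.pipe
    return reduce(lambda acc, func: func(acc), funcs, data)


# e.g. 3 -> 4 -> "4"
assert pipe(3, lambda x: x + 1, str) == "4"
```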

Collaborator

@kclowes kclowes left a comment


We don't have any benchmarking at this granular of a level. I just grabbed the test cases from this test, ran each test case 1000 times, and compared main to this branch. Here's the script I was using:

import time

from web3._utils.abi import map_abi_data
from web3._utils.normalizers import (
    BASE_RETURN_NORMALIZERS,
    abi_string_to_text,
    addresses_checksummed,
)

def benchmark_map_abi_data():
    test_cases = [
        # Case 1: Simple bool array and int
        {
            "types": ["bool[2]", "int256"],
            "data": [[True, False], 9876543210],
            "funcs": [
                lambda typ, dat: ((typ, "Tru-dat") if typ == "bool" and dat else (typ, dat)),
                lambda typ, dat: (typ, hex(dat)) if typ == "int256" else (typ, dat),
            ],
        },
        # Case 2: Address normalization
        {
            "types": ["address"],
            "data": ["0x5b2063246f2191f18f2675cedb8b28102e957458"],
            "funcs": BASE_RETURN_NORMALIZERS,
        },
        # Case 3: Address array normalization
        {
            "types": ["address[]"],
            "data": [["0x5b2063246f2191f18f2675cedb8b28102e957458"] * 2],
            "funcs": BASE_RETURN_NORMALIZERS,
        },
        # Case 4: Complex tuple with addresses
        {
            "types": ["(address,address)[]"],
            "data": [[
                (
                    "0x5b2063246f2191f18f2675cedb8b28102e957458",
                    "0xebe0da78ecb266c7ea605dc889c64849f860383f",
                )
            ] * 2],
            "funcs": BASE_RETURN_NORMALIZERS,
        },
        # Case 5: String and address array
        {
            "types": ["(string,address[])"],
            "data": [(
                b"a string",
                [b"\xf2\xe2F\xbbv\xdf\x87l\xef\x8b8\xae\x84\x13\x0fOU\xde9["],
            )],
            "funcs": [addresses_checksummed, abi_string_to_text],
        },
    ]

    for i, test_case in enumerate(test_cases, 1):
        # Warm up
        for _ in range(100):
            map_abi_data(test_case["funcs"], test_case["types"], test_case["data"])

        # Actual benchmark
        start_time = time.time()
        for _ in range(1000):
            map_abi_data(test_case["funcs"], test_case["types"], test_case["data"])
        end_time = time.time()

        total_time = end_time - start_time
        print(f"Test case {i}: total time for 1000 runs: {total_time:.4f} seconds")

if __name__ == "__main__":
    benchmark_map_abi_data()

Here is the data I was seeing:

| Test Case | Types | Main Branch Run Time (s) | Feature Branch Run Time Yesterday (s) | Feature Branch Run Time Today, Run 1 (s) | Feature Branch Run Time Today, Run 2 (s) |
| --- | --- | --- | --- | --- | --- |
| 1 | `['bool[2]', 'int256']` | 0.0933 | 0.0668 | 0.0907 | 0.0972 |
| 2 | `['address']` | 0.0316 | 0.0321 | 0.0317 | 0.0318 |
| 3 | `['address[]']` | 0.0604 | 0.0607 | 0.0602 | 0.0604 |
| 4 | `['(address,address)[]']` | 0.1217 | 0.1268 | 0.1204 | 0.1200 |
| 5 | `['(string,address[])']` | 0.0855 | 0.0862 | 0.0898 | 0.0847 |

Running it a few more times on this branch today, I'm seeing slight improvements over the feature branch numbers above. I think the type changes are good, and the performance change is minimal but doesn't impact readability, so I'm okay merging. Curious if you have an opinion, @fselmo?

Collaborator

@fselmo fselmo left a comment


> curious if you have an opinion @fselmo?

I like it. Seems to be a slight improvement and doesn't hurt readability. I would certainly like for us to squash the 7 commits on a +24 -19 before merging though!

Collaborator

@kclowes kclowes left a comment


Thanks @BobTheBuidler!

@kclowes
Collaborator

kclowes commented May 16, 2025

Will squash and merge!

@kclowes kclowes merged commit 8a7bef1 into ethereum:main May 16, 2025
85 checks passed
@BobTheBuidler BobTheBuidler deleted the map-abi-data branch May 16, 2025 18:22