Implement benchmark scenario WeightedWorkloadOnTreeDataset #21

Open: wants to merge 29 commits into base: main


eric-maynard (Contributor) commented Apr 30, 2025

This implements a new scenario, WeightedWorkloadOnTreeDataset, that supports the configuration of multiple distributions over which to weight reads & writes against the catalog.

Compared with ReadUpdateTreeDataset, this allows us to understand how performance changes when reads or writes frequently hit the same tables.

Sampling

The distributions are defined in the config file like so:

    # Distributions for readers
    # ...
    readers = [
      { count = 8, mean = 0.3, variance = 0.0278 }
    ]

count is the number of threads that will sample from the distribution, while mean and variance describe the Gaussian distribution to sample from. Sampled values are expected to fall between 0 and 1.0; when a sample falls outside that range, it is discarded and the distribution is resampled until an in-range value is obtained.
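
The resampling rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this PR: the function name, the `max_attempts` guard, and the fixed seed are all assumptions made for the example.

```python
import math
import random


def sample_truncated_gaussian(mean, variance, rng, max_attempts=100_000):
    """Draw from N(mean, variance), resampling until the value lies in [0, 1].

    Hypothetical helper for illustration; max_attempts is an assumed safety
    guard so that a badly chosen distribution cannot loop forever.
    """
    std_dev = math.sqrt(variance)  # the config gives variance; gauss() takes std dev
    for _ in range(max_attempts):
        value = rng.gauss(mean, std_dev)
        if 0.0 <= value <= 1.0:
            return value
    raise RuntimeError("no sample landed in [0, 1]; check mean and variance")


# Values taken from the readers example in the config above.
rng = random.Random(42)
samples = [sample_truncated_gaussian(0.3, 0.0278, rng) for _ in range(10_000)]
```

Because out-of-range draws are discarded rather than clamped, no probability mass piles up at exactly 0.0 or 1.0; the accepted samples follow the Gaussian shape restricted to [0, 1].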

For an extreme example, refer to the following:
[Image: Screenshot 2025-04-30 at 1 27 43 AM]

In this case, about 50% of samples should fall below 0.0 and therefore be resampled. Resampling out-of-range values in this way allows us to create highly concentrated or near-uniform distributions as needed.
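
To estimate how much resampling a given configuration will cause, the share of raw draws that land outside [0, 1] can be computed from the normal CDF. The sketch below uses hypothetical helper names, and the mean of 0.0 in the usage lines is an assumption: any Gaussian centred on 0.0 puts about half of its mass below 0.0, which matches the figure quoted above, but the screenshot's actual parameters are not given in the text.

```python
import math


def normal_cdf(x, mean, std_dev):
    """Cumulative distribution function of N(mean, std_dev**2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2.0))))


def resample_fraction(mean, variance):
    """Fraction of raw Gaussian draws that fall outside [0, 1]."""
    std_dev = math.sqrt(variance)
    accepted = normal_cdf(1.0, mean, std_dev) - normal_cdf(0.0, mean, std_dev)
    return 1.0 - accepted


# Assumed parameters: a distribution centred on 0.0 rejects about half its draws.
extreme = resample_fraction(0.0, 0.0278)
# The readers example from the config rejects only a few percent.
typical = resample_fraction(0.3, 0.0278)
```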

Once a value in [0, 1] is obtained, it is mapped to a table: 1.0 corresponds to the highest table in the tree dataset (e.g. T_2048) and 0.0 corresponds to T_0.
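
One possible way to do this mapping is a linear scale onto the table indices, sketched below. The description fixes only the two endpoints, so the rounding rule, the function name, and the `max_table_index` parameter are assumptions; the PR may use a different scheme.

```python
def table_for_sample(value, max_table_index):
    """Map a sample in [0, 1] to a table name such as T_0 or T_2048.

    Hypothetical helper: scales linearly and rounds to the nearest index,
    which is an assumed rule; only the endpoints come from the description.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("sample must lie in [0, 1]")
    index = round(value * max_table_index)
    return f"T_{index}"


lowest = table_for_sample(0.0, 2048)
highest = table_for_sample(1.0, 2048)
```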

To help developers understand the distributions they've defined, some information is printed when the new simulation is run:

. . .

### Writer distributions ###
Summary for Distribution(2,0.7,0.0278):
  Range         | % of Samples | Visualization
  --------------|--------------|------------------
  [0.0 - 0.1) |   0.02%      | 
  [0.1 - 0.2) |   0.14%      | 
  [0.2 - 0.3) |   0.71%      | 
  [0.3 - 0.4) |   2.86%      | █
  [0.4 - 0.5) |   8.40%      | ████
  [0.5 - 0.6) |  16.36%      | ████████
  [0.6 - 0.7) |  23.44%      | ████████████
  [0.7 - 0.8) |  23.37%      | ████████████
  [0.8 - 0.9) |  16.56%      | ████████
  [0.9 - 1.0) |   8.15%      | ████

  The most frequently selected table was chosen in ~6% of samples
. . .
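
A summary of this shape could be produced by bucketing samples into ten equal ranges, roughly as sketched below. The function names are hypothetical, and the scale of one block per 2% of samples is inferred from the bars shown above rather than taken from the PR's code.

```python
def bucket_percentages(samples, buckets=10):
    """Return the percentage of samples that fall into each equal-width bucket."""
    counts = [0] * buckets
    for sample in samples:
        # A sample of exactly 1.0 is folded into the last bucket.
        index = min(int(sample * buckets), buckets - 1)
        counts[index] += 1
    total = len(samples)
    return [100.0 * count / total for count in counts]


def format_summary(percentages):
    """Render one text row per bucket: range, percentage, and a bar."""
    bucket_count = len(percentages)
    rows = []
    for i, pct in enumerate(percentages):
        low, high = i / bucket_count, (i + 1) / bucket_count
        bar = "█" * round(pct / 2)  # assumed scale: one block per 2% of samples
        rows.append(f"  [{low:.1f} - {high:.1f}) | {pct:6.2f}%      | {bar}")
    return "\n".join(rows)


percentages = bucket_percentages([0.05, 0.15, 0.15, 1.0])
summary = format_summary(percentages)
```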

eric-maynard added 2 commits April 30, 2025 01:33