Implement benchmark scenario WeightedWorkloadOnTreeDataset
#21
Open: eric-maynard wants to merge 29 commits into apache:main from eric-maynard:weighted-workloads
This implements a new scenario, WeightedWorkloadOnTreeDataset, that supports the configuration of multiple distributions over which to weight reads & writes against the catalog. Compared with ReadUpdateTreeDataset, this allows us to understand how performance changes when reads or writes frequently hit the same tables.

Sampling
The distributions are defined in the config file like so:
count is simply the number of threads which will sample from the distribution, while mean and variance describe the Gaussian distribution to sample from. Sampled values are generally expected to fall between 0 and 1.0; when a sample falls outside that range, the distribution is resampled until it yields a value inside it.
For an extreme example, refer to the following:

In this case, about 50% of samples should fall below 0.0 and therefore be resampled. This allows us to create highly concentrated or uniform distributions as needed.
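The 50% figure can be checked empirically. The sketch below is only an illustration in Python, not the scenario's own code: it assumes a mean of 0.0 (which is what makes half of a Gaussian's mass fall below 0.0), picks an arbitrary variance of 0.01, and uses a hypothetical helper name, fraction_resampled.

```python
import random


def fraction_resampled(mean, variance, draws=100_000, seed=0):
    """Estimate the share of raw Gaussian draws that land outside [0, 1].

    Hypothetical helper for illustration; not part of the benchmark code.
    """
    rng = random.Random(seed)
    stddev = variance ** 0.5  # random.gauss takes a standard deviation
    rejected = sum(
        1 for _ in range(draws)
        if not 0.0 <= rng.gauss(mean, stddev) <= 1.0
    )
    return rejected / draws


# Assumed parameters: mean 0.0 sits on the lower bound, so roughly half
# of the raw draws are negative and would have to be resampled.
print(fraction_resampled(mean=0.0, variance=0.01))

# For contrast, a distribution centered at 0.5 with the same variance
# almost never leaves [0, 1].
print(fraction_resampled(mean=0.5, variance=0.01))
```

A higher rejection rate only costs extra draws; the accepted samples still follow the Gaussian truncated to [0, 1].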
Once a value in [0, 1] is obtained, this value is mapped to a table where 1.0 is the highest table (e.g. T_2048) in the tree dataset and 0.0 is T_0.
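As a minimal sketch of the two steps described above (resample until the value lies in [0, 1], then map it onto a table), the following Python illustrates the idea. The function names, the linear rounding rule, and the example parameters are assumptions made for illustration; the scenario's actual mapping may differ.

```python
import random


def sample_unit_interval(rng, mean, variance):
    """Draw from a Gaussian, resampling until the value lies in [0, 1].

    Assumes variance > 0; hypothetical helper, not the project's code.
    """
    stddev = variance ** 0.5  # random.gauss takes a standard deviation
    while True:
        value = rng.gauss(mean, stddev)
        if 0.0 <= value <= 1.0:
            return value


def table_name(value, max_index):
    """Map a value in [0, 1] to a table name, with 0.0 -> T_0 and
    1.0 -> T_<max_index>.

    Assumption: a linear mapping rounded to the nearest table index.
    """
    return f"T_{round(value * max_index)}"


rng = random.Random(42)  # fixed seed so runs are repeatable
# Assumed example: reads concentrated around the middle of the tree.
for _ in range(5):
    v = sample_unit_interval(rng, mean=0.5, variance=0.01)
    print(f"{v:.3f} -> {table_name(v, max_index=2048)}")
```

With a small variance most samples map to tables near the one selected by the mean, while a large variance spreads them more evenly across the dataset.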
To help developers understand the distributions they've defined, some information is printed when the new simulation is run: