Implement benchmark scenario `WeightedWorkloadOnTreeDataset` #21

eric-maynard · 2025-04-30T08:28:52Z

This implements a new scenario, WeightedWorkloadOnTreeDataset, that supports the configuration of multiple distributions over which to weight reads & writes against the catalog.

Compared with ReadUpdateTreeDataset, this allows us to understand how performance changes when reads or writes frequently hit the same tables.

Sampling

The distributions are defined in the config file like so:

    # Distributions for readers
    # ...
    readers = [
      { count = 8, mean = 0.3, variance = 0.0278 }
    ]

count is the number of threads that sample from the distribution, while mean and variance describe the Gaussian distribution to sample from. Sampled values are generally expected to fall between 0 and 1.0; when a sample falls outside that range, it is discarded and the distribution is resampled until an in-range value is obtained.
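The resampling rule described above can be sketched in a few lines of Python. This is only an illustration, not the scenario's actual code: the function name `sample_unit_interval` and the `max_attempts` guard are hypothetical, and the values 0.3 and 0.0278 are taken from the readers example.

```python
import math
import random


def sample_unit_interval(mean, variance, rng, max_attempts=100_000):
    """Draw from N(mean, variance), resampling until the value lands in [0, 1].

    Hypothetical helper: the name and the max_attempts guard are assumptions.
    """
    std_dev = math.sqrt(variance)  # the config gives variance, gauss() wants std dev
    for _ in range(max_attempts):
        value = rng.gauss(mean, std_dev)
        if 0.0 <= value <= 1.0:
            return value
    raise RuntimeError("no in-range sample found; check mean and variance")


# Values from the readers example: mean = 0.3, variance = 0.0278
rng = random.Random(42)
samples = [sample_unit_interval(0.3, 0.0278, rng) for _ in range(1000)]
```

Every returned value is guaranteed to be in [0, 1]; a distribution centered far outside that range simply takes more attempts per sample.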

For an extreme example, refer to the following:

In this case, about 50% of samples should fall below 0.0 and therefore be resampled. This allows us to create highly concentrated or uniform distributions as needed.
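The share of samples that gets resampled can be estimated from the normal CDF. The sketch below assumes a mean of 0.0, which is what the 50% figure implies for a Gaussian, and borrows the variance 0.0278 from the readers example purely for illustration. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean, variance):
    """CDF of a Gaussian with the given mean and variance, evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def fraction_resampled(mean, variance):
    """Probability that a single draw falls outside [0, 1] and must be redrawn."""
    below = normal_cdf(0.0, mean, variance)
    above = 1.0 - normal_cdf(1.0, mean, variance)
    return below + above


# Assumed parameters: mean 0.0 puts half of the mass below zero.
extreme = fraction_resampled(0.0, 0.0278)   # approximately 0.5
typical = fraction_resampled(0.3, 0.0278)   # a few percent
```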

Once a value in [0, 1] is obtained, this value is mapped to a table where 1.0 is the highest table (e.g. T_2048) in the tree dataset and 0.0 is T_0.
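One way to picture the mapping is a linear scale from the sampled value to a table index. The description only fixes the endpoints (0.0 is T_0, 1.0 is the highest table), so the truncation used for values in between is an assumption, as is the helper name `table_for_value`.

```python
def table_for_value(value, max_table_index):
    """Map a value in [0, 1] to a table name, with 0.0 -> T_0 and 1.0 -> T_<max>.

    Hypothetical helper: linear scaling with truncation is an assumption.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"value {value} is outside [0, 1]")
    index = int(value * max_table_index)  # truncate toward zero
    return f"T_{index}"


lowest = table_for_value(0.0, 2048)    # "T_0"
highest = table_for_value(1.0, 2048)   # "T_2048"
middle = table_for_value(0.5, 2048)    # "T_1024"
```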

To help developers understand the distributions they've defined, some information is printed when the new simulation is run:

    . . .

    ### Writer distributions ###
    Summary for Distribution(2,0.7,0.0278):
      Range         | % of Samples | Visualization
      --------------|--------------|------------------
      [0.0 - 0.1) |   0.02%      | 
      [0.1 - 0.2) |   0.14%      | 
      [0.2 - 0.3) |   0.71%      | 
      [0.3 - 0.4) |   2.86%      | █
      [0.4 - 0.5) |   8.40%      | ████
      [0.5 - 0.6) |  16.36%      | ████████
      [0.6 - 0.7) |  23.44%      | ████████████
      [0.7 - 0.8) |  23.37%      | ████████████
      [0.8 - 0.9) |  16.56%      | ████████
      [0.9 - 1.0) |   8.15%      | ████

      The most frequently selected table was chosen in ~6% of samples
    . . .
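A summary of this kind can be approximated by bucketing samples into ten ranges and counting how often each table is hit. The sketch below is not the scenario's code and does not try to reproduce its exact output format: the function name `summarize`, the one-block-per-2% bar scale, and the truncating table mapping are all assumptions.

```python
from collections import Counter


def summarize(samples, max_table_index, buckets=10, percent_per_block=2.0):
    """Return histogram lines plus the share (in %) of the most-hit table.

    Hypothetical helper: bar scale and table mapping are assumptions.
    """
    counts = [0] * buckets
    for s in samples:
        # A sample of exactly 1.0 is folded into the last bucket.
        counts[min(int(s * buckets), buckets - 1)] += 1

    total = len(samples)
    lines = []
    for i, count in enumerate(counts):
        pct = 100.0 * count / total
        bar = "█" * int(pct / percent_per_block + 0.5)
        low, high = i / buckets, (i + 1) / buckets
        lines.append(f"[{low:.1f} - {high:.1f}) | {pct:6.2f}% | {bar}")

    # Count how many samples land on each table index.
    table_hits = Counter(int(s * max_table_index) for s in samples)
    top_share = 100.0 * table_hits.most_common(1)[0][1] / total
    return lines, top_share


lines, top_share = summarize([0.05, 0.15, 0.15, 0.95], max_table_index=10)
```

A high share for the most frequently selected table indicates a concentrated workload, while a share near 100 divided by the number of tables indicates a near-uniform one.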

sfc-gh-emaynard and others added 14 commits April 29, 2025 17:50

initial commit

d494680

slicing

cbdb7e1

compiles

de53bf6

fix file

af4a733

defaults

75d553b

messing around with gradle

c0afce5

mess with gradle more

599eb04

maybe?

d1f06cc

auth changes

a6dc1ae

Fix

828767b

kinda works

84434a8

simplify code

87c0600

working

4deb6e4

add writers

a40896c

eric-maynard requested review from adutra, ashvina, dennishuo, dimas-b, jackye1995, jbonofre, vvcephei, collado-mike, snazy, RussellSpitzer, takidau, MonkeyCanCode, flyrain and ebyhr as code owners April 30, 2025 08:28

eric-maynard added 2 commits April 30, 2025 01:33

fix

18487ce

spotless

eb392ab

eric-maynard added 13 commits April 30, 2025 01:52

spotless again

8d6ad00

add summary viz

f5dacf7

polish

70cb5ea

spotless

a5d3b2a

spotless

130fe1e

spotless again

407993e

one fix

bb73724

fix

127ee67

remove header

7781e6d

empty string

2ee7bac

spotless

454e167

Merge branch 'no-etag' of github.com-oss:eric-maynard/polaris-tools into weighted-workloads

5b68c51

disablecaching

9f85e1b
