What does the 'no_compression' attribute mean? #11

Kangfei · 2021-04-25T11:15:29Z

Hi,

I want to add new datasets. What does the no_compression attribute mean, and why do we need to set all attributes to no_compression by default?

Best,
Kangfei

ghost · 2021-04-28T14:01:58Z

By default, DeepDB applies compression techniques in the leaf nodes to reduce the storage overhead. Usually, this does not hurt the accuracy for numerical columns. However, if a numerical column has only a few distinct values, or if you want to apply equality predicates on a numerical column, this compression can result in a performance regression. In these cases, you should include the column in the no_compression columns.
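To make the effect concrete, here is a minimal, self-contained sketch of why a compressed leaf can go wrong for equality predicates. It is not DeepDB's code: the equi-width histogram, the uniform-within-bin assumption, and the function names `exact_equality_selectivity` and `binned_equality_selectivity` are all assumptions chosen for illustration, and the sketch only handles integer values inside the observed range.

```python
from collections import Counter


def exact_equality_selectivity(values, target):
    """Selectivity of `col = target` from exact per-value counts (no compression)."""
    counts = Counter(values)
    return counts[target] / len(values)


def binned_equality_selectivity(values, target, num_bins):
    """Selectivity of `col = target` from an equi-width histogram (illustrative compression).

    Assumes integer values and that the rows of a bin are spread uniformly
    over the integers the bin covers, so each integer gets an equal share.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo + 1) / num_bins  # number of integers covered by one bin
    bin_counts = [0] * num_bins
    for v in values:
        bin_counts[min(int((v - lo) / width), num_bins - 1)] += 1
    target_bin = min(int((target - lo) / width), num_bins - 1)
    return bin_counts[target_bin] / width / len(values)


# A numerical column with only three distinct values.
values = [0] * 500 + [50] * 300 + [99] * 200

print(exact_equality_selectivity(values, 50))
print(binned_equality_selectivity(values, 50, num_bins=2))
```

In this toy column, 30% of the rows have the value 50, but the two-bin histogram spreads the rows of the upper bin over 50 integers and estimates only 1%. Keeping exact per-value counts for such a column avoids this error, at the cost of more storage.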

Kangfei · 2021-04-30T09:17:57Z

Thanks for your reply.
Another question is about getting the confidence interval. I noticed that it needs an input ground truth file, like 'confidence_interval_10M.pkl' in the command below. What is the content of this file? Does it contain a 'ground truth confidence interval'?

python3 maqp.py --evaluate_confidence_intervals \
    --dataset flights1B \
    --target_path ./baselines/aqp/results/deepDB/flights1B_confidence_intervals.csv \
    --ensemble_location ../mqp-data/flights-benchmark/spn_ensembles/ensemble_single_flights1B_10000000.pkl \
    --query_file_location ./benchmarks/flights/sql/aqp_queries.sql \
    --ground_truth_file_location ./benchmarks/flights/confidence_intervals/confidence_interval_10M.pkl \
    --confidence_upsampling_factor 100 \
    --confidence_sample_size 10000000

Best,
Kangfei

ghost · 2021-05-06T11:33:32Z

This code evaluates the quality of the confidence intervals predicted by DeepDB. For such an evaluation, you of course need a ground truth to compare the predicted intervals against. In this case, the ground truth is simply the confidence intervals computed from a sample using standard statistical methods. The content of the files can also be found in this repository. We have also included the code that generates these files (see the command under "Optional: Create the ground truth for confidence interval." in the repository).
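As a rough illustration of what "computed from a sample using standard statistical methods" can mean, the sketch below builds a normal-approximation confidence interval for an average from a sample and compares a predicted interval against it. This is an assumption-laden example, not the repository's generation code: the function names, the 95% level, the synthetic data, and the length-based comparison metric are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / math.sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width


def interval_length_error(predicted, truth):
    """Relative difference in length between a predicted and a ground-truth interval.

    Hypothetical comparison metric, used here only for illustration.
    """
    predicted_length = predicted[1] - predicted[0]
    truth_length = truth[1] - truth[0]
    return abs(predicted_length - truth_length) / truth_length


# Synthetic stand-in for a sampled column (not the flights data).
random.seed(0)
sample = [random.gauss(100.0, 15.0) for _ in range(10_000)]

truth = mean_confidence_interval(sample)
predicted = (truth[0] - 0.05, truth[1] + 0.05)  # pretend model output
print(truth, interval_length_error(predicted, truth))
```

The sample-based interval plays the role of the ground truth here, and the predicted interval would come from the model under evaluation.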

ghost closed this as completed Jun 8, 2021
