What does the 'no_compression' attribute mean? #11

Closed
Kangfei opened this issue Apr 25, 2021 · 3 comments

Comments

@Kangfei

Kangfei commented Apr 25, 2021

Hi,

I want to add new datasets. What does the no_compression attribute mean? Why do we need to set all attributes to no_compression by default?

Best,
Kangfei

@ghost

ghost commented Apr 28, 2021

By default, DeepDB applies compression techniques in the leaf nodes to reduce the storage overhead. Usually, this does not hurt the accuracy for numerical columns. However, if a numerical column has only a few distinct values, or if you want to apply equality predicates on a numerical column, this compression can result in a performance regression. In these cases, you should include the column in the no_compression columns.
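To illustrate why this matters for equality predicates, here is a minimal sketch. It is not DeepDB's actual leaf compression: it assumes, purely for illustration, that compression behaves like equal-width binning with a uniform spread inside each bin. The function names and the toy column are hypothetical.

```python
from collections import Counter


def exact_equality_selectivity(values, target):
    """Selectivity of `col = target` from exact per-value frequencies."""
    return Counter(values)[target] / len(values)


def binned_equality_selectivity(values, target, num_bins):
    """Selectivity of `col = target` from equal-width bins over an integer column.

    Assumption (illustration only): the mass of a bin is spread uniformly
    over all integers that fall into that bin.
    """
    lo, hi = min(values), max(values)
    if target < lo or target > hi:
        return 0.0
    if lo == hi:
        return 1.0
    width = (hi - lo) / num_bins

    def bin_index(v):
        # The maximum value is clamped into the last bin.
        return min(int((v - lo) / width), num_bins - 1)

    bin_counts = Counter(bin_index(v) for v in values)
    b = bin_index(target)
    ints_in_bin = sum(1 for v in range(lo, hi + 1) if bin_index(v) == b)
    return bin_counts[b] / len(values) / ints_in_bin


# Toy numerical column with only three distinct values.
column = [0] * 50 + [1] * 30 + [100] * 20

print(f"exact:  {exact_equality_selectivity(column, 1):.3f}")
print(f"binned: {binned_equality_selectivity(column, 1, num_bins=4):.3f}")
```

With few distinct values, the binned estimate for `col = 1` ends up far below the true frequency, which is the kind of error that keeping exact values for the column avoids.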

@Kangfei
Author

Kangfei commented Apr 30, 2021

Thanks for your reply.
Another question is about getting the confidence interval. I noticed that it needs a ground truth file as input, like 'confidence_interval_10M.pkl' in the command below. What does this file contain? Is there a 'ground truth confidence interval' in it?

python3 maqp.py --evaluate_confidence_intervals \
    --dataset flights1B \
    --target_path ./baselines/aqp/results/deepDB/flights1B_confidence_intervals.csv \
    --ensemble_location ../mqp-data/flights-benchmark/spn_ensembles/ensemble_single_flights1B_10000000.pkl \
    --query_file_location ./benchmarks/flights/sql/aqp_queries.sql \
    --ground_truth_file_location ./benchmarks/flights/confidence_intervals/confidence_interval_10M.pkl \
    --confidence_upsampling_factor 100 \
    --confidence_sample_size 10000000

Best,
Kangfei

@ghost

ghost commented May 6, 2021

This code evaluates the quality of the confidence intervals predicted by DeepDB. Such an evaluation of course needs a ground truth to compare the predicted intervals against. Here, the ground truth is simply the confidence intervals computed from a sample using standard statistical methods. The content of the files can also be found in this repository. We have also included the code that generates these files (cf. the command under "Optional: Create the ground truth for confidence interval." in the repository).
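As a rough sketch of what "standard statistical methods" on a sample can look like, the snippet below computes normal-approximation (central limit theorem) confidence intervals for an AVG and a COUNT query. This is an assumption for illustration and not necessarily how the repository generates its ground truth files; all function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def avg_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean (AVG)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return center - half_width, center + half_width


def count_confidence_interval(matches, sample_size, population_size, confidence=0.95):
    """Normal-approximation interval for a COUNT, scaled up from a sample.

    `matches` is the number of sampled rows that satisfy the predicate.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = matches / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return (p - half_width) * population_size, (p + half_width) * population_size


# Hypothetical sample values and counts, for illustration only.
print(avg_confidence_interval([12.0, 15.5, 9.8, 14.1, 11.3, 13.7]))
print(count_confidence_interval(matches=100, sample_size=1000, population_size=1_000_000))
```

Intervals like these, computed directly on the sampled rows, can then be compared against the intervals the model predicts for the same queries.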

@ghost ghost closed this as completed Jun 8, 2021
This issue was closed.