ACE: An Attention-based Model for Cardinality Estimation of Set-Valued Queries

Introduction

This repository contains the source code of ACE, an attention-based model for cardinality estimation of set-valued queries.

Quick Start

The three datasets and their generated query workloads can be downloaded via this link. Running the code involves two main steps: (1) Featurization represents the underlying data numerically and outputs the distilled matrix. (2) Estimation uses the data matrix and the embeddings of the queried elements to predict the cardinality.

For the end-to-end (e2e) experiment, please refer to this repo.

Before running

You need to change the folder paths to match your environment. Search the code for TODO to find every place that needs a change.
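The search for TODO can be done with any editor or with grep. As a minimal sketch in Python, the helper below lists every line that contains the marker. The function name `find_todos` and the choice to scan only `.py` files are assumptions made for illustration and are not part of this repo.

```python
from pathlib import Path


def find_todos(root, pattern="TODO", suffix=".py"):
    """Return (path, line_number, line_text) for every line containing `pattern`."""
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        # errors="ignore" keeps the scan going if a file is not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory (run this from the repo root).
    for path, lineno, line in find_todos("."):
        print(f"{path}:{lineno}: {line}")
```

Each reported line is a candidate folder path to update before running the commands below.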

Featurization (offline)

To train a model to represent data, you could run:

python3 data_encoder.py --d [dataset] --r [distill ratio] --dis_dep [distill depth]

This command splits the dataset into training and testing data, trains the aggregation and distillation models, and stores the trained models in the folder ./save_model.

Then, you could run the following command to generate the distilled dataset representation.

python3 data_representation.py --d [dataset] --r [distill ratio] --dis_sep [distill depth]

The generated representation is also stored in the folder ./save_model.
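The two offline commands can also be chained from a small driver script. The sketch below only assembles and optionally runs the commands exactly as they are written above. The helper names, the `dry_run` switch, and the example argument values are hypothetical, and the flag spellings are copied verbatim from this README, so check each script's argument parser if a flag is rejected.

```python
import subprocess
import sys


def featurization_commands(dataset, ratio, depth):
    """Build the two offline commands in the order the README lists them."""
    # Flags are copied from the README commands; values are passed through as strings.
    encode = [sys.executable, "data_encoder.py",
              "--d", str(dataset), "--r", str(ratio), "--dis_dep", str(depth)]
    represent = [sys.executable, "data_representation.py",
                 "--d", str(dataset), "--r", str(ratio), "--dis_sep", str(depth)]
    return [encode, represent]


def run_featurization(dataset, ratio, depth, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in featurization_commands(dataset, ratio, depth):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops before the second step if the first one fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "my_dataset", 0.1 and 2 are placeholder values, not defaults of this repo.
    run_featurization("my_dataset", 0.1, 2, dry_run=True)
```

Running the encoder first matters because the representation step reads the trained models from ./save_model.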

Estimation (online)

After generating the representation, you could train the estimator by running:

python3 query_analyzer.py --d [dataset] --qt [query type]

To get estimates from the trained model, run the same script in test mode:

python3 query_analyzer.py --d [dataset] --m test --qt [query type] --qf [frequency]

Here, the query workloads are divided into three classes based on their type: superset, subset, and overlap. Each query workload is further partitioned into three sub-classes based on the frequency of its elements: regular (considering all elements), high (considering only high-frequency elements), and low (considering only low-frequency elements).
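To make the three query types concrete, the sketch below computes the exact cardinality of each type on a toy collection. It assumes the usual definitions for set-valued queries, which this README does not spell out: a superset query counts stored sets that contain the query, a subset query counts stored sets contained in the query, and an overlap query counts stored sets that share at least one element with the query. The function name and the toy data are hypothetical.

```python
def true_cardinality(collection, query, query_type):
    """Count the sets in `collection` that match `query` under `query_type`."""
    q = set(query)
    if query_type == "superset":
        def match(s):
            return s >= q  # stored set contains every query element
    elif query_type == "subset":
        def match(s):
            return s <= q  # stored set lies entirely inside the query
    elif query_type == "overlap":
        def match(s):
            return not s.isdisjoint(q)  # at least one shared element
    else:
        raise ValueError(f"unknown query type: {query_type}")
    return sum(1 for s in collection if match(set(s)))


if __name__ == "__main__":
    collection = [{1, 2, 3}, {2, 3}, {3, 4}, {5}]
    query = {2, 3}
    for query_type in ("superset", "subset", "overlap"):
        print(query_type, true_cardinality(collection, query, query_type))
```

On this toy collection the query {2, 3} has cardinality 2 as a superset query, 1 as a subset query, and 3 as an overlap query. These exact counts are the values an estimator is trained to approximate.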

Dynamic Data

The whole process is similar to the static one.

You first need to train the encoder by using the data_encoder.py file. Then, you can use the dynamic/dy_data_representation.py file to generate the representation of each update epoch. Finally, you need to run dynamic/dy_query_analyzer.py to see the update time (train) and the Q-error (test).
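For readers unfamiliar with the Q-error reported in test mode, the sketch below uses the standard definition from the cardinality estimation literature: the larger of estimate/truth and truth/estimate, so a perfect estimate scores 1. Clamping both values to at least 1 to avoid division by zero is a common convention and an assumption here. It may differ from what the scripts in this repo do.

```python
from statistics import median


def q_error(estimate, truth, floor=1.0):
    """Q-error = max(est / true, true / est), with both values clamped to `floor`."""
    est = max(float(estimate), floor)
    tru = max(float(truth), floor)
    return max(est / tru, tru / est)


def summarize(estimates, truths):
    """Return the median and maximum Q-error over a workload."""
    errors = [q_error(e, t) for e, t in zip(estimates, truths)]
    return {"median": median(errors), "max": max(errors)}


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from this repo.
    print(summarize([10, 200, 5], [20, 100, 5]))
```

Because the metric is symmetric, underestimating by a factor of two and overestimating by a factor of two are penalized equally.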

Ablation Study

Apart from the CA ablation study, the ablations need only small changes to the original code. For example, for the AG ablation study, you only need to exclude the feed-forward network. For the SA ablation study, we have marked the parts you need with an (SA ablation) comment.

For the CA ablation study, you need to change two files: query_analyzer.py and model/ace.py. In both files, uncomment the part marked with the CA ablation comment and comment out the original ACE part.

