Commit 1137e22: add benchmark
1 parent ab6fdc5

199 files changed, +15937 -0 lines


README.md

+11
Graph Database Benchmark: Neo4j vs Amazon Neptune vs Titan vs TigerGraph vs JanusGraph vs ArangoDB

- same datasets
- same query workload
- same environment (hardware)
- cross-validation of results
- each vendor's benchmark is under /benchmark/vendor_name/
- start with the README under each folder
- all tests are reproducible on EC2 or a similar environment.

benchmark/arangodb/ArangoTask.class

2.01 KB
Binary file not shown.

benchmark/arangodb/README

+181
############################################################
# Copyright (c) 2015-now, TigerGraph Inc.
# All rights reserved.
# Provided as-is so that the benchmark can be reproduced;
# anyone may use it for benchmarking purposes with
# acknowledgement to TigerGraph.
# Author: Mingxi Wu [email protected]
############################################################

This document details how to reproduce the graph database benchmark results on ArangoDB.

Data Sets
===========

- graph500 edge file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22
- graph500 vertex file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22_unique_node

- twitter edge file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz
- twitter vertex file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.net_unique_node

Hardware & Major Environment
================================
- Amazon EC2 machine r4.8xlarge
- OS: Ubuntu 14.04.5 LTS
- Java build 1.7.0_181
- Python 2.7.6

- 32 vCPUs
- 244 GiB memory
- attached 300 GiB EBS-optimized Provisioned IOPS SSD (io1) with IOPS set to 15k;
  raw data and ArangoDB datafiles are placed on this SSD.

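A quick way to confirm the machine matches these specs, using standard Linux tools (the expected values simply restate the list above):

# optional environment sanity check
nproc             # expect 32
free -g           # expect roughly 244 GiB total memory
lsblk             # the 300 GiB EBS volume should be listed
java -version     # expect build 1.7.0_181
python --version  # expect 2.7.6
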
ArangoDB Version
==================
- 3.3.13 Community Edition, downloaded from https://download.arangodb.com/arangodb33/xUbuntu_14.04
- Java driver arangodb-java-driver-4.7.0-SNAPSHOT-standalone.jar (downloaded from https://github.com/arangodb/arangodb-java-driver)
- SLF4J logging library slf4j-simple-1.7.25.jar (downloaded from https://www.slf4j.org/download.html)

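The compile and run commands later in this README use -cp \* and -cp .:\*, which let the JVM pick up every jar in the current directory, so the two jars above should sit in the same folder as khop.java. A hypothetical listing of that script folder (the exact file set is an assumption; the two jars and khop.java are what matter):

ls
# arangodb-java-driver-4.7.0-SNAPSHOT-standalone.jar
# slf4j-simple-1.7.25.jar
# khop.java  load_graph500.sh  load_twitter.sh  run_khop.sh  run_pg_wcc.sh
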
Install ArangoDB
==================
# add repository key
wget https://www.arangodb.com/repositories/arangodb33/xUbuntu_14.04/Release.key
sudo apt-key add Release.key

# add apt repository
sudo apt-add-repository 'deb https://www.arangodb.com/repositories/arangodb33/xUbuntu_14.04/ /'
sudo apt-get update

# install ArangoDB
# set password="root" for the root user
# for the storage engine, choose mmfiles or rocksdb (benchmark results are provided for both storage options)
sudo apt-get install arangodb3=3.3.13

# check that the installation went well
curl http://root:root@localhost:8529/_api/version

# output should look like this
{"server":"arango","version":"3.3.13","license":"community"}

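To double-check which storage engine the install ended up with, ArangoDB 3.x also exposes it over the REST API (the exact output shape may vary slightly by version):

# optional: confirm the active storage engine
curl http://root:root@localhost:8529/_api/engine
# e.g. {"name":"mmfiles"} or {"name":"rocksdb"}
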
Set up the EBS volume as the database directory for ArangoDB
=====================================================
# switch to root
sudo bash

# create the new database directory
mkdir /ebs/arangodb
chmod -R 777 /ebs

# create a symbolic link to this directory from ArangoDB's default database directory /var/lib/arangodb3
cd /var/lib
mv arangodb3 /ebs/arangodb
ln -s /ebs/arangodb/arangodb3/ arangodb3

# exit root
exit

# restart arangodb
sudo service arangodb3 restart

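A quick sanity check that the relocation worked, using the paths set up above:

# the symlink should point at the EBS volume, and the volume should have free space
ls -ld /var/lib/arangodb3
# expected: /var/lib/arangodb3 -> /ebs/arangodb/arangodb3/
df -h /ebs
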
Run benchmark
================
Download all the files in this README's folder to a script folder.

Before running the benchmark scripts below, please make sure the current user has READ permission on the raw data files,
and WRITE permission on the folder containing the benchmark scripts, since the random seed file will be generated in this folder.

Since the benchmark runs for a long time, consider configuring ssh to keep the session alive, following:
https://www.howtogeek.com/howto/linux/keep-your-linux-ssh-session-from-disconnecting/

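Following the linked article, a minimal client-side keep-alive config might look like this (both options are standard OpenSSH client settings; the values are illustrative):

# ~/.ssh/config on the client machine
Host *
    ServerAliveInterval 60     # send a keep-alive every 60 seconds
    ServerAliveCountMax 120    # tolerate up to 120 missed replies
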
Load Data
-----------------
Download the datasets into the same folder where all the benchmark files are.

# download graph500 dataset
wget http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22
wget http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22_unique_node

# download twitter dataset
wget http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz
wget http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.net_unique_node
tar -xzf twitter_rv.tar.gz

ArangoDB requires input files to have headers.

# add headers to graph500 dataset files
sed -i '1i _key' graph500-22_unique_node
sed -i '1i _from\t_to' graph500-22

# add headers to twitter dataset files
sed -i '1i _key' twitter_rv.net_unique_node
sed -i '1i _from\t_to' twitter_rv.net

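To verify the headers landed on line 1 (GNU sed interprets \t in the inserted text as a tab; the vertex IDs shown below are made up for illustration):

head -2 graph500-22_unique_node
# _key
# 1234567
head -2 graph500-22
# _from	_to          (tab-separated)
# 1234567	7654321
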
Run the load scripts.

# to load graph500 data
bash load_graph500.sh

# to load twitter data
bash load_twitter.sh

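An optional post-load sanity check is to count the documents in each collection from arangosh (the database name graph500 and the collection names vertex/edge follow the load scripts; run the same against the twitter database):

arangosh --server.database "graph500" --server.password "root" \
  --javascript.execute-string "print(db.vertex.count()); print(db.edge.count())"
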
Check storage size.

# mmfiles engine
# run this arangosh command to find the id of a database, used below (replace database_name)
arangosh --server.database "database_name" --server.password "root" --javascript.execute-string "print(db._id())"

# now run these commands as the root user (use the database id retrieved above in place of database_id_number)
cd /ebs/arangodb/arangodb3/databases
du -hc database-database_id_number

# rocksdb engine
# run as the root user (datafiles from all databases are stored under the same folder)
cd /ebs/arangodb/arangodb3
du -hc engine-rocksdb

Graph500
-----------------
# khop (output file name has the format khopResults_graph500_k)
# compile (if necessary)
javac -cp \* khop.java

# to run all khop queries on graph500 (k=1,2 averaged over 300 seeds; k=3,6 averaged over 10 seeds)
bash run_khop.sh graph500

# OR, to run a single query, use the following command with arguments
java -cp .:\* khop graph_name depth timeout_seconds

# examples
java -cp .:\* khop graph500 1 180
java -cp .:\* khop graph500 3 9000

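For reference, the query the khop driver issues is a plain AQL traversal (see khop.java below). It can be reproduced by hand from arangosh; this sketch uses depth 1 and the first seed from graph500-22-seed:

# illustrative: count the distinct 1-hop neighborhood of seed vertex 3600312
arangosh --server.database "graph500" --server.password "root" \
  --javascript.execute-string \
  "print(db._query(\"FOR v IN 1..1 OUTBOUND 'vertex/3600312' edge RETURN DISTINCT v._id\").toArray().length)"
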
# wcc
bash run_pg_wcc.sh graph500 wcc

# pagerank
bash run_pg_wcc.sh graph500 pagerank

Twitter
-------------
# khop (output file name has the format khopResults_twitter_k)
# compile (if necessary)
javac -cp \* khop.java

# to run all khop queries on twitter (k=1,2 averaged over 300 seeds; k=3,6 averaged over 10 seeds)
bash run_khop.sh twitter

# OR, to run a single query, use the following command with arguments
java -cp .:\* khop graph_name depth timeout_seconds

# examples
java -cp .:\* khop twitter 1 180
java -cp .:\* khop twitter 3 9000

# wcc
bash run_pg_wcc.sh twitter wcc

# pagerank
bash run_pg_wcc.sh twitter pagerank

benchmark/arangodb/graph500-22-seed

+1
3600312 2677094 2038005 3301167 704219 1779962 2681401 2277366 1649130 806220 3783689 3979771 2878950 1316789 4099483 2654216 3520283 320529 460890 2861567 1676721 3582851 2025534 1897682 3042164 683461 484783 2964318 825304 2303395 3029190 2119218 341236 3921645 3350720 1382338 2497566 2293317 1365818 3108349 1039487 656628 326459 3486463 1513849 3120768 3254104 2859677 4100533 1214662 2844418 3228461 2971789 838862 3242202 231946 103480 745855 2202837 121973 2944986 3916778 1237877 2404335 3903782 3753107 2638320 3532534 3026267 149529 2522099 1565761 1345848 1059426 2994540 1629629 1481421 337894 2706001 342515 2301230 3455722 4103891 2560844 316796 3853684 2803721 2782143 4168065 1297201 2982970 1089600 3589606 1978189 514482 773765 1929789 2499474 1367644 3052548 2020748 1934532 2595851 1265635 2678981 3484689 2778764 323958 1972929 2529296 2638682 2836761 3489646 2304697 3006908 3976118 432800 3408347 3184190 2478197 3990575 3097880 259436 479595 2054949 1014166 2398658 3499821 289302 2689848 603652 2764479 3458769 2372488 3826201 610619 1502380 1417031 1291296 1699680 1816799 2952048 3747093 996609 1906969 712790 1973404 2874441 4072076 534367 2419131 3145715 1172458 2547240 579284 3952328 3217974 928922 2975442 3686619 143324 2262470 2844253 3960743 95176 2661831 289798 498881 459455 3778765 2575099 2321106 898887 1630163 3268706 25081 3747551 2048028 1377545 2178454 3666746 1692598 1809240 1461949 3878592 96570 4095479 2539031 364055 3514283 3843398 3556803 2592596 168 2336570 327991 2445956 1140337 2663510 2514997 1933620 1076164 3734798 99836 2404509 3102298 2158818 3088473 3861233 1453810 1952126 968226 594138 1059034 408333 3246311 587844 1602562 2546319 2861944 1360827 1915610 957424 1427107 433135 3353932 140407 1989222 1392471 1290284 2144691 1299024 764990 302910 4192735 3181076 1535127 263980 1571976 2271738 492328 3976408 1621372 3024237 3229179 2167063 102878 4085765 2370758 2987431 2633916 1177859 1581601 18147 697579 3491436 699069 1608362 2570730 3929663 1304943 3733946 2216412 3013035 261001 32290 113329 1509856 2190260 3103760 3687843 1245035 3341532 857395 3942814 3982809 2807038 3291942 1840809 760204 3108890 1416278 3725922 2189358 2810970 655805 63077 3708992 2622204 1647516 1274701 2238470 83658 3800740 3659055 740181 318596 1353213 3058396 3497001

benchmark/arangodb/khop.class

4.12 KB
Binary file not shown.

benchmark/arangodb/khop.java

+145
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.arangodb.ArangoCursor;
import com.arangodb.ArangoDB;
import com.arangodb.ArangoDatabase;
import com.arangodb.model.AqlQueryOptions;

// Runs a single k-hop AQL traversal from a given root vertex and returns
// {neighborhood size, elapsed time in ms}, so the caller can enforce a timeout.
class ArangoTask implements Callable<long[]> {
    private ArangoDB arangoDB = null;
    private ArangoDatabase db = null;
    private int depth = 0;
    private String root = "";

    public ArangoTask(String dbName, int depth, String root) {
        this.arangoDB = new ArangoDB.Builder().user("root").password("root").build();
        this.db = arangoDB.db(dbName);
        this.depth = depth;
        this.root = root;
    }

    @Override
    public long[] call() throws Exception {
        String query = "FOR v IN " + depth + ".." + depth
                + " OUTBOUND 'vertex/" + root + "' edge RETURN DISTINCT v._id";
        long startTime = System.nanoTime();
        ArangoCursor<String> cursor = db.query(query, null, new AqlQueryOptions().count(true), String.class);
        long endTime = System.nanoTime();
        long diff = (endTime - startTime) / 1000000;
        arangoDB.shutdown();

        return new long[]{cursor.count(), diff};
    }
}

public class khop {

    public static void main(String[] args) {

        if (args.length < 3) {
            System.out.println("Provide graph name (graph500, twitter), depth (1,2,3,6) AND timeout in seconds");
            System.exit(0);
        }

        String dbName = args[0];
        int depth = Integer.parseInt(args[1]);
        int timeout = Integer.parseInt(args[2]);
        String seedFile = "graph500-22-seed";
        if (dbName.equals("twitter")) {
            seedFile = "twitter_rv.net-seed";
        }
        try {
            // read seeds (one space-separated line of start vertex ids)
            File file = new File(seedFile);
            BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
            String line = bufferedReader.readLine();
            String[] roots = line.split(" ");

            // file to write results
            FileWriter writer = new FileWriter("khopResults_" + dbName + "_" + depth);
            writer.write("k-hop query with depth = " + depth + "\n");
            writer.write("start vertex,\tneighbor size,\tquery time (in ms)\n");

            long totalSize = 0;
            double totalTime = 0.0;
            int errorQuery = 0;
            int totalQuery = 0;
            long[] result = new long[]{-1, -1};
            for (String root : roots) {

                // for depths 3 and 6, only 10 queries need to run
                if (depth > 2 && totalQuery == 10) {
                    break;
                }

                totalQuery++;
                ArangoTask arangoTask = new ArangoTask(dbName, depth, root);
                ExecutorService executor = Executors.newSingleThreadExecutor();
                Future<long[]> future = executor.submit(arangoTask);
                try {
                    System.out.println("Starting...seed=" + root);
                    result = future.get(timeout, TimeUnit.SECONDS);
                    System.out.println("Finished!");
                } catch (TimeoutException e) {
                    future.cancel(true);
                    result = new long[]{-1, -1};
                    errorQuery++;
                    System.out.println("TIMEOUT: query terminated!");
                } catch (Exception e) {
                    result = new long[]{-1, -1};
                    errorQuery++;
                    System.out.println("Failed to terminate: " + e.getMessage());
                }

                executor.shutdownNow();
                // timed-out or failed queries are recorded as -1 and excluded from the averages
                if (result[0] != -1) {
                    totalSize += result[0];
                    totalTime += result[1];
                }

                writer.write(root + ",\t" + Long.toString(result[0]) + ",\t" + Long.toString(result[1]) + "\n");
                writer.flush();
            }
            double avgSize = totalQuery == errorQuery ? -1.0 : (double) totalSize / (double) (totalQuery - errorQuery);
            double avgTime = totalQuery == errorQuery ? -1.0 : totalTime / (double) (totalQuery - errorQuery);
            System.out.println("===================SUMMARY=================================\n");
            System.out.println("Total " + depth + "-Neighborhood size: " + totalSize);
            System.out.println("Total elapsed time, ms: " + totalTime);
            System.out.println("Total number of queries: " + totalQuery);
            System.out.println("Number of failed queries: " + errorQuery);
            System.out.println("Average " + depth + "-Neighborhood size: " + avgSize);
            System.out.println("Average query time, ms: " + avgTime);

            writer.write("===================SUMMARY=================================\n");
            writer.write("Total number of queries:\t" + totalQuery + "\n"
                    + "Total elapsed time, ms:\t" + totalTime + "\n"
                    + "Total Neighborhood size:\t" + totalSize + "\n"
                    + "Total number of failed queries:\t" + errorQuery + "\n"
                    + "Average Neighborhood size:\t" + avgSize + "\n"
                    + "Average query time, ms:\t" + avgTime + "\n");
            writer.flush();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Done!");
        System.exit(0);
    }
}

benchmark/arangodb/load_graph500.sh

+14
echo "Load dataset: graph500"

echo "Create database: graph500"
arangosh --server.username "root" --server.password "root" --javascript.execute-string "print(db._createDatabase('graph500'))"

# Use --threads 16 for the rocksdb storage engine
echo "Load vertex collection ..."
time arangoimp --file graph500-22_unique_node --collection vertex --create-collection true --type tsv --server.password "root" --server.database "graph500" --threads 16

# Use --threads 16 for the rocksdb storage engine
echo "Load edge collection ..."
time arangoimp --file graph500-22 --collection edge --create-collection true --type tsv --create-collection-type edge --from-collection-prefix vertex --to-collection-prefix vertex --server.password "root" --server.database "graph500" --threads 16

echo "Load complete!"

benchmark/arangodb/load_twitter.sh

+14
echo "Load dataset: twitter"

echo "Create database: twitter"
arangosh --server.username "root" --server.password "root" --javascript.execute-string "print(db._createDatabase('twitter'))"

# Use --threads 16 for the rocksdb storage engine
echo "Load vertex collection ..."
time arangoimp --file twitter_rv.net_unique_node --collection vertex --create-collection true --type tsv --server.password "root" --server.database "twitter" --threads 16

echo "Load edge collection ..."
# Use --threads 16 for the rocksdb storage engine; timeouts are set in seconds
time arangoimp --file twitter_rv.net --collection edge --create-collection true --type tsv --create-collection-type edge --from-collection-prefix vertex --to-collection-prefix vertex --server.password "root" --server.database "twitter" --server.connection-timeout 86400 --server.request-timeout 86400 --threads 16

echo "Load complete!"
