Commit d5dc9b3

Add compression and visualization with Grafana.
1 parent d3e01d0 commit d5dc9b3

16 files changed: +544 -39 lines changed

README.md

Lines changed: 167 additions & 30 deletions
@@ -1,10 +1,13 @@
 # Multi-Node-TimescaleDB
 
-Demo project for an online workshop with #RuPostgresTuesday.
+Demo project for an online workshop with #RuPostgresTuesday.
+Watch tons of cool and useful videos on their channel:
+[https://youtube.com/RuPostgres](https://youtube.com/RuPostgres).
+
 Check out the first part:
 [s02e08: Unboxing TimescaleDB 2.0, with guest Ivan Muratov](https://www.youtube.com/watch?v=vbJCq9PhSR0&t=5395s&ab_channel=%23RuPostgres).
 
-If you need the same project as in first part check out the branch:
+If you need the same project as in the first part, check out the branch:
 [PgTuesday_1_17.11.2020](https://github.com/binakot/Multi-Node-TimescaleDB/tree/PgTuesday_1_17.11.2020).
 
 The second one is coming...
@@ -16,20 +19,23 @@ The main branch is under development and can be different from the video.
 A multi-node setup of TimescaleDB 2.0.0 RC3.
 
 Initial cluster configuration:
-single access node (AN) and 2 data nodes (DN) with 1 week interval and replication factor 1.
+a single access node (AN) and 2 data nodes (DN)
+with a 1-week interval and replication factor 1.
 
 ## How to run
 
+Docker is required!
+
+Create the external network and run the application stack:
+
 ```bash
-# Run app stack with external network
 $ docker network create pg_cluster_network
 $ docker-compose up -d
 ```
 
-`PgAdmin` is available on [http://localhost:15432](http://localhost:15432)
-with `[email protected]` / `admin`.
-
-Just add new connections in GUI with settings:
+`PgAdmin` is available at [http://localhost:15432](http://localhost:15432) with credentials: `[email protected]` / `admin`.
+`PgAdmin` can render `PostGIS` data right on the map.
+Or you can use any tool you like (`psql`, `franchise`, etc.) if you don't want to look at geographical beauty ;)
 
 ```text
 # Access node
@@ -53,13 +59,13 @@ password: postgres
 
 ## Workshop
 
-### 1. Preparation
+### 1. Initialization
 
 At this moment you should have a running cluster with 1 access node and 2 data nodes.
 If you don't, look at the `How to run` section and do that first.
 Also, you need access to all nodes via `psql`, `pgAdmin` or any other way you like.
 
-Now you can fill sample data:
+Now you can load the sample data (takes about 2 minutes on NVMe):
 
 ```bash
 $ gzip -k -d ./data/*csv.gz
@@ -68,17 +74,18 @@ $ docker exec -i pg_access_node /bin/sh < ./load-init-data.sh
 
 ### 2. Learning the cluster configuration
 
-Run on access node and each data nodes separately.
+Run on the access node and on each data node separately:
 
 ```sql
-SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 SELECT count(*) FROM telemetries;
+SELECT * FROM approximate_row_count('telemetries');
+SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 ```
 
 ### 3. Querying the cluster via the access node
 
 ```sql
--- Speed analytics for 1 year
+-- Total speed analytics for 1 year
 SELECT
     time_bucket('30 days', time) AS bucket,
     imei,
@@ -151,42 +158,153 @@ SELECT * FROM timescaledb_information.data_nodes;
 Then attach the new data node to the distributed hypertable:
 
 ```sql
-SELECT * FROM attach_data_node('data_node_3', 'telemetries');
 SELECT * FROM timescaledb_information.hypertables;
+SELECT * FROM timescaledb_information.dimensions;
+
+SELECT * FROM attach_data_node('data_node_3', 'telemetries');
+SELECT * FROM timescaledb_information.dimensions;
 ```
 
 ### 5. Add more sample data into the cluster with 3 data nodes
 
+Load more sample data (takes about 1 minute on NVMe):
+
 ```bash
 $ docker exec -i pg_access_node /bin/sh < ./load-more-data.sh
 ```
 
-Run on access node and each data nodes separately.
+Run on the access node and on each data node separately:
 
 ```sql
-SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 SELECT count(*) FROM telemetries;
+SELECT * FROM approximate_row_count('telemetries');
+SELECT DISTINCT imei FROM telemetries ORDER BY imei;
+```
+
+Check the old and new data distribution:
+
+```sql
+SELECT data_nodes, chunk_name, range_start, range_end FROM timescaledb_information.chunks
+WHERE range_start < '2020-01-01'
+ORDER BY data_nodes ASC, range_start ASC;
+SELECT data_nodes FROM timescaledb_information.chunks
+WHERE range_start < '2020-01-01'
+GROUP BY data_nodes;
+
+SELECT data_nodes, chunk_name, range_start, range_end FROM timescaledb_information.chunks
+WHERE range_start > '2020-01-01'
+ORDER BY data_nodes ASC, range_start ASC;
+SELECT data_nodes FROM timescaledb_information.chunks
+WHERE range_start > '2020-01-01'
+GROUP BY data_nodes;
+```
+
+### 6. Compression
+
+Check the current database size and compression status:
+
+```sql
+-- Compression settings on each data node
+SELECT * FROM timescaledb_information.compression_settings;
+
+-- Hypertable sizes
+SELECT * FROM hypertable_detailed_size('telemetries');
+SELECT node_name, pg_size_pretty(total_bytes) AS total
+FROM hypertable_detailed_size('telemetries')
+ORDER BY node_name ASC;
+
+-- Chunk sizes
+SELECT * FROM chunks_detailed_size('telemetries');
+SELECT node_name, chunk_name, pg_size_pretty(total_bytes) AS total
+FROM chunks_detailed_size('telemetries')
+ORDER BY node_name ASC, chunk_name ASC;
+```
+
+Apply compression to the hypertable:
+
+```sql
+ALTER TABLE telemetries SET (
+    timescaledb.compress,
+    timescaledb.compress_orderby = 'time DESC',
+    timescaledb.compress_segmentby = 'imei'
+);
+
+SELECT compress_chunk(i) FROM show_chunks('telemetries', older_than => INTERVAL '30 days') i;
+
+CALL distributed_exec('SELECT add_compression_policy(''telemetries'', INTERVAL ''30 days'', if_not_exists => TRUE)');
+```
+
+Check the database size after applying compression:
+
+```sql
+-- Compression settings on each data node
+SELECT * FROM timescaledb_information.compression_settings;
+
+-- Hypertable compression
+SELECT * FROM hypertable_compression_stats('telemetries');
+SELECT node_name, pg_size_pretty(before_compression_total_bytes) AS before, pg_size_pretty(after_compression_total_bytes) AS after
+FROM hypertable_compression_stats('telemetries')
+ORDER BY node_name ASC;
+
+-- Chunk compression
+SELECT * FROM chunk_compression_stats('telemetries');
+SELECT node_name, chunk_name, pg_size_pretty(before_compression_total_bytes) AS before, pg_size_pretty(after_compression_total_bytes) AS after
+FROM chunk_compression_stats('telemetries')
+ORDER BY node_name ASC, chunk_name ASC;
 ```
 
-### !!! TODO MORE STEPS !!!
+Check that the data is still available:
+
+```sql
+-- Single track for 1 month
+SELECT imei, ST_MakeLine(telemetries.geography::geometry ORDER BY time)::geography AS track
+FROM telemetries
+WHERE imei = '000000000000001'
+AND time > '2019-09-01' AND time < '2019-10-01'
+GROUP BY imei;
+```
+
+### 7. Visualization
+
+Run `Grafana` in a docker container:
+
+```bash
+$ docker run \
+    --name=grafana \
+    -p 3000:3000 \
+    -e "GF_INSTALL_PLUGINS=grafana-worldmap-panel" \
+    -d grafana/grafana
+```
 
-- Correct data distribution between nodes
+Open it at [http://localhost:3000](http://localhost:3000)
+with `admin / admin`.
 
-- Block one data node and fill more data
+Then add `TimescaleDB` as a new datasource and import the dashboard:
 
-- Chunk compression
+* Configuration / Data Sources / Add data source / Find and select `PostgreSQL`.
 
-- Add Grafana
+* Connect to the access node via the docker bridge (host=`172.17.0.1`; port=`5432`; db=`postgres`; user=`postgres`; password=`postgres`; ssl=`off`).
 
-### N. Stop the cluster
+* Select `PostgreSQL` version `12` and enable `TimescaleDB` support.
+
+* Import the dashboard from the file `grafana.json` (Create / Import / Upload JSON file); a sample panel query is sketched below.
+
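For orientation, here is the kind of query such a map panel can run against the access node. This is only a sketch: it assumes the `telemetries` schema used above and Grafana's standard `$__timeFilter` macro for SQL datasources, and it is not necessarily the exact query stored in `grafana.json`.

```sql
-- Illustrative map-panel query (a sketch, not necessarily what grafana.json contains).
-- $__timeFilter(time) is Grafana's macro that restricts rows to the dashboard's time range.
-- Returns the latest known position per device as latitude/longitude columns.
SELECT DISTINCT ON (imei)
    imei AS metric,
    ST_Y(geography::geometry) AS latitude,
    ST_X(geography::geometry) AS longitude,
    1 AS value
FROM telemetries
WHERE $__timeFilter(time)
ORDER BY imei, time DESC;
```
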
+### N. Play with the cluster and stop it afterwards
 
 ```bash
+# grafana
+$ docker stop grafana
+$ docker rm grafana
+
+# 3rd data node
 $ docker stop pg_data_node_3
 $ docker rm pg_data_node_3
 $ docker volume rm pg_data_node_3_data
 
+# cluster
 $ docker-compose down --volumes
 
+# network
 $ docker network rm pg_cluster_network
 ```
 
@@ -196,6 +314,8 @@ $ docker network rm pg_cluster_network
 
 * [TimescaleDB Blog: TimescaleDB 2.0](https://blog.timescale.com/blog/timescaledb-2-0-a-multi-node-petabyte-scale-completely-free-relational-database-for-time-series)
 
+* [TimescaleDB Docs: Changes in TimescaleDB 2.0](https://docs.timescale.com/v2.0/release-notes/changes-in-timescaledb-2)
+
 * [TimescaleDB Docs: Single Node vs. Multi-Node](https://docs.timescale.com/v2.0/introduction/architecture#single-node-vs-clustering)
 
 * [TimescaleDB Docs: Set up multi-node TimescaleDB](https://docs.timescale.com/v2.0/getting-started/setup-multi-node-basic)
@@ -217,25 +337,42 @@ $ docker network rm pg_cluster_network
 ## Main points
 
 * Distributed hypertables and multi-node capabilities are currently in `BETA`.
-This feature is not meant for production use.
+This feature is not meant for production use!
 
-* Distributed hypertable `limitations`: https://docs.timescale.com/v2.0/using-timescaledb/limitations.
-
-* To ensure best performance, you should partition a distributed hypertable by both `time and space`.
-
-* A distributed hypertable exists in a `distributed database` that consists of multiple databases stored across one or more TimescaleDB instances.
-A database that is part of a distributed database can assume the role of either an `access node` or a `data node` (but not both).
-While the data nodes store distributed chunks, the access node is the entry point for clients to access distributed hypertables.
+* Distributed hypertable `limitations`:
+[https://docs.timescale.com/v2.0/using-timescaledb/limitations](https://docs.timescale.com/v2.0/using-timescaledb/limitations).
 
 * TimescaleDB supports `distributing hypertables` across multiple nodes (i.e., a cluster).
 A multi-node TimescaleDB implementation consists of:
 one access node to handle ingest and data routing and to act as an entry point for user access;
 one or more data nodes to store and organize distributed data (a minimal bootstrap sketch follows below).
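
As an illustration of that layout, a minimal bootstrap sketch run on the access node; the node names, host names and database below are assumptions chosen to match this repo's docker-compose services, not the literal setup commands of the project:

```sql
-- Register the data nodes on the access node (illustrative names/hosts).
SELECT add_data_node('data_node_1', host => 'pg_data_node_1', database => 'postgres', if_not_exists => TRUE);
SELECT add_data_node('data_node_2', host => 'pg_data_node_2', database => 'postgres', if_not_exists => TRUE);

-- The access node now knows about its data nodes.
SELECT * FROM timescaledb_information.data_nodes;
```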

+* A distributed hypertable exists in a `distributed database` that consists of multiple databases stored across one or more TimescaleDB instances.
+A database that is part of a distributed database can assume the role of either an `access node` or a `data node` (but not both).
+While the data nodes store distributed chunks, the access node is the entry point for clients to access distributed hypertables.
+
 * A client connects to an `access node` database.
 You should not directly access hypertables or chunks on data nodes.
 Doing so might lead to inconsistent distributed hypertables.
 
 * TimescaleDB can be elastically scaled out by simply `adding data nodes` to a distributed database.
 TimescaleDB can (and will) adjust the number of space partitions as new data nodes are added.
 Although existing chunks will not have their space partitions updated, the new settings will be applied to newly created chunks.
+
+* To ensure best performance, you should partition a distributed hypertable by both `time and space` (see the sketch below).
+If you only partition data by time, each chunk has to fill up before the access node chooses another data node to store the next chunk.
+Chunks would then be created on data nodes in `round-robin` fashion.
+In case of multiple space partitions, `only the first space partition` will be used to determine how chunks are distributed across servers (hash partitioning).
+Multi-dimensional partitioning adds a "space" dimension that consistently partitions the data over the data nodes, similar to traditional `sharding`.
+
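A hedged sketch contrasting the two layouts for a table like `telemetries` (the demo's hypertable already exists, so these are alternative, illustration-only calls):

```sql
-- Time-only partitioning: new chunks are placed on data nodes one after another (round-robin).
SELECT create_distributed_hypertable('telemetries', 'time');

-- Time + space partitioning: rows are additionally hash-partitioned by imei across the data nodes.
SELECT create_distributed_hypertable('telemetries', 'time', 'imei',
                                     chunk_time_interval => INTERVAL '1 week');
```
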
+* A distributed hypertable can be configured to write each chunk to multiple data nodes in order to replicate data at the chunk level.
+This `native replication` ensures that a distributed hypertable is protected against data node failures
+and provides an alternative to fully replicating each data node using streaming replication.
+When querying a distributed hypertable using native replication, the `query planner` knows how to include only one replica of each chunk in the query plan.
+The planner can employ different strategies to pick the set of chunk replicas in order to, e.g., evenly spread the query load across the data nodes.
+Native replication is currently `under development` and lacks functionality for a complete high-availability solution.
+It's recommended to keep the replication factor at the default value of 1 and instead use streaming replication on each data node (see the sketch below).
+
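For completeness, a sketch of how chunk-level replication is requested (illustration only; as noted above, this demo keeps the default of 1):

```sql
-- Ask for two copies of every chunk when the distributed hypertable is created.
-- With replication_factor => 1 (the default) each chunk lives on exactly one data node.
SELECT create_distributed_hypertable('telemetries', 'time', 'imei',
                                     replication_factor => 2);
```
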
+* The current version does not support altering or inserting data into `compressed` chunks. The data can be queried without any modifications;
+however, if you need to `backfill` or update data in a compressed chunk, you will need to `decompress` the chunk(s) first (see the sketch below).
+TimescaleDB also `blocks modifying` the schema of hypertables with compressed chunks.
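
A minimal backfill sketch along the lines of the compression commands used above (the 30-day bound is illustrative; pick the range that covers the rows you need to change):

```sql
-- Decompress the affected chunks, backfill, then compress them again.
SELECT decompress_chunk(c, if_compressed => TRUE)
FROM show_chunks('telemetries', older_than => INTERVAL '30 days') c;

-- ... INSERT / UPDATE the backfilled rows here ...

SELECT compress_chunk(c, if_not_compressed => TRUE)
FROM show_chunks('telemetries', older_than => INTERVAL '30 days') c;
```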

data/data1.csv.gz (17.7 KB) Binary file not shown.
data/data10.csv.gz (18.4 KB) Binary file not shown.
data/data2.csv.gz (1001 Bytes) Binary file not shown.
data/data3.csv.gz (3.52 KB) Binary file not shown.
data/data4.csv.gz (19.3 KB) Binary file not shown.
data/data5.csv.gz (6.12 KB) Binary file not shown.
data/data6.csv.gz (-1.67 KB) Binary file not shown.
data/data7.csv.gz (-320 Bytes) Binary file not shown.
data/data8.csv.gz (15.7 KB) Binary file not shown.
data/data9.csv.gz (37.1 KB) Binary file not shown.
