Commit d5dc9b3

Add compression and visualization with Grafana.
1 parent d3e01d0 commit d5dc9b3

16 files changed: +544 -39 lines changed

README.md

Lines changed: 167 additions & 30 deletions
@@ -1,10 +1,13 @@
 # Multi-Node-TimescaleDB
 
-Demo project for an online workshop with #RuPostgresTuesday.
+Demo project for an online workshop with #RuPostgresTuesday.
+Watch tons of cool and useful videos on their channel:
+[https://youtube.com/RuPostgres](https://youtube.com/RuPostgres).
+
 Check out the first part:
 [s02e08: Unboxing TimescaleDB 2.0, with guest Ivan Muratov](https://www.youtube.com/watch?v=vbJCq9PhSR0&t=5395s&ab_channel=%23RuPostgres).
 
-If you need the same project as in first part check out the branch:
+If you need the same project as in the first part, check out the branch:
 [PgTuesday_1_17.11.2020](https://github.com/binakot/Multi-Node-TimescaleDB/tree/PgTuesday_1_17.11.2020).
 
 The second one is coming...
@@ -16,20 +19,23 @@ The main branch is under development and can be different from the video.
 A multi-node setup of TimescaleDB 2.0.0 RC3.
 
 Initial cluster configuration:
-single access node (AN) and 2 data nodes (DN) with 1 week interval and replication factor 1.
+a single access node (AN) and 2 data nodes (DN)
+with a 1-week interval and replication factor 1.
 
 ## How to run
 
+Docker is required!
+
+Create the external network and run the application stack:
+
 ```bash
-# Run app stack with external network
 $ docker network create pg_cluster_network
 $ docker-compose up -d
 ```
 
-`PgAdmin` is available on [http://localhost:15432](http://localhost:15432)
-with `[email protected]` / `admin`.
-
-Just add new connections in GUI with settings:
+`PgAdmin` is available at [http://localhost:15432](http://localhost:15432) with credentials: `[email protected]` / `admin`.
+`PgAdmin` can render `PostGIS` data right on the map.
+Or you can use any tool you like (`psql`, `franchise`, etc.) if you don't want to look at geographical beauty ;)
 
 ```text
 # Access node
@@ -53,13 +59,13 @@ password: postgres
 
 ## Workshop
 
-### 1. Preparation
+### 1. Initialization
 
 At this moment you should have a running cluster with 1 access node and 2 data nodes.
 If you don't, look at the `How to run` section and do that first.
 Also, you need access to all nodes via `psql`, `pgAdmin` or any other way you like.
 
-Now you can fill sample data:
+Now you can load the sample data (takes about 2 minutes on NVMe):
 
 ```bash
 $ gzip -k -d ./data/*csv.gz
@@ -68,17 +74,18 @@ $ docker exec -i pg_access_node /bin/sh < ./load-init-data.sh
 
 ### 2. Learning the cluster configuration
 
-Run on access node and each data nodes separately.
+Run on the access node and on each data node separately:
 
 ```sql
-SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 SELECT count(*) FROM telemetries;
+SELECT * FROM approximate_row_count('telemetries');
+SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 ```
 
 ### 3. Querying the cluster via the access node
 
 ```sql
--- Speed analytics for 1 year
+-- Total speed analytics for 1 year
 SELECT
     time_bucket('30 days', time) AS bucket,
     imei,
@@ -151,42 +158,153 @@ SELECT * FROM timescaledb_information.data_nodes;
 Then attach the new data node to the distributed hypertable:
 
 ```sql
-SELECT * FROM attach_data_node('data_node_3', 'telemetries');
 SELECT * FROM timescaledb_information.hypertables;
+SELECT * FROM timescaledb_information.dimensions;
+
+SELECT * FROM attach_data_node('data_node_3', 'telemetries');
+SELECT * FROM timescaledb_information.dimensions;
 ```
 
 ### 5. Add more sample data into the cluster with 3 data nodes
 
+Load more sample data (takes about 1 minute on NVMe):
+
 ```bash
 $ docker exec -i pg_access_node /bin/sh < ./load-more-data.sh
 ```
 
-Run on access node and each data nodes separately.
+Run on the access node and on each data node separately:
 
 ```sql
-SELECT DISTINCT imei FROM telemetries ORDER BY imei;
 SELECT count(*) FROM telemetries;
+SELECT * FROM approximate_row_count('telemetries');
+SELECT DISTINCT imei FROM telemetries ORDER BY imei;
+```
+
+Check the old and new data distribution:
+
+```sql
+SELECT data_nodes, chunk_name, range_start, range_end FROM timescaledb_information.chunks
+WHERE range_start < '2020-01-01'
+ORDER BY data_nodes ASC, range_start ASC;
+SELECT data_nodes FROM timescaledb_information.chunks
+WHERE range_start < '2020-01-01'
+GROUP BY data_nodes;
+
+SELECT data_nodes, chunk_name, range_start, range_end FROM timescaledb_information.chunks
+WHERE range_start > '2020-01-01'
+ORDER BY data_nodes ASC, range_start ASC;
+SELECT data_nodes FROM timescaledb_information.chunks
+WHERE range_start > '2020-01-01'
+GROUP BY data_nodes;
+```
+
+### 6. Compression
+
+Check the current database size and compression status:
+
+```sql
+-- Compression settings on each data node
+SELECT * FROM timescaledb_information.compression_settings;
+
+-- Hypertable sizes
+SELECT * FROM hypertable_detailed_size('telemetries');
+SELECT node_name, pg_size_pretty(total_bytes) AS total
+FROM hypertable_detailed_size('telemetries')
+ORDER BY node_name ASC;
+
+-- Chunk sizes
+SELECT * FROM chunks_detailed_size('telemetries');
+SELECT node_name, chunk_name, pg_size_pretty(total_bytes) AS total
+FROM chunks_detailed_size('telemetries')
+ORDER BY node_name ASC, chunk_name ASC;
+```
+
+Apply compression to the hypertable:
+
+```sql
+ALTER TABLE telemetries SET (
+    timescaledb.compress,
+    timescaledb.compress_orderby = 'time DESC',
+    timescaledb.compress_segmentby = 'imei'
+);
+
+SELECT compress_chunk(i) FROM show_chunks('telemetries', older_than => INTERVAL '30 days') i;
+
+CALL distributed_exec('SELECT add_compression_policy(''telemetries'', INTERVAL ''30 days'', if_not_exists => TRUE)');
+```
+
+Check the database size after applying compression:
+
+```sql
+-- Compression settings on each data node
+SELECT * FROM timescaledb_information.compression_settings;
+
+-- Hypertable compression
+SELECT * FROM hypertable_compression_stats('telemetries');
+SELECT node_name, pg_size_pretty(before_compression_total_bytes) AS before, pg_size_pretty(after_compression_total_bytes) AS after
+FROM hypertable_compression_stats('telemetries')
+ORDER BY node_name ASC;
+
+-- Chunk compression
+SELECT * FROM chunk_compression_stats('telemetries');
+SELECT node_name, chunk_name, pg_size_pretty(before_compression_total_bytes) AS before, pg_size_pretty(after_compression_total_bytes) AS after
+FROM chunk_compression_stats('telemetries')
+ORDER BY node_name ASC, chunk_name ASC;
 ```
 
-### !!! TODO MORE STEPS !!!
+Check that the data is still available:
+
+```sql
+-- Single track for 1 month
+SELECT imei, ST_MakeLine(telemetries.geography::geometry ORDER BY time)::geography AS track
+FROM telemetries
+WHERE imei = '000000000000001'
+AND time > '2019-09-01' AND time < '2019-10-01'
+GROUP BY imei;
+```
+
+### 7. Visualization
+
+Run `Grafana` in a docker container:
+
+```bash
+$ docker run \
+    --name=grafana \
+    -p 3000:3000 \
+    -e "GF_INSTALL_PLUGINS=grafana-worldmap-panel" \
+    -d grafana/grafana
+```
 
-- Correct data distribution between nodes
+Open it at [http://localhost:3000](http://localhost:3000)
+with `admin / admin`.
 
-- Block one data node and fill more data
+Then add `TimescaleDB` as a new datasource and import the dashboard:
 
-- Chunk compression
+* Configuration / Data Sources / Add data source / Find and select `PostgreSQL`.
 
-- Add Grafana
+* Connect to the access node via the docker bridge (host=`172.17.0.1`; port=`5432`; db=`postgres`; user=`postgres`; password=`postgres`; ssl=`off`).
 
-### N. Stop the cluster
+* Select `PostgreSQL` version `12` and enable `TimescaleDB` support.
+
+* Import the dashboard from the file `grafana.json` (Create / Import / Upload JSON file); a sample panel query is sketched below.
+
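For orientation, here is the kind of query such a map panel can run against the access node. This is only a sketch: it assumes the `telemetries` schema used above and Grafana's standard `$__timeFilter` macro for SQL datasources, and it is not necessarily the exact query stored in `grafana.json`.

```sql
-- Illustrative map-panel query (a sketch, not necessarily what grafana.json contains).
-- $__timeFilter(time) is Grafana's macro that restricts rows to the dashboard's time range.
-- Returns the latest known position per device as latitude/longitude columns.
SELECT DISTINCT ON (imei)
    imei AS metric,
    ST_Y(geography::geometry) AS latitude,
    ST_X(geography::geometry) AS longitude,
    1 AS value
FROM telemetries
WHERE $__timeFilter(time)
ORDER BY imei, time DESC;
```
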
+### N. Play with the cluster and stop it afterwards
 
 ```bash
+# grafana
+$ docker stop grafana
+$ docker rm grafana
+
+# 3rd data node
 $ docker stop pg_data_node_3
 $ docker rm pg_data_node_3
 $ docker volume rm pg_data_node_3_data
 
+# cluster
 $ docker-compose down --volumes
 
+# network
 $ docker network rm pg_cluster_network
 ```
 
@@ -196,6 +314,8 @@ $ docker network rm pg_cluster_network
 
 * [TimescaleDB Blog: TimescaleDB 2.0](https://blog.timescale.com/blog/timescaledb-2-0-a-multi-node-petabyte-scale-completely-free-relational-database-for-time-series)
 
+* [TimescaleDB Docs: Changes in TimescaleDB 2.0](https://docs.timescale.com/v2.0/release-notes/changes-in-timescaledb-2)
+
 * [TimescaleDB Docs: Single Node vs. Multi-Node](https://docs.timescale.com/v2.0/introduction/architecture#single-node-vs-clustering)
 
 * [TimescaleDB Docs: Set up multi-node TimescaleDB](https://docs.timescale.com/v2.0/getting-started/setup-multi-node-basic)
@@ -217,25 +337,42 @@ $ docker network rm pg_cluster_network
 ## Main points
 
 * Distributed hypertables and multi-node capabilities are currently in `BETA`.
-This feature is not meant for production use.
+This feature is not meant for production use!
 
-* Distributed hypertable `limitations`: https://docs.timescale.com/v2.0/using-timescaledb/limitations.
-
-* To ensure best performance, you should partition a distributed hypertable by both `time and space`.
-
-* A distributed hypertable exists in a `distributed database` that consists of multiple databases stored across one or more TimescaleDB instances.
-A database that is part of a distributed database can assume the role of either an `access node` or a `data node` (but not both).
-While the data nodes store distributed chunks, the access node is the entry point for clients to access distributed hypertables.
+* Distributed hypertable `limitations`:
+[https://docs.timescale.com/v2.0/using-timescaledb/limitations](https://docs.timescale.com/v2.0/using-timescaledb/limitations).
 
 * TimescaleDB supports `distributing hypertables` across multiple nodes (i.e., a cluster).
 A multi-node TimescaleDB implementation consists of:
 one access node to handle ingest and data routing and to act as an entry point for user access;
 one or more data nodes to store and organize distributed data (a minimal bootstrap sketch follows below).
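
As an illustration of that layout, a minimal bootstrap sketch run on the access node; the node names, host names and database below are assumptions chosen to match this repo's docker-compose services, not the literal setup commands of the project:

```sql
-- Register the data nodes on the access node (illustrative names/hosts).
SELECT add_data_node('data_node_1', host => 'pg_data_node_1', database => 'postgres', if_not_exists => TRUE);
SELECT add_data_node('data_node_2', host => 'pg_data_node_2', database => 'postgres', if_not_exists => TRUE);

-- The access node now knows about its data nodes.
SELECT * FROM timescaledb_information.data_nodes;
```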

+* A distributed hypertable exists in a `distributed database` that consists of multiple databases stored across one or more TimescaleDB instances.
+A database that is part of a distributed database can assume the role of either an `access node` or a `data node` (but not both).
+While the data nodes store distributed chunks, the access node is the entry point for clients to access distributed hypertables.
+
 * A client connects to an `access node` database.
 You should not directly access hypertables or chunks on data nodes.
 Doing so might lead to inconsistent distributed hypertables.
 
 * TimescaleDB can be elastically scaled out by simply `adding data nodes` to a distributed database.
 TimescaleDB can (and will) adjust the number of space partitions as new data nodes are added.
 Although existing chunks will not have their space partitions updated, the new settings will be applied to newly created chunks.
+
+* To ensure best performance, you should partition a distributed hypertable by both `time and space` (see the sketch below).
+If you only partition data by time, each chunk has to fill up before the access node chooses another data node to store the next chunk.
+Chunks would then be created on data nodes in `round-robin` fashion.
+In case of multiple space partitions, `only the first space partition` will be used to determine how chunks are distributed across servers (hash partitioning).
+Multi-dimensional partitioning adds a "space" dimension that consistently partitions the data over the data nodes, similar to traditional `sharding`.
+
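A hedged sketch contrasting the two layouts for a table like `telemetries` (the demo's hypertable already exists, so these are alternative, illustration-only calls):

```sql
-- Time-only partitioning: new chunks are placed on data nodes one after another (round-robin).
SELECT create_distributed_hypertable('telemetries', 'time');

-- Time + space partitioning: rows are additionally hash-partitioned by imei across the data nodes.
SELECT create_distributed_hypertable('telemetries', 'time', 'imei',
                                     chunk_time_interval => INTERVAL '1 week');
```
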
+* A distributed hypertable can be configured to write each chunk to multiple data nodes in order to replicate data at the chunk level.
+This `native replication` ensures that a distributed hypertable is protected against data node failures
+and provides an alternative to fully replicating each data node using streaming replication.
+When querying a distributed hypertable using native replication, the `query planner` knows how to include only one replica of each chunk in the query plan.
+The planner can employ different strategies to pick the set of chunk replicas in order to, e.g., evenly spread the query load across the data nodes.
+Native replication is currently `under development` and lacks functionality for a complete high-availability solution.
+It's recommended to keep the replication factor at the default value of 1 and instead use streaming replication on each data node (see the sketch below).
+
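For completeness, a sketch of how chunk-level replication is requested (illustration only; as noted above, this demo keeps the default of 1):

```sql
-- Ask for two copies of every chunk when the distributed hypertable is created.
-- With replication_factor => 1 (the default) each chunk lives on exactly one data node.
SELECT create_distributed_hypertable('telemetries', 'time', 'imei',
                                     replication_factor => 2);
```
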
+* The current version does not support altering or inserting data into `compressed` chunks. The data can be queried without any modifications;
+however, if you need to `backfill` or update data in a compressed chunk, you will need to `decompress` the chunk(s) first (see the sketch below).
+TimescaleDB also `blocks modifying` the schema of hypertables with compressed chunks.
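
A minimal backfill sketch along the lines of the compression commands used above (the 30-day bound is illustrative; pick the range that covers the rows you need to change):

```sql
-- Decompress the affected chunks, backfill, then compress them again.
SELECT decompress_chunk(c, if_compressed => TRUE)
FROM show_chunks('telemetries', older_than => INTERVAL '30 days') c;

-- ... INSERT / UPDATE the backfilled rows here ...

SELECT compress_chunk(c, if_not_compressed => TRUE)
FROM show_chunks('telemetries', older_than => INTERVAL '30 days') c;
```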

data/data1.csv.gz (17.7 KB) Binary file not shown.
data/data10.csv.gz (18.4 KB) Binary file not shown.
data/data2.csv.gz (1001 Bytes) Binary file not shown.
data/data3.csv.gz (3.52 KB) Binary file not shown.
data/data4.csv.gz (19.3 KB) Binary file not shown.
data/data5.csv.gz (6.12 KB) Binary file not shown.
data/data6.csv.gz (-1.67 KB) Binary file not shown.
data/data7.csv.gz (-320 Bytes) Binary file not shown.
data/data8.csv.gz (15.7 KB) Binary file not shown.
data/data9.csv.gz (37.1 KB) Binary file not shown.
