This is the generic solution for backup and restore. Depending on the backup strategy used, the tools might change.
- The Ansible script will automatically configure pg_basebackup, pgBackRest, WAL-G, and other recovery tools. For the sake of simplicity, we use pg_dump and pg_restore here.
- The following command takes a backup. It creates a compressed, custom-format dump in the file specified:
{% hint style="info" %} pg_dump -h 192.168.0.100 -U postgres -F c remote_db1 > remote_db1.tar {% endhint %}
- This can be scheduled using a cron job as shown below (a sample backup script is sketched after this list):
{% hint style="info" %} 0 0 * * * <path to backup script> {% endhint %}
- pg_basebackup is installed along with the psql client:
{% hint style="info" %} sudo apt install postgresql-client {% endhint %}
- You can restore a pg_dump backup as follows:
{% hint style="info" %} pg_restore -h 192.168.0.100 -U postgres -F c -C -d db1 < db1.tar {% endhint %}
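For reference, the cron entry above can point to a small wrapper script such as the sketch below. The backup directory, retention period, and connection details are assumptions; it also assumes password-less authentication (for example via ~/.pgpass):
{% hint style="info" %}
#!/bin/bash
# Sketch of a daily pg_dump backup script for the cron entry above.
# Host, user, database name and backup directory are assumptions; adjust to your setup.
set -euo pipefail
BACKUP_DIR=/var/backups/postgres
DB_NAME=remote_db1
mkdir -p "$BACKUP_DIR"
# -F c produces a compressed, custom-format dump restorable with pg_restore
pg_dump -h 192.168.0.100 -U postgres -F c "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_$(date +%F).dump"
# Keep only the last 7 days of dumps
find "$BACKUP_DIR" -name "${DB_NAME}_*.dump" -mtime +7 -delete
{% endhint %}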
For ClickHouse, the plan is to use clickhouse-backup, which is open-sourced under the liberal MIT license. The tool can create archived backups and upload them to NFS, S3, GCS, Azure Blob Storage, SFTP, and other remote storage back ends.
- Download the latest release from https://github.com/AlexAkulov/clickhouse-backup/releases
- Untar the archive
{% hint style="info" %} tar -zxvf clickhouse-backup.tar.gz {% endhint %}
- Create a YAML configuration file as follows and call it config.yml:
{% hint style="info" %}
general:
  remote_storage: none # REMOTE_STORAGE, if `none` then the `upload` and `download` commands will fail
  max_file_size: 1073741824 # MAX_FILE_SIZE, 1G by default; ignored when upload_by_part is true, used to split data part files across archives
  disable_progress_bar: true # DISABLE_PROGRESS_BAR, show a progress bar during upload and download; only meaningful when `upload_concurrency` and `download_concurrency` equal 1
  backups_to_keep_local: 0 # BACKUPS_TO_KEEP_LOCAL, how many of the newest local backups to keep; 0 means all created backups are kept on local disk
  # run `clickhouse-backup delete local <backup_name>` to avoid unnecessary disk space usage
  backups_to_keep_remote: 0 # BACKUPS_TO_KEEP_REMOTE, how many of the newest backups to keep on remote storage; 0 means all uploaded backups are kept on remote storage
  # if an old backup is required by a newer incremental backup, it will not be deleted; be careful with long incremental backup sequences
  log_level: info # LOG_LEVEL
  allow_empty_backups: false # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1 # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1 # UPLOAD_CONCURRENCY, max 255
  restore_schema_on_cluster: "" # RESTORE_SCHEMA_ON_CLUSTER, execute all schema-related SQL queries with the `ON CLUSTER` clause as Distributed DDL; check the `system.clusters` table for the proper cluster name
  upload_by_part: true # UPLOAD_BY_PART
  download_by_part: true # DOWNLOAD_BY_PART
clickhouse:
  username: default # CLICKHOUSE_USERNAME
  password: "" # CLICKHOUSE_PASSWORD
  host: localhost # CLICKHOUSE_HOST
  port: 9000 # CLICKHOUSE_PORT, don't use 8123, clickhouse-backup doesn't support the HTTP protocol
  disk_mapping: {} # CLICKHOUSE_DISK_MAPPING, use it if system.disks on the restore server differs from system.disks on the server where the backup was created
  skip_tables: # CLICKHOUSE_SKIP_TABLES
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
  timeout: 5m # CLICKHOUSE_TIMEOUT
  freeze_by_part: false # CLICKHOUSE_FREEZE_BY_PART
  secure: false # CLICKHOUSE_SECURE, use SSL encryption for the connection
  skip_verify: false # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true # CLICKHOUSE_SYNC_REPLICATED_TABLES
  log_sql_queries: true # CLICKHOUSE_LOG_SQL_QUERIES, log clickhouse-backup SQL queries to the `system.query_log` table inside clickhouse-server
  debug: false # CLICKHOUSE_DEBUG
  config_dir: "/etc/clickhouse-server" # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server" # CLICKHOUSE_RESTART_COMMAND, used when restoring with the --rbac or --config options
  ignore_not_exists_error_during_freeze: true # CLICKHOUSE_IGNORE_NOT_EXISTS_ERROR_DURING_FREEZE, avoids backup failures when tables and databases are frequently created or dropped during backup creation; clickhouse-backup will ignore `code: 60` and `code: 81` errors during `ALTER TABLE ... FREEZE`
azblob:
  endpoint_suffix: "core.windows.net" # AZBLOB_ENDPOINT_SUFFIX
  account_name: "" # AZBLOB_ACCOUNT_NAME
  account_key: "" # AZBLOB_ACCOUNT_KEY
  sas: "" # AZBLOB_SAS
  use_managed_identity: false # AZBLOB_USE_MANAGED_IDENTITY
  container: "" # AZBLOB_CONTAINER
  path: "" # AZBLOB_PATH
  compression_level: 1 # AZBLOB_COMPRESSION_LEVEL
  compression_format: tar # AZBLOB_COMPRESSION_FORMAT
  sse_key: "" # AZBLOB_SSE_KEY
  buffer_size: 0 # AZBLOB_BUFFER_SIZE, if <= 0 it is calculated as max_file_size / 10000, bounded between 2MB and 4MB
  max_buffers: 3 # AZBLOB_MAX_BUFFERS
s3:
  access_key: "" # S3_ACCESS_KEY
  secret_key: "" # S3_SECRET_KEY
  bucket: "" # S3_BUCKET
  endpoint: "" # S3_ENDPOINT
  region: us-east-1 # S3_REGION
  acl: private # S3_ACL
  assume_role_arn: "" # S3_ASSUME_ROLE_ARN
  force_path_style: false # S3_FORCE_PATH_STYLE
  path: "" # S3_PATH
  disable_ssl: false # S3_DISABLE_SSL
  compression_level: 1 # S3_COMPRESSION_LEVEL
  compression_format: tar # S3_COMPRESSION_FORMAT
  sse: "" # S3_SSE, empty (default), AES256, or aws:kms
  disable_cert_verification: false # S3_DISABLE_CERT_VERIFICATION
  storage_class: STANDARD # S3_STORAGE_CLASS
  concurrency: 1 # S3_CONCURRENCY
  part_size: 0 # S3_PART_SIZE, if <= 0 it is calculated as max_file_size / 10000
  debug: false # S3_DEBUG
gcs:
  credentials_file: "" # GCS_CREDENTIALS_FILE
  credentials_json: "" # GCS_CREDENTIALS_JSON
  bucket: "" # GCS_BUCKET
  path: "" # GCS_PATH
  compression_level: 1 # GCS_COMPRESSION_LEVEL
  compression_format: tar # GCS_COMPRESSION_FORMAT
  debug: false # GCS_DEBUG
cos:
  url: "" # COS_URL
  timeout: 2m # COS_TIMEOUT
  secret_id: "" # COS_SECRET_ID
  secret_key: "" # COS_SECRET_KEY
  path: "" # COS_PATH
  compression_format: tar # COS_COMPRESSION_FORMAT
  compression_level: 1 # COS_COMPRESSION_LEVEL
ftp:
  address: "" # FTP_ADDRESS
  timeout: 2m # FTP_TIMEOUT
  username: "" # FTP_USERNAME
  password: "" # FTP_PASSWORD
  tls: false # FTP_TLS
  path: "" # FTP_PATH
  compression_format: tar # FTP_COMPRESSION_FORMAT
  compression_level: 1 # FTP_COMPRESSION_LEVEL
  debug: false # FTP_DEBUG
sftp:
  address: "" # SFTP_ADDRESS
  username: "" # SFTP_USERNAME
  password: "" # SFTP_PASSWORD
  key: "" # SFTP_KEY
  path: "" # SFTP_PATH
  concurrency: 1 # SFTP_CONCURRENCY
  compression_format: tar # SFTP_COMPRESSION_FORMAT
  compression_level: 1 # SFTP_COMPRESSION_LEVEL
  debug: false # SFTP_DEBUG
api:
  listen: "localhost:7171" # API_LISTEN
  enable_metrics: true # API_ENABLE_METRICS
  enable_pprof: false # API_ENABLE_PPROF
  username: "" # API_USERNAME, basic authorization for the API endpoint
  password: "" # API_PASSWORD
  secure: false # API_SECURE, use TLS for the API listen socket
  certificate_file: "" # API_CERTIFICATE_FILE
  private_key_file: "" # API_PRIVATE_KEY_FILE
  create_integration_tables: false # API_CREATE_INTEGRATION_TABLES
  allow_parallel: false # API_ALLOW_PARALLEL, can allocate a lot of memory and spawn goroutines; don't enable it unless you are sure
{% endhint %}
- Ensure the `clickhouse` and `general` sections of the configuration file are filled in; the remaining sections are not mandatory.
- If automated remote upload is needed, fill in the appropriate section: `sftp`, `ftp`, `s3`, `gcs`, `azblob`, etc.
- The backup can then be created with the following command:
{% hint style="info" %} <path-to-clickhouse-backup-dir>/bin/clickhouse-backup -c <path to config.yml> create {% endhint %}
- The following is the list of possible commands that can be executed (example usage is sketched after this list):
{% hint style="info" %}
COMMANDS:
tables Print list of tables
create Create new backup
create_remote Create and upload
upload Upload backup to remote storage
list Print list of backups
download Download backup from remote storage
restore Create schema and restore data from backup
restore_remote Download and restore
delete Delete specific backup
default-config Print default config
print-config Print current config
clean Remove data in 'shadow' folder from all `path` folders available from `system.disks`
server Run API server
help, h Shows a list of commands or help for one command
{% endhint %}
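For example, a typical backup-and-restore cycle using the commands above might look like the sketch below. The backup name, the binary path placeholder, and the configuration path are illustrative:
{% hint style="info" %}
# Create a local backup and upload it to the configured remote storage
<path-to-clickhouse-backup-dir>/bin/clickhouse-backup -c <path to config.yml> create_remote daily_backup_2022_01_01

# List local and remote backups
<path-to-clickhouse-backup-dir>/bin/clickhouse-backup -c <path to config.yml> list

# Download a backup from remote storage and restore schema and data
<path-to-clickhouse-backup-dir>/bin/clickhouse-backup -c <path to config.yml> restore_remote daily_backup_2022_01_01
{% endhint %}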
- Back up ZooKeeper state data
- Open the file:
{% hint style="info" %} kafka/config/zookeeper.properties {% endhint %}
- Note the location given by the dataDir property (typically /tmp/zookeeper)
- Run the following command:
{% hint style="info" %} tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/* {% endhint %}
- Back up Kafka topics and messages (a combined, cron-able backup script is sketched at the end of this section)
- Open the file kafka/config/server.properties
- Note the location given by the log.dirs property (typically /tmp/kafka-logs)
- Stop Kafka:
{% hint style="info" %} sudo systemctl stop kafka {% endhint %}
- Log in as the kafka user:
{% hint style="info" %} sudo -iu kafka {% endhint %}
- Run the following command:
{% hint style="info" %} tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/* {% endhint %}
- Restore ZooKeeper
- sudo systemctl stop kafka
- sudo systemctl stop zookeeper
- sudo -iu kafka
- rm -r /tmp/zookeeper/*
- tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2
- Restore Kafka
- rm -r /tmp/kafka-logs/*
- tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2
- sudo systemctl start zookeeper
- sudo systemctl start kafka
- Verification of Restoration
- ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning
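The backup steps above can be combined into a single script suitable for scheduling with cron. The sketch below assumes the default locations mentioned above (dataDir under /tmp/zookeeper, log.dirs under /tmp/kafka-logs) and a systemd-managed Kafka service; adjust paths and service names to your installation:
{% hint style="info" %}
#!/bin/bash
# Sketch of a combined ZooKeeper + Kafka backup; paths and service name are assumptions.
set -euo pipefail
DATE=$(date +%F)
# Stop Kafka so the log directories are not written to while they are archived
sudo systemctl stop kafka
tar -czf /home/kafka/zookeeper-backup-$DATE.tar.gz /tmp/zookeeper/*
tar -czf /home/kafka/kafka-backup-$DATE.tar.gz /tmp/kafka-logs/*
sudo systemctl start kafka
{% endhint %}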
Redis provides a built-in command to save a backup.
- Install redis-cli using:
{% hint style="info" %} sudo apt install redis-tools {% endhint %}
- The following command takes a backup of the redis-server:
{% hint style="info" %} echo save | redis-cli -u redis://<user>:<pass>@<host>:<port> >> /tmp/redis-backup.log {% endhint %}
- This will save the backup as dump.rdb on the Redis server, inside the data directory below (a wrapper script suitable for cron is sketched after this list):
{% hint style="info" %} /var/lib/redis {% endhint %}
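The save command above can be wrapped in a small script so that each snapshot is copied out of the data directory with a timestamp. This is a sketch that must run on the Redis host itself, since dump.rdb is written to the server's data directory; the backup directory is an assumption:
{% hint style="info" %}
#!/bin/bash
# Sketch of a Redis backup wrapper; connection URL and backup directory are assumptions.
set -euo pipefail
BACKUP_DIR=/var/backups/redis
mkdir -p "$BACKUP_DIR"
# Trigger a synchronous snapshot on the server
echo save | redis-cli -u redis://<user>:<pass>@<host>:<port> >> /tmp/redis-backup.log
# Copy the resulting dump.rdb out of the data directory with a timestamp
sudo cp /var/lib/redis/dump.rdb "$BACKUP_DIR/dump-$(date +%F).rdb"
{% endhint %}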
Restoration can be done in the following way (an example command sequence is sketched after this list):
- Locate the Redis data directory, typically:
{% hint style="info" %} /var/lib/redis {% endhint %}
- With the Redis server stopped, move the dump.rdb file into this folder
- Start the Redis server
- Redis loads dump.rdb on startup, so the data is restored automatically
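As a sketch, the restore steps above translate into the following command sequence on the Redis host. The service name (redis-server), data directory, and backup file name are assumptions based on a standard apt installation:
{% hint style="info" %}
# Stop Redis so it does not overwrite the dump file on shutdown
sudo systemctl stop redis-server
# Place the backup into the data directory and fix ownership
sudo cp /var/backups/redis/dump-2022-01-01.rdb /var/lib/redis/dump.rdb
sudo chown redis:redis /var/lib/redis/dump.rdb
# Redis loads dump.rdb automatically on startup
sudo systemctl start redis-server
{% endhint %}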
All content on this page by eGov Foundation is licensed under a Creative Commons Attribution 4.0 International License.