This repository was archived by the owner on Sep 23, 2024. It is now read-only.

Commit 8c4732e

Document usage (#79)
1 parent 2b67be1 commit 8c4732e

1 file changed, +158 -3 lines changed
README.md

Lines changed: 158 additions & 3 deletions
@@ -37,9 +37,164 @@ or
pip install .
```

-### Configuration
+### Create a config.json

```
{
  "host": "localhost",
  "port": 5432,
  "user": "postgres",
  "password": "secret",
  "dbname": "db"
}
```

These are the same basic configuration properties used by the PostgreSQL command-line client (`psql`).

Full list of options in `config.json`:

| Property             | Type    | Required? | Description |
|----------------------|---------|-----------|-------------|
| host                 | String  | Yes       | PostgreSQL host |
| port                 | Integer | Yes       | PostgreSQL port |
| user                 | String  | Yes       | PostgreSQL user |
| password             | String  | Yes       | PostgreSQL password |
| dbname               | String  | Yes       | PostgreSQL database name |
| filter_schemas       | String  | No        | Comma-separated list of schema names. Only these schemas are scanned, which improves the performance of data extraction. (Default: None) |
| ssl                  | String  | No        | If set to `"true"`, connect over SSL using the PostgreSQL `sslmode=require` option. The connection fails if the server does not accept SSL connections or does not recognize the client certificate. (Default: None) |
| logical_poll_seconds | Integer | No        | Stop the tap when no data has been received from the WAL for this many seconds. (Default: 10800) |
| break_at_end_lsn     | Boolean | No        | Stop the tap when a newly received LSN is past the maximum LSN that was detected when the tap started. (Default: true) |
| max_run_seconds      | Integer | No        | Stop the tap after this many seconds. (Default: 43200) |
| debug_lsn            | String  | No        | If set to `"true"`, add an `_sdc_lsn` property to the Singer messages to help debug the PostgreSQL LSN position in the WAL stream. (Default: None) |

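The table above can be turned into a quick sanity check before running the tap. The following is a minimal sketch, not part of the tap itself: `check_config` and the sample values are hypothetical, and the property names and types are simply copied from the table.

```python
import json

# Required and optional properties with their JSON types, copied from the table above.
REQUIRED = {"host": str, "port": int, "user": str, "password": str, "dbname": str}
OPTIONAL = {
    "filter_schemas": str,
    "ssl": str,
    "logical_poll_seconds": int,
    "break_at_end_lsn": bool,
    "max_run_seconds": int,
    "debug_lsn": str,
}


def check_config(config):
    """Return a list of problems found in a parsed config.json (empty if none)."""
    problems = [f"missing required property: {key}" for key in REQUIRED if key not in config]
    for key, expected in {**REQUIRED, **OPTIONAL}.items():
        if key not in config:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject True/False where an integer is expected.
        type_ok = isinstance(value, expected) and not (expected is int and isinstance(value, bool))
        if not type_ok:
            problems.append(f"{key} should be of type {expected.__name__}, got {type(value).__name__}")
    return problems


# Hypothetical sample: "port" is quoted and "dbname" is missing.
sample = json.loads('{"host": "localhost", "port": "5432", "user": "postgres", "password": "secret"}')
for problem in check_config(sample):
    print(problem)
```

Properties that are not listed in the table are ignored by this sketch; whether the tap itself accepts or rejects them is not covered here.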
### Run the tap in Discovery Mode

```
tap-postgres --config config.json --discover                # Should dump a Catalog to stdout
tap-postgres --config config.json --discover > catalog.json # Capture the Catalog
```

### Add Metadata to the Catalog

Each entry under the Catalog's `streams` key needs the following metadata:

```
{
  "streams": [
    {
      "stream_name": "my_topic",
      "metadata": [{
        "breadcrumb": [],
        "metadata": {
          "selected": true,
          "replication-method": "LOG_BASED"
        }
      }]
    }
  ]
}
```

The replication method can be one of `FULL_TABLE`, `INCREMENTAL` or `LOG_BASED`.

**Note**: Log based replication requires a few adjustments in the source PostgreSQL database. See the
Log Based replication requirements section below for more information.

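Editing the Catalog by hand gets tedious with many streams, so the metadata above can also be set with a small script. This is a sketch under assumptions: `select_stream` is a hypothetical helper, and it matches streams on the `stream_name` key only because the example above uses it. Pass a different `name_key` if your discovered Catalog identifies streams by another field.

```python
import json

VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def select_stream(catalog, stream_name, replication_method="LOG_BASED", name_key="stream_name"):
    """Mark one stream as selected and set its replication method.

    Returns True if a matching stream was found, False otherwise.
    """
    if replication_method not in VALID_METHODS:
        raise ValueError(f"unknown replication method: {replication_method}")
    for stream in catalog.get("streams", []):
        if stream.get(name_key) != stream_name:
            continue
        entries = stream.setdefault("metadata", [])
        # The stream-level metadata entry is the one with an empty breadcrumb.
        top = next((entry for entry in entries if entry.get("breadcrumb") == []), None)
        if top is None:
            top = {"breadcrumb": [], "metadata": {}}
            entries.append(top)
        top.setdefault("metadata", {}).update(
            {"selected": True, "replication-method": replication_method}
        )
        return True
    return False


# Hypothetical in-memory Catalog shaped like the example above.
catalog = {"streams": [{"stream_name": "my_topic", "metadata": []}]}
select_stream(catalog, "my_topic", "LOG_BASED")
print(json.dumps(catalog, indent=2))
```

In practice you would load `catalog.json` with `json.load`, call the helper for each stream you want to replicate, and write the result back with `json.dump`.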
### Run the tap in Sync Mode

```
tap-postgres --config config.json --properties catalog.json
```

The tap writes bookmarks to stdout. Capture them and pass them to the tap on the next sync with the optional
`--state state.json` parameter.

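Capturing the bookmarks can be sketched as follows. The sketch relies on the Singer message format, in which bookmarks arrive as JSON lines of the form `{"type": "STATE", "value": ...}`. The `last_state` helper and the sample bookmark contents are hypothetical, and in a full pipeline the target usually handles state for you.

```python
import json


def last_state(lines):
    """Return the value of the last STATE message in an iterable of output lines, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message, such as stray log output.
            continue
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


# Hypothetical tap output: one record followed by two bookmarks.
output = [
    '{"type": "RECORD", "stream": "my_topic", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"bookmarks": {"my_topic": {"lsn": 100}}}}',
    '{"type": "STATE", "value": {"bookmarks": {"my_topic": {"lsn": 200}}}}',
]
print(json.dumps(last_state(output)))
```

Writing the returned value to `state.json` with `json.dump` gives a file that can be passed to `--state` on the next run.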
### Log Based replication requirements

* PostgreSQL databases running **PostgreSQL versions 9.4.x or greater**. To avoid a critical PostgreSQL bug,
  use at least the following minor version of your major release:
  - PostgreSQL 12.0
  - PostgreSQL 11.2
  - PostgreSQL 10.7
  - PostgreSQL 9.6.12
  - PostgreSQL 9.5.16
  - PostgreSQL 9.4.21

* **A connection to the master instance**. Log based replication only works when the tap connects to the master instance.

* **wal2json plugin**: To use Log Based replication for your PostgreSQL integration, you must install the wal2json plugin.
  The wal2json plugin outputs JSON objects for logical decoding, which the tap then uses to perform Log Based replication.
  Steps for installing the plugin vary depending on your operating system. Instructions for each operating system type
  are in the wal2json GitHub repository:

  * [Unix-based operating systems](https://github.com/eulerto/wal2json#unix-based-operating-systems)
  * [Windows](https://github.com/eulerto/wal2json#windows)

----

-Based on Stitch documentation
* **postgres config file**: Locate the database configuration file (usually `postgresql.conf`) and define
  the parameters as follows:

  ```
  wal_level=logical
  max_replication_slots=5
  max_wal_senders=5
  ```

  Restart your PostgreSQL service to ensure the changes take effect.

  **Note**: For `max_replication_slots` and `max_wal_senders`, this example uses a value of 5.
  This should be sufficient unless you have a large number of read replicas connected to the master instance.

* **Existing replication slot**: Log based replication requires a dedicated logical replication slot.
  In PostgreSQL, a logical replication slot represents a stream of database changes that can then be replayed to a
  client in the order they were made on the original server. Each slot streams a sequence of changes from a single
  database.

  Log in to the master instance as a superuser and create a logical replication slot using the `wal2json` plugin:
  ```
  SELECT *
  FROM pg_create_logical_replication_slot('pipelinewise_<database_name>', 'wal2json');
  ```

  **Note**: Replication slots are specific to a given database in a cluster. If you want to connect multiple
  databases - whether in one integration or several - you’ll need to create a replication slot for each database.

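The last two requirements can be checked and prepared with a short script. The sketch below is an illustration only: `parse_conf`, `check_wal_settings` and `create_slot_sql` are hypothetical helpers, the parser handles only simple `key = value` lines, and a setting that is absent from the file is reported as a problem even though the server default may be sufficient. The slot name check reflects PostgreSQL's rule that slot names may contain only lowercase letters, digits and underscores.

```python
import re

# Matches "key = value" and "key value"; the equals sign is optional in postgresql.conf.
SETTING = re.compile(r"^([A-Za-z_][\w.]*)\s*=?\s*(.*)$")


def parse_conf(text):
    """Parse simple settings from postgresql.conf text into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        # Naive comment stripping: a '#' inside a quoted value is not handled.
        line = raw.split("#", 1)[0].strip()
        match = SETTING.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2).strip().strip("'")
    return settings


def check_wal_settings(text):
    """Return a list of problems with the log based replication settings."""
    settings = parse_conf(text)
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level is not set to logical")
    for key in ("max_replication_slots", "max_wal_senders"):
        value = settings.get(key, "")
        if not value.isdigit() or int(value) < 1:
            problems.append(f"{key} is not set to a positive integer")
    return problems


def create_slot_sql(dbname):
    """Build the slot creation statement shown above for one database."""
    slot = f"pipelinewise_{dbname}"
    if not re.fullmatch(r"[a-z0-9_]+", slot):
        raise ValueError(f"invalid replication slot name: {slot}")
    return f"SELECT * FROM pg_create_logical_replication_slot('{slot}', 'wal2json');"


conf = "wal_level=logical\nmax_replication_slots=5\nmax_wal_senders=5\n"
print(check_wal_settings(conf))
print(create_slot_sql("db"))
```

Because each slot belongs to one database, calling `create_slot_sql` once per database name gives one statement per slot.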
### To run tests:

1. Install python test dependencies in a virtual env:
   ```
   python3 -m venv venv
   . venv/bin/activate
   pip install --upgrade pip
   pip install .[test]
   ```

2. The tests need a PostgreSQL database. Export its credentials:
   ```
   export TAP_POSTGRES_HOST=<postgres-host>
   export TAP_POSTGRES_PORT=<postgres-port>
   export TAP_POSTGRES_USER=<postgres-user>
   export TAP_POSTGRES_PASSWORD=<postgres-password>
   ```

   Test objects will be created in the `postgres` database.

3. To run the tests:
   ```
   make test
   ```

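The exported variables map directly onto the `config.json` properties, which is handy when running the tap by hand against the test database. The following is a hedged sketch: `config_from_env` is a hypothetical helper, not the test suite's own code, and the `postgres` default for `dbname` is taken from the note in step 2.

```python
import os

# Environment variable names from step 2, mapped to config.json property names.
ENV_TO_KEY = {
    "TAP_POSTGRES_HOST": "host",
    "TAP_POSTGRES_PORT": "port",
    "TAP_POSTGRES_USER": "user",
    "TAP_POSTGRES_PASSWORD": "password",
}


def config_from_env(environ=None, dbname="postgres"):
    """Build a tap config dict from the TAP_POSTGRES_* variables."""
    environ = os.environ if environ is None else environ
    missing = [name for name in ENV_TO_KEY if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {key: environ[name] for name, key in ENV_TO_KEY.items()}
    config["port"] = int(config["port"])  # config.json expects an integer port
    config["dbname"] = dbname
    return config


# Hypothetical values passed explicitly so the example does not depend on your shell.
example_env = {
    "TAP_POSTGRES_HOST": "localhost",
    "TAP_POSTGRES_PORT": "5432",
    "TAP_POSTGRES_USER": "postgres",
    "TAP_POSTGRES_PASSWORD": "secret",
}
print(config_from_env(example_env))
```

Calling `config_from_env()` without arguments reads the real environment and raises an error that names any variable that is not set.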
### To run pylint:

1. Install python dependencies and run the python linter:
   ```
   python3 -m venv venv
   . venv/bin/activate
   pip install --upgrade pip
   pip install .[test]
   make pylint
   ```

---
