|
37 | 37 | pip install .
|
38 | 38 | ```
|
39 | 39 |
|
40 |
| -### Configuration |
| 40 | +### Create a config.json |
41 | 41 |
|
| 42 | +``` |
| 43 | +{ |
| 44 | + "host": "localhost", |
| 45 | + "port": 5432, |
| 46 | + "user": "postgres", |
| 47 | + "password": "secret", |
| 48 | + "dbname": "db" |
| 49 | +} |
| 50 | +``` |
| 51 | + |
| 52 | +These are the same basic configuration properties used by the PostgreSQL command-line client (`psql`). |
| 53 | + |
| 54 | +Full list of options in `config.json`: |
| 55 | + |
| 56 | +| Property | Type | Required? | Description | |
| 57 | +|-------------------------------------|---------|------------|---------------------------------------------------------------| |
| 58 | +| host | String | Yes | PostgreSQL host | |
| 59 | +| port | Integer | Yes | PostgreSQL port | |
| 60 | +| user | String | Yes | PostgreSQL user | |
| 61 | +| password | String | Yes | PostgreSQL password | |
| 62 | +| dbname | String | Yes | PostgreSQL database name | |
| 63 | +| filter_schemas | String | No | Comma-separated list of schema names. When set, only these schemas are scanned, which speeds up data extraction. (Default: None) | |
| 64 | +| ssl | String | No | If set to `"true"`, connect over SSL using the PostgreSQL `sslmode=require` option. The connection fails if the server does not accept SSL connections or does not recognize the client certificate. (Default: None) | |
| 65 | +| logical_poll_seconds | Integer | No | Stop the tap when no data has been received from the WAL for this many seconds. (Default: 10800) | |
| 66 | +| break_at_end_lsn | Boolean | No | Stop the tap once a newly received LSN is past the maximum LSN detected when the tap started. (Default: true) | |
| 67 | +| max_run_seconds | Integer | No | Stop the tap after it has run for this many seconds. (Default: 43200) | |
| 68 | +| debug_lsn | String | No | If set to `"true"`, add an `_sdc_lsn` property to the Singer messages to help debug the PostgreSQL LSN position in the WAL stream. (Default: None) | |
| 69 | + |
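A quick pre-flight check of `config.json` can catch a missing or mistyped required property before the tap runs. The sketch below is illustrative only and is not part of the tap: the helper name `check_config` is hypothetical, and the required keys and types are simply copied from the options table above.

```python
import json

# Required properties and their expected JSON types, copied from the table above.
REQUIRED = {"host": str, "port": int, "user": str, "password": str, "dbname": str}


def check_config(config):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in config:
            problems.append(f"missing required property: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    return problems


sample = json.loads(
    '{"host": "localhost", "port": 5432, "user": "postgres",'
    ' "password": "secret", "dbname": "db"}'
)
print(check_config(sample))  # an empty list means no problems were found
```

To check a real file, load it with `json.load(open("config.json"))` and pass the result to `check_config`.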
| 70 | + |
| 71 | +### Run the tap in Discovery Mode |
| 72 | + |
| 73 | +``` |
| 74 | +tap-postgres --config config.json --discover # Should dump a Catalog to stdout |
| 75 | +tap-postgres --config config.json --discover > catalog.json # Capture the Catalog |
| 76 | +``` |
| 77 | + |
| 78 | +### Add Metadata to the Catalog |
| 79 | + |
| 80 | +Each entry under the Catalog's "streams" key will need the following metadata: |
| 81 | + |
| 82 | +``` |
| 83 | +{ |
| 84 | + "streams": [ |
| 85 | + { |
| 86 | + "stream_name": "my_topic", |
| 87 | + "metadata": [{ |
| 88 | + "breadcrumb": [], |
| 89 | + "metadata": { |
| 90 | + "selected": true, |
| 91 | + "replication-method": "LOG_BASED" |
| 92 | + } |
| 93 | + }] |
| 94 | + } |
| 95 | + ] |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +The replication method can be one of `FULL_TABLE`, `INCREMENTAL` or `LOG_BASED`. |
| 100 | + |
| 101 | +**Note**: Log-based replication requires a few adjustments in the source PostgreSQL database. |
| 102 | +See the log-based replication requirements in this README for details. |
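Editing the catalog by hand is error-prone when it contains many streams. The sketch below shows one way to add the top-level metadata entry from a script. It is an illustration under assumptions: `select_streams` is a hypothetical helper, and the `name_key` default of `"stream_name"` mirrors the example above, so adjust it if the catalog produced by discovery identifies streams with a different key.

```python
import json

VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def select_streams(catalog, names, method, name_key="stream_name"):
    """Mark the named streams as selected with the given replication method."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown replication method: {method}")
    for stream in catalog.get("streams", []):
        if stream.get(name_key) not in names:
            continue
        entries = stream.setdefault("metadata", [])
        # The stream-level entry is the one with an empty breadcrumb.
        top = next((e for e in entries if e.get("breadcrumb") == []), None)
        if top is None:
            top = {"breadcrumb": [], "metadata": {}}
            entries.append(top)
        top.setdefault("metadata", {}).update(
            {"selected": True, "replication-method": method}
        )
    return catalog


catalog = {"streams": [{"stream_name": "my_topic"}, {"stream_name": "other"}]}
select_streams(catalog, {"my_topic"}, "LOG_BASED")
print(json.dumps(catalog, indent=2))
```

Existing metadata on a stream is kept; only the `selected` and `replication-method` keys are added or overwritten.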
| 103 | + |
| 104 | +### Run the tap in Sync Mode |
| 105 | + |
| 106 | +``` |
| 107 | +tap-postgres --config config.json --properties catalog.json |
| 108 | +``` |
| 109 | + |
| 110 | +The tap writes bookmarks to stdout. Capture them and pass them back to the tap through the optional |
| 111 | +`--state state.json` parameter on the next sync. |
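As a rough illustration of capturing bookmarks, the sketch below scans the tap's output for Singer `STATE` messages and keeps the value of the last one. The helper name `last_state` is hypothetical; it assumes each output line is one JSON message of the form `{"type": "STATE", "value": {...}}`, as in the Singer message format, and it skips lines that are not valid JSON.

```python
import json


def last_state(lines):
    """Return the value of the last STATE message in an iterable of lines, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


output = [
    '{"type": "RECORD", "stream": "my_topic", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"bookmarks": {"my_topic": {"lsn": 100}}}}',
    '{"type": "STATE", "value": {"bookmarks": {"my_topic": {"lsn": 200}}}}',
]
print(json.dumps(last_state(output)))
```

Writing the returned dictionary to `state.json` with `json.dump` gives a file that can be passed to the next run through `--state`.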
| 112 | + |
| 113 | +### Log Based replication requirements |
| 114 | + |
| 115 | +* PostgreSQL databases running **PostgreSQL version 9.4.x or greater**. To avoid a critical PostgreSQL bug, |
| 116 | + use at least the following minor version of your major release: |
| 117 | + - PostgreSQL 12.0 |
| 118 | + - PostgreSQL 11.2 |
| 119 | + - PostgreSQL 10.7 |
| 120 | + - PostgreSQL 9.6.12 |
| 121 | + - PostgreSQL 9.5.16 |
| 122 | + - PostgreSQL 9.4.21 |
| 123 | + |
| 124 | +* **A connection to the master instance**. Log-based replication will only work by connecting to the master instance. |
| 125 | + |
| 126 | +* **wal2json plugin**: To use log-based replication for your PostgreSQL integration, you must install the wal2json plugin. |
| 127 | + The wal2json plugin outputs JSON objects for logical decoding, which the tap then uses to perform log-based replication. |
| 128 | + Steps for installing the plugin vary depending on your operating system. Instructions for each operating system type |
| 129 | + are in the wal2json GitHub repository: |
| 130 | + |
| 131 | + * [Unix-based operating systems](https://github.com/eulerto/wal2json#unix-based-operating-systems) |
| 132 | + * [Windows](https://github.com/eulerto/wal2json#windows) |
42 | 133 |
|
43 |
| ---- |
44 | 134 |
|
45 |
| -Based on Stitch documentation |
| 135 | +* **postgres config file**: Locate the database configuration file (usually `postgresql.conf`) and define |
| 136 | + the parameters as follows: |
| 137 | + |
| 138 | + ``` |
| 139 | + wal_level=logical |
| 140 | + max_replication_slots=5 |
| 141 | + max_wal_senders=5 |
| 142 | + ``` |
| 143 | +
|
| 144 | + Restart your PostgreSQL service to ensure the changes take effect. |
| 145 | + |
| 146 | + **Note**: For `max_replication_slots` and `max_wal_senders`, we’re defaulting to a value of 5. |
| 147 | + This should be sufficient unless you have a large number of read replicas connected to the master instance. |
| 148 | +
|
| 149 | +
|
| 150 | +* **Existing replication slot**: Log based replication requires a dedicated logical replication slot. |
| 151 | + In PostgreSQL, a logical replication slot represents a stream of database changes that can then be replayed to a |
| 152 | + client in the order they were made on the original server. Each slot streams a sequence of changes from a single |
| 153 | + database. |
| 154 | + |
| 155 | + Log in to the master instance as a superuser and, using the `wal2json` plugin, create a logical replication slot: |
| 156 | + ``` |
| 157 | + SELECT * |
| 158 | + FROM pg_create_logical_replication_slot('pipelinewise_<database_name>', 'wal2json'); |
| 159 | + ``` |
| 160 | +
|
| 161 | + **Note**: Replication slots are specific to a given database in a cluster. If you want to connect multiple |
| 162 | + databases - whether in one integration or several - you’ll need to create a replication slot for each database. |
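When several databases need a slot, the `CREATE` statement above can be generated per database. The sketch below is illustrative only: `slot_name` and `create_slot_sql` are hypothetical helpers, and lower-casing the database name is an assumption, not documented behaviour of the tap. PostgreSQL allows only lower-case letters, digits, and underscores in replication slot names, so anything else is rejected here instead of being silently rewritten.

```python
import re


def slot_name(dbname):
    """Build a slot name following the pipelinewise_<database_name> pattern."""
    name = f"pipelinewise_{dbname.lower()}"  # lower-casing is an assumption
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError(f"invalid characters for a replication slot name: {name}")
    return name


def create_slot_sql(dbname):
    """Return the SQL statement that creates a wal2json slot for one database."""
    return (
        "SELECT * FROM pg_create_logical_replication_slot("
        f"'{slot_name(dbname)}', 'wal2json');"
    )


for db in ["db", "analytics"]:
    print(create_slot_sql(db))
```

Each generated statement must be run while connected to the database it refers to, because a slot belongs to a single database.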
| 163 | +
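The minimum minor versions listed in the requirements above can be checked with a small comparison. This is a sketch under stated assumptions: `meets_minimum` is a hypothetical helper, it expects a plain numeric version string such as `11.2`, and it treats any major release newer than 12 as acceptable, which is an assumption and not something the list states.

```python
# Minimum minor release per major release, copied from the list above.
# Before PostgreSQL 10 the major release is the first two numbers (e.g. 9.6).
MINIMUMS = {
    (12,): (12, 0),
    (11,): (11, 2),
    (10,): (10, 7),
    (9, 6): (9, 6, 12),
    (9, 5): (9, 5, 16),
    (9, 4): (9, 4, 21),
}


def meets_minimum(version):
    """Return True if a version string such as '9.6.12' meets the listed minimum."""
    parts = tuple(int(p) for p in version.split("."))
    major = parts[:2] if parts[0] < 10 else parts[:1]
    minimum = MINIMUMS.get(major)
    if minimum is None:
        # Assumption: releases newer than 12 are fine, older than 9.4 are not.
        return parts[0] > 12
    # Pad with zeros so that '12' compares equal to '12.0'.
    padded = parts + (0,) * (len(minimum) - len(parts))
    return padded >= minimum


for v in ["11.2", "11.1", "9.6.12", "9.3.25"]:
    print(v, meets_minimum(v))
```

The running server version can be read with `SHOW server_version;`, although some builds append extra text to the number that would need to be stripped first.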
|
| 164 | +### To run tests: |
| 165 | +
|
| 166 | +1. Install python test dependencies in a virtual env: |
| 167 | +``` |
| 168 | + python3 -m venv venv |
| 169 | + . venv/bin/activate |
| 170 | + pip install --upgrade pip |
| 171 | + pip install .[test] |
| 172 | +``` |
| 173 | +
|
| 174 | +2. The tests need a running PostgreSQL database. Export its credentials: |
| 175 | +``` |
| 176 | + export TAP_POSTGRES_HOST=<postgres-host> |
| 177 | + export TAP_POSTGRES_PORT=<postgres-port> |
| 178 | + export TAP_POSTGRES_USER=<postgres-user> |
| 179 | + export TAP_POSTGRES_PASSWORD=<postgres-password> |
| 180 | +``` |
| 181 | +
|
| 182 | +Test objects will be created in the `postgres` database. |
| 183 | +
|
| 184 | +3. To run the tests: |
| 185 | +``` |
| 186 | + make test |
| 187 | +``` |
| 188 | +
|
| 189 | +### To run pylint: |
| 190 | +
|
| 191 | +1. Install python dependencies and run python linter |
| 192 | +``` |
| 193 | + python3 -m venv venv |
| 194 | + . venv/bin/activate |
| 195 | + pip install --upgrade pip |
| 196 | + pip install .[test] |
| 197 | + make pylint |
| 198 | +``` |
| 199 | +
|
| 200 | +--- |