allow to use application name for upgrade (takeover) #1082

Open
wants to merge 7 commits into
base: master


seekermarcel

This is a takeover of #915, since we need it and the original developer is unreachable.

Should close zalando/postgres-operator#1629.

@alfsch

alfsch commented Feb 12, 2025

@middagj @Jan-M, this pull request is taken over from #915 and has your suggestions integrated. Please review.


@middagj middagj left a comment


Looks good to me

@alfsch

alfsch commented Feb 13, 2025

I'm testing whether our troubles with Istio are fixed now.

@seekermarcel
Author

Not working at the moment; testing shows an error. @alfsch

postgres@pg-db01-1:~$ cat last_upgrade.log 
2025-02-13 08:49:17,102 inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 08:49:17,134 inplace_upgrade INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 08:49:17,181 inplace_upgrade ERROR: Member pg-db01-0 is not streaming from the primary

@CyberDem0n
Contributor

not working atm. testing shows error
Member pg-db01-0 is not streaming from the primary

That's because application_name can contain only lower-case alphanumeric characters and underscores.
You have to use the slot_name_from_member_name() function to convert the member name to an application_name.
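
To illustrate why the member name matters, here is a minimal standalone sketch, assuming only the rule stated above (lower-case alphanumerics and underscores). The helpers `to_application_name` and `missing_replicas` are hypothetical names made up for this example; this is not Patroni's `slot_name_from_member_name()` implementation, which may treat unusual characters differently. The list of streaming names stands in for `application_name` values one might read from `pg_stat_replication`.

```python
import re


def to_application_name(member_name: str) -> str:
    """Hypothetical helper: lower-case the name and map every character
    outside [a-z0-9_] to an underscore (an assumption for illustration)."""
    return re.sub(r"[^a-z0-9_]", "_", member_name.lower())


def missing_replicas(member_names, streaming_app_names):
    """Return the members whose converted name is not among the
    application_name values reported as streaming."""
    streaming = set(streaming_app_names)
    return [m for m in member_names if to_application_name(m) not in streaming]


# The raw member name contains dashes, so a direct string comparison
# against the application_name would never match.
print(to_application_name("pg-db01-0"))
print(missing_replicas(["pg-db01-0", "pg-db01-2"], ["pg_db01_0"]))
```

Comparing the raw member name `pg-db01-0` against `pg_db01_0` would report the member as not streaming even though it is, which would be consistent with the error in the upgrade log.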

@seekermarcel
Author

@CyberDem0n I don't know why that should matter.

@seekermarcel
Author

seekermarcel commented Feb 13, 2025

I tried updating it with Istio and without Istio (both logs are from the master Postgres). Without Istio everything works perfectly and very quickly:

2025-02-13 10:34:41,445 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2025-02-13 10:34:43,450 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2025-02-13 10:34:43,450 - bootstrapping - INFO - No meta-data available for this provider
2025-02-13 10:34:43,451 - bootstrapping - INFO - Looks like you are running local
2025-02-13 10:34:43,467 - bootstrapping - INFO - Configuring standby-cluster
2025-02-13 10:34:43,467 - bootstrapping - INFO - Configuring log
2025-02-13 10:34:43,467 - bootstrapping - INFO - Configuring pam-oauth2
2025-02-13 10:34:43,467 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2025-02-13 10:34:43,468 - bootstrapping - INFO - Configuring certificate
2025-02-13 10:34:43,468 - bootstrapping - INFO - Generating ssl self-signed certificate
2025-02-13 10:34:43,542 - bootstrapping - INFO - Configuring crontab
2025-02-13 10:34:43,542 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2025-02-13 10:34:43,542 - bootstrapping - INFO - Configuring patroni
2025-02-13 10:34:43,548 - bootstrapping - INFO - Writing to file /run/postgres.yml
2025-02-13 10:34:43,548 - bootstrapping - INFO - Configuring wal-e
2025-02-13 10:34:43,548 - bootstrapping - INFO - Configuring pgbouncer
2025-02-13 10:34:43,548 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2025-02-13 10:34:43,548 - bootstrapping - INFO - Configuring bootstrap
2025-02-13 10:34:43,548 - bootstrapping - INFO - Configuring pgqd
2025-02-13 10:34:44,822 INFO: Selected new K8s API server endpoint https://10.172.1.2:6443
2025-02-13 10:34:44,842 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 10:34:44,847 WARNING: Postgresql is not running.
2025-02-13 10:34:44,847 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:34:44,848 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7470849738265096263
  Database cluster state: in archive recovery
  pg_control last modified: Thu Feb 13 10:34:28 2025
  Latest checkpoint location: 0/5000028
  Latest checkpoint's REDO location: 0/5000028
  Latest checkpoint's REDO WAL file: 000000010000000000000005
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:728
  Latest checkpoint's NextOID: 16722
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Feb 13 10:34:25 2025
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/5000028
  Min recovery ending loc's timeline: 1
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: 04da0f7ce35b10dbea183ca0f58685b46298add0da8ac3adfa12aa96c40567ab

2025-02-13 10:34:44,848 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:34:44,855 INFO: Local timeline=1 lsn=0/5000028
2025-02-13 10:34:44,866 INFO: primary_timeline=2
2025-02-13 10:34:44,867 INFO: primary: history=1	0/50000A0	no recovery target specified
2025-02-13 10:34:44,867 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:34:44,933 INFO: starting as a secondary
2025-02-13 10:34:45 UTC [65]: [1-1] 67adcac5.41 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2025-02-13 10:34:45 UTC [65]: [2-1] 67adcac5.41 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
2025-02-13 10:34:45,221 INFO: postmaster pid=65
/var/run/postgresql:5432 - no response
2025-02-13 10:34:45 UTC [65]: [3-1] 67adcac5.41 0     LOG:  redirecting log output to logging collector process
2025-02-13 10:34:45 UTC [65]: [4-1] 67adcac5.41 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2025-02-13 10:34:46,253 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:34:46,253 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:34:46,282 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:34:47,093 INFO: establishing a new patroni restapi connection to postgres
2025-02-13 10:34:47,613 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:34:49,310 INFO: PAUSE: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:34:49,348 INFO: Changed tcp_keepalives_idle from '0' to '900'
2025-02-13 10:34:49,348 INFO: Changed tcp_keepalives_interval from '0' to '100'
2025-02-13 10:34:49,351 INFO: Reloading PostgreSQL configuration.
server signaled
2025-02-13 10:34:50,393 INFO: PAUSE: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:34:57,737 WARNING: system ID has changed while in paused mode. Patroni will exit when resuming unless system ID is reset: 7470850315831099650 != 7470849738265096263
2025-02-13 10:34:57,737 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:34:57,737 WARNING: Postgresql is not running.
2025-02-13 10:34:57,737 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:34:57,777 INFO: PAUSE: postgres is not running
/etc/runit/runsvdir/default/patroni: finished with code=-1 signal=9
2025-02-13 10:34:59,935 INFO: Selected new K8s API server endpoint https://10.172.1.2:6443
2025-02-13 10:34:59,960 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 10:34:59,965 WARNING: Postgresql is not running.
2025-02-13 10:34:59,965 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:00,027 INFO: PAUSE: postgres is not running
2025-02-13 10:35:00,436 WARNING: Postgresql is not running.
2025-02-13 10:35:00,436 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:00,438 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202107181
  Database system identifier: 7470850315831099650
  Database cluster state: shut down
  pg_control last modified: Thu Feb 13 10:34:52 2025
  Latest checkpoint location: 0/8004120
  Latest checkpoint's REDO location: 0/8004120
  Latest checkpoint's REDO WAL file: 000000010000000000000008
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: off
  Latest checkpoint's NextXID: 0:1522
  Latest checkpoint's NextOID: 16785
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 0
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 0
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Feb 13 10:34:52 2025
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: 87d9f34633665f13d8a216157f6439ac735f3fb284f965f97873769efb11a6f1

2025-02-13 10:35:00,438 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:00,450 INFO: Local timeline=1 lsn=0/8004120
2025-02-13 10:35:00,491 INFO: primary_timeline=1
2025-02-13 10:35:00,491 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:00,547 INFO: starting as a secondary
2025-02-13 10:35:00,549 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 10:35:00,552 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:00,553 INFO: restarting after failure in progress
2025-02-13 10:35:01 UTC [131]: [1-1] 67adcad5.83 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2025-02-13 10:35:01 UTC [131]: [2-1] 67adcad5.83 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
2025-02-13 10:35:01,092 INFO: postmaster pid=131
/var/run/postgresql:5432 - no response
2025-02-13 10:35:01 UTC [131]: [3-1] 67adcad5.83 0     LOG:  redirecting log output to logging collector process
2025-02-13 10:35:01 UTC [131]: [4-1] 67adcad5.83 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2025-02-13 10:35:02,128 INFO: Lock owner: pg-db01-1; I am pg-db01-0
2025-02-13 10:35:02,128 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:35:02,177 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:35:04,541 INFO: establishing a new patroni restapi connection to postgres
2025-02-13 10:35:10,474 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:35:20,464 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:35:30,469 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)
2025-02-13 10:35:40,468 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-1)

With Istio it does not work: the leader seems to take forever to start, and the workers don't become healthy at all. It looks like it's waiting for a timeout or something while updating:

2025-02-13 10:15:47,137 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2025-02-13 10:15:49,142 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2025-02-13 10:15:49,144 - bootstrapping - INFO - No meta-data available for this provider
2025-02-13 10:15:49,145 - bootstrapping - INFO - Looks like you are running local
2025-02-13 10:15:49,168 - bootstrapping - INFO - Configuring certificate
2025-02-13 10:15:49,168 - bootstrapping - INFO - Generating ssl self-signed certificate
2025-02-13 10:15:49,271 - bootstrapping - INFO - Configuring standby-cluster
2025-02-13 10:15:49,271 - bootstrapping - INFO - Configuring bootstrap
2025-02-13 10:15:49,271 - bootstrapping - INFO - Configuring pgbouncer
2025-02-13 10:15:49,271 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2025-02-13 10:15:49,271 - bootstrapping - INFO - Configuring log
2025-02-13 10:15:49,271 - bootstrapping - INFO - Configuring patroni
2025-02-13 10:15:49,278 - bootstrapping - INFO - Writing to file /run/postgres.yml
2025-02-13 10:15:49,279 - bootstrapping - INFO - Configuring crontab
2025-02-13 10:15:49,279 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2025-02-13 10:15:49,279 - bootstrapping - INFO - Configuring wal-e
2025-02-13 10:15:49,279 - bootstrapping - INFO - Configuring pgqd
2025-02-13 10:15:49,279 - bootstrapping - INFO - Configuring pam-oauth2
2025-02-13 10:15:49,279 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2025-02-13 10:15:50,615 INFO: Selected new K8s API server endpoint https://10.172.1.2:6443
2025-02-13 10:15:50,640 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 10:15:50,645 WARNING: Postgresql is not running.
2025-02-13 10:15:50,645 INFO: Lock owner: pg-db01-2; I am pg-db01-0
2025-02-13 10:15:50,646 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7470845104643387462
  Database cluster state: shut down in recovery
  pg_control last modified: Thu Feb 13 10:15:32 2025
  Latest checkpoint location: 0/40083D0
  Latest checkpoint's REDO location: 0/4008398
  Latest checkpoint's REDO WAL file: 000000010000000000000004
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:726
  Latest checkpoint's NextOID: 24576
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 479
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 725
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Feb 13 10:15:30 2025
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/400C368
  Min recovery ending loc's timeline: 1
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: dcc147981d93efd3a253d71c6e8dd19107916af7d6c1feba06a17e0ce45f45f1

2025-02-13 10:15:50,647 INFO: Lock owner: pg-db01-2; I am pg-db01-0
2025-02-13 10:15:50,655 INFO: Local timeline=1 lsn=0/400C368
2025-02-13 10:15:50,675 INFO: primary_timeline=1
2025-02-13 10:15:50,675 INFO: Lock owner: pg-db01-2; I am pg-db01-0
2025-02-13 10:15:50,746 INFO: starting as a secondary
2025-02-13 10:15:51 UTC [65]: [1-1] 67adc657.41 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2025-02-13 10:15:51 UTC [65]: [2-1] 67adc657.41 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
2025-02-13 10:15:51,113 INFO: postmaster pid=65
/var/run/postgresql:5432 - no response
2025-02-13 10:15:51 UTC [65]: [3-1] 67adc657.41 0     LOG:  redirecting log output to logging collector process
2025-02-13 10:15:51 UTC [65]: [4-1] 67adc657.41 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - accepting connections
/var/run/postgresql:5432 - accepting connections
2025-02-13 10:15:52,146 INFO: Lock owner: pg-db01-2; I am pg-db01-0
2025-02-13 10:15:52,146 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:15:52,183 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-2)
2025-02-13 10:15:56,073 INFO: establishing a new patroni restapi connection to postgres
2025-02-13 10:15:57,414 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-2)
2025-02-13 10:16:07,455 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-2)
2025-02-13 10:16:17,445 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-2)
2025-02-13 10:16:17,588 INFO: no action. I am (pg-db01-0), a secondary, and following a leader (pg-db01-2)
2025-02-13 10:16:18,662 INFO: Cleaning up failover key after acquiring leader lock...
2025-02-13 10:16:18,668 WARNING: Could not activate Linux watchdog device: Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'
server promoting
2025-02-13 10:16:18,702 INFO: promoted self to leader by acquiring session lock
DO
NOTICE:  role "admin" is already a member of role "cron_admin"
GRANT ROLE
DO
DO
NOTICE:  extension "pg_auth_mon" already exists, skipping
CREATE EXTENSION
NOTICE:  version "1.1" of extension "pg_auth_mon" is already installed
ALTER EXTENSION
GRANT
NOTICE:  extension "pg_cron" already exists, skipping
CREATE EXTENSION
DO
NOTICE:  version "1.6" of extension "pg_cron" is already installed
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
REVOKE
GRANT
ALTER POLICY
REVOKE
GRANT
CREATE FUNCTION
REVOKE
GRANT
REVOKE
GRANT
NOTICE:  extension "file_fdw" already exists, skipping
CREATE EXTENSION
DO
NOTICE:  relation "postgres_log" already exists, skipping
CREATE TABLE
GRANT
NOTICE:  column "backend_type" of relation "postgres_log" already exists, skipping
ALTER TABLE
DO
NOTICE:  relation "postgres_log_0" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_1" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_2" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_3" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_4" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_5" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_6" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
NOTICE:  relation "postgres_log_7" already exists, skipping
CREATE FOREIGN TABLE
GRANT
CREATE VIEW
ALTER VIEW
GRANT
RESET
SET
NOTICE:  drop cascades to 7 other objects
DETAIL:  drop cascades to type zmon_utils.system_information
drop cascades to function zmon_utils.get_database_cluster_information()
drop cascades to function zmon_utils.get_database_cluster_system_information()
drop cascades to function zmon_utils.get_last_status_active_cronjobs()
drop cascades to view zmon_utils.last_status_active_cronjobs
drop cascades to function zmon_utils.get_replay_lag()
drop cascades to view zmon_utils.replay_lag
DROP SCHEMA
NOTICE:  extension "plpython3u" already exists, skipping
DO
NOTICE:  language "plpythonu" does not exist, skipping
DROP LANGUAGE
NOTICE:  function plpython_call_handler() does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_inline_handler(internal) does not exist, skipping
DROP FUNCTION
NOTICE:  function plpython_validator(oid) does not exist, skipping
DROP FUNCTION
CREATE SCHEMA
GRANT
SET
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
REVOKE
CREATE VIEW
REVOKE
GRANT
CREATE FUNCTION
CREATE VIEW
REVOKE
REVOKE
GRANT
GRANT
You are now connected to database "postgres" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "4.1.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
REVOKE
GRANT
REVOKE
GRANT
RESET
You are now connected to database "ordermanagement" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "4.1.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
REVOKE
GRANT
REVOKE
GRANT
RESET
You are now connected to database "template1" as user "postgres".
NOTICE:  schema "user_management" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
CREATE FUNCTION
REVOKE
GRANT
COMMENT
GRANT
RESET
NOTICE:  extension "pg_stat_statements" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "pg_stat_kcache" already exists, skipping
CREATE EXTENSION
NOTICE:  extension "set_user" already exists, skipping
CREATE EXTENSION
NOTICE:  version "4.1.0" of extension "set_user" is already installed
ALTER EXTENSION
GRANT
GRANT
GRANT
NOTICE:  schema "metric_helpers" already exists, skipping
CREATE SCHEMA
GRANT
SET
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
REVOKE
GRANT
REVOKE
GRANT
RESET
2025-02-13 10:16:19,791 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:16:19,862 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:16:29,829 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:16:39,836 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:16:45,706 INFO: PAUSE: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:16:45,715 INFO: Changed tcp_keepalives_idle from '0' to '900'
2025-02-13 10:16:45,715 INFO: Changed tcp_keepalives_interval from '0' to '100'
2025-02-13 10:16:45,718 INFO: Reloading PostgreSQL configuration.
server signaled
2025-02-13 10:16:47,534 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:48,475 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:49,676 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:50,145 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:50,145 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1443, in query
    heartbeat_connection.get()  # try to open psycopg connection to postgres
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/connection.py", line 55, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/patroni/psycopg.py", line 123, in connect
    ret = _connect(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
	Is the server running locally and accepting connections on that socket?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1320, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name,
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1252, in query
    return self.server.query(sql, *params)
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1445, in query
    raise PostgresConnectionException('connection problems') from exc
patroni.exceptions.PostgresConnectionException: connection problems
2025-02-13 10:16:50,320 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:50,320 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1443, in query
    heartbeat_connection.get()  # try to open psycopg connection to postgres
  File "/usr/local/lib/python3.10/dist-packages/patroni/postgresql/connection.py", line 55, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/patroni/psycopg.py", line 123, in connect
    ret = _connect(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
	Is the server running locally and accepting connections on that socket?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1320, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name,
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1252, in query
    return self.server.query(sql, *params)
  File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1445, in query
    raise PostgresConnectionException('connection problems') from exc
patroni.exceptions.PostgresConnectionException: connection problems
2025-02-13 10:16:50.326 UTC [37] LOG Starting pgqd 3.5
2025-02-13 10:16:50.326 UTC [37] LOG auto-detecting dbs ...
2025-02-13 10:16:50.327 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:16:50.327 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:16:50,557 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:51,668 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:16:51,668 WARNING: Retry got exception: connection problems
2025-02-13 10:16:51,668 WARNING: Exception PostgresConnectionException('Exceeded retry deadline') when running query
2025-02-13 10:16:55,660 WARNING: Postgresql is not running.
2025-02-13 10:16:55,660 INFO: Lock owner: pg-db01-0; I am pg-db01-0
2025-02-13 10:16:55,695 INFO: PAUSE: removed leader lock because postgres is not running
2025-02-13 10:16:55,704 WARNING: Postgresql is not running.
2025-02-13 10:16:55,704 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:16:55,705 INFO: PAUSE: postgres is not running
2025-02-13 10:17:05,699 WARNING: Postgresql is not running.
2025-02-13 10:17:05,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:05,699 INFO: PAUSE: postgres is not running
2025-02-13 10:17:15,699 WARNING: Postgresql is not running.
2025-02-13 10:17:15,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:15,699 INFO: PAUSE: postgres is not running
2025-02-13 10:17:20.355 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:17:25,703 WARNING: Postgresql is not running.
2025-02-13 10:17:25,703 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:25,703 INFO: PAUSE: postgres is not running
2025-02-13 10:17:35,699 WARNING: Postgresql is not running.
2025-02-13 10:17:35,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:35,699 INFO: PAUSE: postgres is not running
2025-02-13 10:17:45,699 WARNING: Postgresql is not running.
2025-02-13 10:17:45,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:45,699 INFO: PAUSE: postgres is not running
2025-02-13 10:17:50.358 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:17:50.358 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:17:50.358 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:17:55,699 WARNING: Postgresql is not running.
2025-02-13 10:17:55,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:17:55,699 INFO: PAUSE: postgres is not running
2025-02-13 10:18:05,699 WARNING: Postgresql is not running.
2025-02-13 10:18:05,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:05,699 INFO: PAUSE: postgres is not running
2025-02-13 10:18:15,700 WARNING: Postgresql is not running.
2025-02-13 10:18:15,701 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:15,701 INFO: PAUSE: postgres is not running
2025-02-13 10:18:20.387 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:18:25,699 WARNING: Postgresql is not running.
2025-02-13 10:18:25,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:25,699 INFO: PAUSE: postgres is not running
2025-02-13 10:18:35,699 WARNING: Postgresql is not running.
2025-02-13 10:18:35,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:35,699 INFO: PAUSE: postgres is not running
2025-02-13 10:18:45,699 WARNING: Postgresql is not running.
2025-02-13 10:18:45,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:45,699 INFO: PAUSE: postgres is not running
2025-02-13 10:18:50.389 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:18:50.389 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:18:50.389 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:18:55,699 WARNING: Postgresql is not running.
2025-02-13 10:18:55,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:18:55,699 INFO: PAUSE: postgres is not running
2025-02-13 10:19:05,698 WARNING: Postgresql is not running.
2025-02-13 10:19:05,698 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:05,699 INFO: PAUSE: postgres is not running
2025-02-13 10:19:15,698 WARNING: Postgresql is not running.
2025-02-13 10:19:15,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:15,699 INFO: PAUSE: postgres is not running
2025-02-13 10:19:20.418 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:19:25,700 WARNING: Postgresql is not running.
2025-02-13 10:19:25,700 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:25,700 INFO: PAUSE: postgres is not running
2025-02-13 10:19:35,698 WARNING: Postgresql is not running.
2025-02-13 10:19:35,698 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:35,698 INFO: PAUSE: postgres is not running
2025-02-13 10:19:45,699 WARNING: Postgresql is not running.
2025-02-13 10:19:45,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:45,699 INFO: PAUSE: postgres is not running
2025-02-13 10:19:50.420 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:19:50.420 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:19:50.420 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:19:55,699 WARNING: Postgresql is not running.
2025-02-13 10:19:55,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:19:55,699 INFO: PAUSE: postgres is not running
2025-02-13 10:20:05,698 WARNING: Postgresql is not running.
2025-02-13 10:20:05,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:05,699 INFO: PAUSE: postgres is not running
2025-02-13 10:20:15,699 WARNING: Postgresql is not running.
2025-02-13 10:20:15,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:15,699 INFO: PAUSE: postgres is not running
2025-02-13 10:20:20.449 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:20:25,698 WARNING: Postgresql is not running.
2025-02-13 10:20:25,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:25,699 INFO: PAUSE: postgres is not running
2025-02-13 10:20:35,699 WARNING: Postgresql is not running.
2025-02-13 10:20:35,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:35,700 INFO: PAUSE: postgres is not running
2025-02-13 10:20:45,699 WARNING: Postgresql is not running.
2025-02-13 10:20:45,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:45,699 INFO: PAUSE: postgres is not running
2025-02-13 10:20:50.451 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:20:50.451 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:20:50.451 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:20:55,699 WARNING: Postgresql is not running.
2025-02-13 10:20:55,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:20:55,699 INFO: PAUSE: postgres is not running
2025-02-13 10:21:05,699 WARNING: Postgresql is not running.
2025-02-13 10:21:05,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:05,699 INFO: PAUSE: postgres is not running
2025-02-13 10:21:15,699 WARNING: Postgresql is not running.
2025-02-13 10:21:15,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:15,699 INFO: PAUSE: postgres is not running
2025-02-13 10:21:20.480 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:21:25,700 WARNING: Postgresql is not running.
2025-02-13 10:21:25,700 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:25,700 INFO: PAUSE: postgres is not running
2025-02-13 10:21:35,699 WARNING: Postgresql is not running.
2025-02-13 10:21:35,699 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:35,699 INFO: PAUSE: postgres is not running
2025-02-13 10:21:45,698 WARNING: Postgresql is not running.
2025-02-13 10:21:45,698 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:45,698 INFO: PAUSE: postgres is not running
2025-02-13 10:21:50.481 UTC [37] ERROR connection error: PQconnectStart
2025-02-13 10:21:50.481 UTC [37] ERROR libpq: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
		Is the server running locally and accepting connections on that socket?
2025-02-13 10:21:50.481 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
/etc/runit/runsvdir/default/patroni: finished with code=-1 signal=9
2025-02-13 10:21:53,483 INFO: Selected new K8s API server endpoint https://10.172.1.2:6443
2025-02-13 10:21:53,511 INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-13 10:21:53,516 WARNING: Postgresql is not running.
2025-02-13 10:21:53,516 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:53,587 INFO: PAUSE: postgres is not running
2025-02-13 10:21:54 UTC [712]: [1-1] 67adc7c2.2c8 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2025-02-13 10:21:54 UTC [712]: [2-1] 67adc7c2.2c8 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
2025-02-13 10:21:54,188 INFO: Lock owner: ; I am pg-db01-0
2025-02-13 10:21:54,571 INFO: postmaster pid=712
2025-02-13 10:21:54 UTC [712]: [3-1] 67adc7c2.2c8 0     LOG:  redirecting log output to logging collector process
2025-02-13 10:21:54 UTC [712]: [4-1] 67adc7c2.2c8 0     HINT:  Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - accepting connections
2025-02-13 10:21:54,598 INFO: establishing a new patroni heartbeat connection to postgres
2025-02-13 10:21:54,642 INFO: PAUSE: acquired session lock as a leader
2025-02-13 10:21:56,426 INFO: establishing a new patroni restapi connection to postgres
2025-02-13 10:22:04,642 INFO: PAUSE: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:05,201 INFO: Lock owner: pg-db01-0; I am pg-db01-0
2025-02-13 10:22:05,209 WARNING: Could not activate Linux watchdog device: Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'
2025-02-13 10:22:05,254 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:05,262 INFO: Changed tcp_keepalives_idle from '0' to '900'
2025-02-13 10:22:05,262 INFO: Changed tcp_keepalives_interval from '0' to '100'
2025-02-13 10:22:05,264 INFO: Reloading PostgreSQL configuration.
server signaled
2025-02-13 10:22:15,225 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:20.511 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:22:25,227 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:35,207 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:45,225 INFO: no action. I am (pg-db01-0), the leader with the lock
2025-02-13 10:22:50.511 UTC [37] LOG {ticks: 0, maint: 0, retry: 0}
2025-02-13 10:22:55,226 INFO: no action. I am (pg-db01-0), the leader with the lock

@seekermarcel
Author

Does anyone have an idea what the problem could be?

@Jan-M
Member

Jan-M commented Feb 13, 2025

Can you have a look at Alexander's advice? Maybe post the different outputs of the app name and the function call.

@seekermarcel
Author

I logged quite a lot:

# Excerpt: `streaming`, `logger` and `os` are defined outside this function.
def ensure_replica_state(member):
    ip = member.conn_kwargs().get('host')
    lag = streaming.get((ip, member.name))
    logger.error('Lag: %r', lag)
    logger.error('ip: %r', ip)
    logger.error('name: %r', member.name)
    logger.error('USE_APPLICATION_NAME_IN_UPGRADE: %r', os.getenv('USE_APPLICATION_NAME_IN_UPGRADE'))
    if lag is None and os.getenv('USE_APPLICATION_NAME_IN_UPGRADE'):
        # Debug the streaming dictionary
        logger.error('streaming dict: %r', streaming)
        # Collect every entry whose application_name matches the member name, whatever its IP
        matching_items = [(key, value) for key, value in streaming.items() if key[1] == member.name]
        logger.error('matching items: %r', matching_items)
        lag = next((value for (_, app_name), value in streaming.items() if app_name == member.name), None)
    if lag is None:
        return logger.error('Member %s is not streaming from the primary', member.name)
    if lag > 16*1024*1024:
        return logger.error('Replication lag %s on member %s is too high', lag, member.name)

postgres@pg-db01-1:~$ cat last_upgrade.log 
2025-02-14 14:55:12,912 inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2025-02-14 14:55:12,931 inplace_upgrade INFO: establishing a new patroni heartbeat connection to postgres
2025-02-14 14:55:12,957 inplace_upgrade ERROR: Lag: None
2025-02-14 14:55:12,957 inplace_upgrade ERROR: ip: '10.244.0.100'
2025-02-14 14:55:12,957 inplace_upgrade ERROR: name: 'pg-db01-0'
2025-02-14 14:55:12,957 inplace_upgrade ERROR: USE_APPLICATION_NAME_IN_UPGRADE: 'true'
2025-02-14 14:55:12,957 inplace_upgrade ERROR: streaming dict: {('127.0.0.6', 'pg-db01-2'): 0, ('127.0.0.6', 'pg-db01-0'): 0}
2025-02-14 14:55:12,957 inplace_upgrade ERROR: matching items: [(('127.0.0.6', 'pg-db01-0'), 0)]
2025-02-14 14:55:12,974 inplace_upgrade ERROR: Lag: None
2025-02-14 14:55:12,974 inplace_upgrade ERROR: ip: '10.244.0.216'
2025-02-14 14:55:12,974 inplace_upgrade ERROR: name: 'pg-db01-2'
2025-02-14 14:55:12,974 inplace_upgrade ERROR: USE_APPLICATION_NAME_IN_UPGRADE: 'true'
2025-02-14 14:55:12,974 inplace_upgrade ERROR: streaming dict: {('127.0.0.6', 'pg-db01-2'): 0, ('127.0.0.6', 'pg-db01-0'): 0}
2025-02-14 14:55:12,974 inplace_upgrade ERROR: matching items: [(('127.0.0.6', 'pg-db01-2'), 0)]

It actually looks like the name contains invalid characters. I'll check whether using the function named by @CyberDem0n makes a difference 👍🏼
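
The lookup exercised by the debug output above can be sketched in isolation. This is only an illustration under assumptions: `find_lag` and `StreamingMap` are hypothetical names, and the sample dictionary is copied from the log. Because every replica shows up with client address 127.0.0.6, the exact `(ip, name)` lookup misses, while matching on the application name alone finds the entry.

```python
from typing import Dict, Optional, Tuple

# Keys mirror the logged dict: (client_addr, application_name) -> lag in bytes.
StreamingMap = Dict[Tuple[str, str], int]


def find_lag(streaming: StreamingMap, ip: str, name: str,
             use_application_name: bool) -> Optional[int]:
    """Return the replication lag for a member, or None if no entry matches."""
    # First try the exact (client address, application name) pair.
    lag = streaming.get((ip, name))
    if lag is None and use_application_name:
        # Fallback: ignore the client address and match on application_name only.
        lag = next((value for (_, app_name), value in streaming.items()
                    if app_name == name), None)
    return lag


# Values copied from the log output above.
streaming = {('127.0.0.6', 'pg-db01-2'): 0, ('127.0.0.6', 'pg-db01-0'): 0}

print(find_lag(streaming, '10.244.0.100', 'pg-db01-0', use_application_name=False))  # None
print(find_lag(streaming, '10.244.0.100', 'pg-db01-0', use_application_name=True))   # 0
```

With the fallback enabled both members resolve to a lag of 0, which matches the "matching items" lines in the log.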

@CyberDem0n
Contributor

@seekermarcel scratch it, as usual I mixed it up with names of replication slots :(
There is no such limitation for application_name.

@seekermarcel
Author

@CyberDem0n well, that's unfortunate, but it happens 😄

@seekermarcel
Author

I noticed that everything worked up to the rsync step between the nodes.
The problem was a missing allow entry for istio connections in the rsync.conf.

I added functionality that automatically adds the node config, as well as possible proxy configs, to the rsync.conf.

For me the update is working now. Please verify this @cristi-vlad @alfsch
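
The rsync.conf change described above could look roughly like the following sketch. It is not the code from this PR: `build_hosts_allow` is a hypothetical helper, and using 127.0.0.6 as the proxy address is an assumption taken from the client address seen in the earlier log. The only thing relied on from rsync itself is that `hosts allow` in rsyncd.conf accepts a comma- or whitespace-separated list of patterns.

```python
import ipaddress
from typing import Iterable, List


def build_hosts_allow(replica_ips: Iterable[str], proxy_ips: Iterable[str] = ()) -> str:
    """Build the value for an rsyncd.conf `hosts allow` line (hypothetical helper)."""
    allowed: List[str] = []
    for ip in list(replica_ips) + list(proxy_ips):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if ip not in allowed:     # keep the order, drop duplicates
            allowed.append(ip)
    return ', '.join(allowed)


# Replica IPs from the log above; 127.0.0.6 is the assumed sidecar source address.
line = 'hosts allow = ' + build_hosts_allow(['10.244.0.100', '10.244.0.216'], ['127.0.0.6'])
print(line)
```

Listing the proxy address next to the replica addresses is what lets rsync traffic routed through the sidecar pass the allow check.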

@alfsch

alfsch commented Feb 17, 2025

@seekermarcel Nice, will try your changes tomorrow. 👀

@seekermarcel seekermarcel requested a review from middagj February 18, 2025 09:08
@alfsch

alfsch commented Feb 18, 2025

@middagj @Jan-M Tested the implementation with service mesh (istio) and without service mesh. Works as expected, upgrade of postgres works now together with service mesh.

I also tested this implementation on clusters without service mesh and the upgrades work as expected 👍

@theBNT

theBNT commented Feb 19, 2025

awesome to see some progress here, thanks a lot! this will help us as well

@CJsPod

CJsPod commented Feb 19, 2025

@seekermarcel Thank You for pushing this. This solves the problem with the updates for us as well.

It would be great to have a documentation update as well.

@cp319391

Nice implementation, what I'm missing is some documentation for the new environment variable 'USE_APPLICATION_NAME_IN_UPGRADE' in ENVIRONMENT.rst. Could you provide it?

@seekermarcel
Author

added to the ENVIRONMENT.rst @cp319391 @CJsPod

@alfsch

alfsch commented Feb 20, 2025

@seekermarcel thx for updating the docs 👍

@meenakshi-koushik

Thanks @seekermarcel for this fix.
The current implementation will use the application name even when USE_APPLICATION_NAME_IN_UPGRADE is set to false (the default value as per the documentation). It may be useful to follow the same semantics as other USE_* variables, e.g., USE_OLD_LOCALES, where only a "true" value triggers the behavior. I am probably picking on a nit here. Nevertheless, this is easier to introduce now than later. Thanks again.

@seekermarcel
Author

thanks @meenakshi-koushik, very good find. That's absolutely correct. I adapted it to work the same way USE_OLD_LOCALES does
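
The difference between the two checks discussed above can be shown with a small sketch. `env_flag` is a hypothetical helper, not a function from this repository; it assumes the "only the string true enables it" semantics described in the comment. A plain truthiness test on `os.getenv(...)` treats any non-empty string, including 'false', as enabled.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the variable is set to the exact string 'true'."""
    environ = os.environ if environ is None else environ
    return environ.get(name) == 'true'


# A non-empty string is truthy in Python, so a bare truthiness check misreads 'false'.
print(bool('false'))                                                                    # True
print(env_flag('USE_APPLICATION_NAME_IN_UPGRADE', {'USE_APPLICATION_NAME_IN_UPGRADE': 'false'}))  # False
print(env_flag('USE_APPLICATION_NAME_IN_UPGRADE', {'USE_APPLICATION_NAME_IN_UPGRADE': 'true'}))   # True
print(env_flag('USE_APPLICATION_NAME_IN_UPGRADE', {}))                                  # False
```

Comparing against the literal 'true' keeps the unset case and the documented default of false both disabled.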

@seekermarcel
Author

So do I need to do something else? What's missing for merge approval?

@alfsch

alfsch commented Mar 20, 2025

@FxKu are you missing things in this PR? We need the changes for our production clusters using the service mesh.

@FxKu FxKu added the minor label Mar 27, 2025
@alfsch

alfsch commented Apr 9, 2025

@FxKu thx 👍

@seekermarcel
Author

@FxKu thank you ❤️

Development

Successfully merging this pull request may close these issues.

Client_addr on pg_catalog.pg_stat_replication wrong ip address - istio enabled
10 participants