-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow to use application name for upgrade (takeover) #1082
base: master
Are you sure you want to change the base?
allow to use application name for upgrade (takeover) #1082
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
I'm testing if our troubles with istio are fixed now. |
not working atm. testing shows error. @alfsch
|
That's because |
@CyberDem0n i don't know why that should matter |
I tried updating it with istio vs without istio (both logs are from master postgres). Without istio everything works perfectly and very fast:
with istio it does not work and it seems to take forever for the leader to start while the workers don't get healthy at all. Like its waiting for a timeout or something while updating:
|
does someone have an idea what the problem could be? |
Can you have a look at Alexanders advice? Maybe post the different outputs of app name and the function call. |
i logged quite a lot: def ensure_replica_state(member):
ip = member.conn_kwargs().get('host')
lag = streaming.get((ip, member.name))
logger.error('Lag: %r', lag)
logger.error('ip: %r', ip)
logger.error('name: %r', member.name)
logger.error('USE_APPLICATION_NAME_IN_UPGRADE: %r', os.getenv('USE_APPLICATION_NAME_IN_UPGRADE'))
if lag is None and os.getenv('USE_APPLICATION_NAME_IN_UPGRADE'):
# Debug the streaming dictionary
logger.error('streaming dict: %r', streaming)
# Try looking up by any IP address matching the member name
matching_items = [(ip_app, lag) for (ip_app, lag) in streaming.items() if ip_app[1] == member.name]
logger.error('matching items: %r', matching_items)
lag = next((lag for (_, app_name), lag in streaming.items() if app_name == member.name), None)
if lag is None:
return logger.error('Member %s is not streaming from the primary', member.name)
if lag > 16*1024*1024:
return logger.error('Replication lag %s on member %s is too high', lag, member.name)
it actually looks like the name contains invalid characters. i'll try if using the named function from @CyberDem0n will make a difference 👍🏼 |
@seekermarcel scratch it, as usual I mixed it up with names of replication slots :( |
@CyberDem0n well thats unfortunate but happens 😄 |
I noticed that everything was working up to the point of rsync between the nodes. I added functionality to add the node config as well as possible proxy configs automatically to the rsync.conf For me the update is working now. Please verify this @cristi-vlad @alfsch |
@seekermarcel Nice, will try your changes tomorrow. 👀 |
awesome to see some progress here, thanks a lot! this will help us as well |
@seekermarcel Thank You for pushing this. This solves the problem with the updates for us as well. It would be great to have documentation update as well. |
Nice implementation, what I'm missing is some documentation for the new environment variable 'USE_APPLICATION_NAME_IN_UPGRADE' in ENVIRONMENT.rst. Could you provide it? |
@seekermarcel thx for updating the docs 👍 |
Thanks @seekermarcel for this fix. |
thanks @meenakshi-koushik very good find. thats absolutely correct. I adapted it to follow the same way USE_OLD_LOCALES works |
So do I need to do something else? What's missing for merge approval? |
@FxKu are you missing things in this PR? We need the changes for our production clusters using the service mesh. |
@FxKu thx 👍 |
@FxKu thank you ❤️ |
this is a takeover of #915 since we need it and the dev is unreachable.
should close zalando/postgres-operator#1629