Description
As the `cln-grpc` interface stabilizes and increases its coverage of
the existing RPC methods, we see more and more applications going
grpc-only, or at least wanting to. One major missing feature is the
integration with hooks over grpc, and users are starting to ask for
these. We have refrained from exposing hooks over the grpc interface,
because hooks have very strong guarantees when implemented in plugins:
- Always present: since a plugin's lifetime is controlled by
  `lightningd`, we can ensure that the plugin is up and running, and
  most importantly responding to incoming calls, before invoking
  anything that might require a hook response. This means that there
  cannot be a hook call without the plugin ready to process and reply
  to it. This is no longer the case when the subscriber to the hook
  is running externally, and we need to find a way to account for
  these new states (hook call but subscriber is unresponsive, has not
  subscribed yet, or is simply not there).
- Shared fate: a plugin that subscribes to a hook, and then crashes,
  will take down `lightningd` itself. This is the only safe reaction,
  since the crashing plugin may be performing absolutely critical
  operations (firewall? interception for multi-tenant setups?
  etc...). If we were to keep going we would have to either `continue`
  or `fail` the HTLC, neither of which is the correct response for
  some of these scenarios (`fail` because we couldn't log details?
  `continue` even though the plugin is a firewall protecting
  something important?). By `abort()`ing we ensure that we get
  another chance of the plugin working correctly when restarted (see
  replay below), or at least the human operator will see the issue
  and can intervene. With remote subscribers via `cln-grpc` we have
  additional cases to tackle: hook calls before subscription,
  disconnects before or during a hook call (with and without
  reconnection), and figuring out how to tackle these may not be
  obvious.
- Replay on startup: as mentioned in the docs, the hook subscriber
needs to be idempotent for the case in which we restart while
processing HTLCs. The HTLC events get replayed during the startup
sequence, ensuring at-least-once semantics on the hook
subscription. This means that we get HTLCs very early on, possibly
before any subscriber had a chance to connect and subscribe.
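To make the idempotency requirement concrete, here is a minimal sketch of a subscriber that survives replays. All names here (`on_htlc_accepted`, the `short_channel_id`/`id` fields, the `{"result": "continue"}` shape) are illustrative assumptions modeled on the plugin hook conventions, not the actual cln-grpc API:

```python
# Hypothetical sketch: an idempotent HTLC hook subscriber. A replayed hook
# call (same HTLC delivered again after a lightningd restart) must get the
# same answer, so we cache decisions under a stable per-HTLC key.
class IdempotentHtlcHandler:
    def __init__(self):
        # Decisions already made, keyed by a stable HTLC identity.
        self.decisions = {}

    def on_htlc_accepted(self, htlc):
        key = (htlc["short_channel_id"], htlc["id"])
        if key in self.decisions:
            # Replay: return the cached result instead of re-deciding.
            return self.decisions[key]
        result = {"result": "continue"}  # real policy decision goes here
        self.decisions[key] = result
        return result

handler = IdempotentHtlcHandler()
htlc = {"short_channel_id": "103x1x0", "id": 7}
first = handler.on_htlc_accepted(htlc)
replayed = handler.on_htlc_accepted(htlc)  # same HTLC delivered again
assert first == replayed
```

The cache would of course need to be persisted (and pruned once HTLCs are fully resolved) for this to hold across subscriber restarts as well.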
Design
We propose an incremental approach to get hook subscriptions working
on `cln-grpc`:
- Best-effort support: `cln-grpc` subscribes to all notifications and
  hooks, and then we fan out to all subscribers that are currently
  present. The subscription can specify a `failmode` which determines
  whether `cln-grpc` will cause an `abort()` in case of a disconnect
  while handling a hook, or whether we should just `continue`, ignoring
  the failed call. We also need to implement the result-merging logic
  from `lightningd` in `cln-grpc` so that we can chain multiple hook
  calls and still return a single result to `lightningd`.
- Metadata: we add a priority to indicate in which order the hooks
have to be called. Hook subscriptions also get a unique name so
  that we can identify them in the next step.
- Configure mandatory hook subscriptions: this is the heavy-handed
  step in which we block the hooks until all the configured
  subscribers are present, and we `abort()` if any of them are not
  present for a call. This includes implementing an early grpc listen
  (before the JSON-RPC is ready) due to the replay of HTLCs, and
  blocking the first hook call until the subscribers specified in the
  config are present, potentially delaying startup considerably
  (making this clear to operators may be non-trivial).
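The fan-out, priority, and `failmode` pieces of the first two steps could be sketched as follows. This is not the cln-grpc implementation — `Subscriber`, `FailMode`, and the merging rule (first result other than `{"result": "continue"}` ends the chain, mirroring how chained plugin hooks behave) are assumptions for illustration:

```python
# Sketch of fanning a hook call out to remote subscribers in priority order,
# with a per-subscriber failmode. Hypothetical names throughout.
from enum import Enum

class FailMode(Enum):
    ABORT = "abort"        # losing this subscriber mid-hook is fatal
    CONTINUE = "continue"  # a failed call to this subscriber is ignored

class Subscriber:
    def __init__(self, name, priority, failmode, handler):
        self.name = name
        self.priority = priority  # lower number = called earlier
        self.failmode = failmode
        self.handler = handler    # callable(payload) -> dict, may raise

def dispatch_hook(subscribers, payload):
    """Call subscribers in priority order, merging results the way chained
    hooks do: the first result other than {"result": "continue"} wins."""
    for sub in sorted(subscribers, key=lambda s: s.priority):
        try:
            result = sub.handler(payload)
        except ConnectionError:
            if sub.failmode is FailMode.ABORT:
                # Stand-in for abort(): the subscriber was critical.
                raise SystemExit(f"subscriber {sub.name} lost mid-hook")
            continue  # best effort: skip the failed call
        if result != {"result": "continue"}:
            return result  # this subscriber decided; stop the chain
    return {"result": "continue"}

subs = [
    Subscriber("logger", 10, FailMode.CONTINUE,
               lambda p: {"result": "continue"}),
    Subscriber("firewall", 0, FailMode.ABORT,
               lambda p: {"result": "fail"} if p.get("blocked")
               else {"result": "continue"}),
]
assert dispatch_hook(subs, {"blocked": True}) == {"result": "fail"}
assert dispatch_hook(subs, {}) == {"result": "continue"}
```

The short-circuit merge is one plausible choice; whether a later subscriber should still see a call that an earlier one already rejected is exactly the kind of semantic question this issue is meant to surface.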
This is the approximate plan I could come up with, but of course all
of these are open to discussion. In fact the reason I'm posting this
issue is exactly because we need the feedback from our prospective
users. Ideally we'd get semantic equality between the plugin
subscribers and the remote subscribers, however that may not be
possible (and it may be overly zealous).
NB: I may not have time to actually implement this in the near future,
but starting the discussion and addressing potential issues makes
implementing easier. And who knows, maybe someone wants to pick this
up on my behalf?