design: Expose the htlc_accepted hook over cln-grpc #6508

Open
@cdecker

Description

As the cln-grpc interface stabilizes and increases its coverage over
the existing RPC methods, we see more and more applications going
grpc-only, or at least wanting to. One major missing feature is the
integration with hooks over grpc, and users are starting to ask for
these. We have refrained from exposing hooks over the grpc interface,
because hooks have very strong guarantees when implemented in plugins:

  • Always present: since a plugin's lifetime is controlled by
    lightningd we can ensure that the plugin is up and running, and
    most importantly responding to incoming calls, before invoking
    anything that might require a hook response. This means that there
    cannot be a hook call without the plugin ready to process and reply
    to it. This is no longer the case when the hook subscriber is
    running externally, and we need to find a way to account for the
    new states this introduces (a hook call while the subscriber is
    unresponsive, has not yet subscribed, or is simply not there).
  • Shared fate: a plugin that subscribes to a hook and then crashes
    will take down lightningd itself. This is the only safe reaction,
    since the crashing plugin may be performing absolutely critical
    operations (a firewall? interception for multi-tenant setups?
    etc.). If lightningd kept running, it would have to either continue
    or fail the HTLC, and neither is the correct response in
    some of these scenarios (fail because we couldn't log details?
    continue even though the plugin is a firewall protecting
    something important?). By abort()ing we ensure that the plugin
    gets another chance to work correctly when restarted (see
    replay below), or at least the human operator will see the issue
    and can intervene. With remote subscribers via cln-grpc we have
    additional cases to tackle: hook calls before subscription,
    disconnect before or during a hook call (with and without
    reconnection), and figuring out how to tackle these may not be
    obvious.
  • Replay on startup: as mentioned in the docs, the hook subscriber
    needs to be idempotent for the case in which we restart while
    processing HTLCs. The HTLC events get replayed during the startup
    sequence, ensuring at-least-once semantics on the hook
    subscription. This means hook calls can arrive very early on,
    possibly before any subscriber has had a chance to connect and
    subscribe.
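To make the replay point concrete: because HTLCs are redelivered on startup with at-least-once semantics, a hook subscriber has to return the same decision when it sees the same HTLC twice, typically by keying its decision on something stable like the payment hash. A minimal, library-free sketch of that idempotency pattern (the names `decisions` and `handle_htlc_accepted` are illustrative, not part of any CLN API):

```python
# Sketch: an idempotent htlc_accepted handler. Since lightningd replays
# pending HTLCs during its startup sequence (at-least-once delivery),
# the handler must return the same decision when called twice for the
# same HTLC, and must not re-run side effects on replay.
# All names here are illustrative, not part of the CLN API.

decisions: dict[str, str] = {}  # payment_hash -> "continue" | "fail" | ...


def handle_htlc_accepted(payment_hash: str) -> str:
    # Replayed HTLC: return the decision already made, skipping any
    # side effects (logging, DB writes, policy checks).
    if payment_hash in decisions:
        return decisions[payment_hash]
    # First sight of this HTLC: decide once and record the decision.
    decision = "continue"  # a real handler would inspect the onion/amounts
    decisions[payment_hash] = decision
    return decision
```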

Design

We propose an incremental approach to get hook subscriptions working
on cln-grpc:

  • Best-effort support: cln-grpc subscribes to all notifications and
    hooks, and then fans out to all subscribers that are currently
    present. The subscription can specify a failmode which determines
    whether cln-grpc will abort() when a subscriber disconnects while
    handling a hook, or whether it should just continue, ignoring the
    failed call. We also need to reimplement lightningd's
    result-merging logic in cln-grpc so that we can chain multiple hook
    calls and still return a single result to lightningd.
  • Metadata: we add a priority to indicate in which order the hooks
    have to be called. Hook subscriptions also get a unique name so
    that we can identify them in the next step.
  • Configure mandatory hook subscriptions: this is the heavy-handed
    step in which we block the hooks until all the configured
    subscribers are present, and abort() if any of them is absent
    for a call. This includes implementing an early grpc listen
    (before the JSON-RPC is ready) because of the HTLC replay, and
    blocking the first hook call until the subscribers specified in
    the config are present, potentially delaying startup considerably
    (making this clear to operators may be non-trivial).
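To make the best-effort step more tangible, here is a rough sketch of the fan-out and result-merging described above: call subscribers in priority order, stop at the first non-"continue" result so a single result goes back to lightningd, and apply the subscriber's failmode when it is unreachable. Everything here (`Subscriber`, `fan_out`, the failmode values) is hypothetical, not an existing cln-grpc API:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of cln-grpc's hook fan-out. A subscriber's handler
# returns a hook result ("continue", "fail", ...) or raises on disconnect.


@dataclass
class Subscriber:
    name: str
    priority: int  # lower value = called earlier
    failmode: str  # "abort" or "ignore" when the subscriber is unreachable
    handler: Callable[[dict], str]


def fan_out(subscribers: list[Subscriber], htlc: dict) -> str:
    # Call subscribers in priority order; the first non-"continue" result
    # short-circuits the chain, mirroring how lightningd chains plugin
    # hooks, so a single merged result is returned to lightningd.
    for sub in sorted(subscribers, key=lambda s: s.priority):
        try:
            result = sub.handler(htlc)
        except ConnectionError:
            if sub.failmode == "abort":
                # Shared fate: losing a mandatory subscriber is fatal.
                raise RuntimeError(f"subscriber {sub.name} lost during hook call")
            continue  # "ignore": skip the failed subscriber
        if result != "continue":
            return result
    return "continue"
```

The short-circuit on the first non-"continue" result matches the chaining behavior plugins get today; the open design questions are mostly in the `except` branch, i.e. what to do when a subscriber disappears mid-call.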

This is the approximate plan I could come up with, but of course all
of it is open to discussion. In fact, the reason I'm posting this
issue is precisely that we need feedback from our prospective
users. Ideally we'd get semantic parity between the plugin
subscribers and the remote subscribers, however that may not be
possible (and it may be overly zealous).

NB: I may not have time to actually implement this in the near future,
but starting the discussion and addressing potential issues makes
implementing easier. And who knows, maybe someone wants to pick this
up on my behalf?

Metadata
    Labels

    discussion, open to suggestion (this issue is desired but there is a lack of a good design on how to fix this problem)
