Askrene: add Goldberg-Tarjan's MCF solver #8314

Closed

Lagrang3
Collaborator

@Lagrang3 Lagrang3 commented Jun 4, 2025

With this PR I want to make an algorithmic improvement to the MCF solver in askrene.

First: I would like to constrain the number of flow units to 1M by setting the accuracy of the solver
to the total payment amount divided by 1M. Some MCF algorithms like "successive shortest path" (SSP)
have theoretical complexity bounds that depend on that number.

Second: I would like to prune the set of arcs in the network. I can achieve this by capping at 1M the sum
of the arc capacities that correspond to the same channel, since 1M is the maximum number of flow units
that we send from the source to the destination. Because of the piece-wise linearization of the channel
cost function, one channel becomes several arcs in the MCF network. We can therefore discard the higher cost
arcs of a channel linearization if the lower cost arcs already sum up to 1M in flow capacity.

Third: I would like to add an experimental option to switch the algorithm of the MCF solver to the
well-known and highly efficient "Cost Scaling" algorithm by Goldberg and Tarjan (1990) [1] with heuristics [2].

To use it you would add the parameter dev_algorithm=goldberg-tarjan, for example:

lightning-cli -k getroutes source=$source destination=$destination amount_msat=1000sat final_cltv=6 layers="[]" maxfee_msat=1sat dev_algorithm="goldberg-tarjan"

Depends on #8299

[1] Finding Minimum-Cost Circulations by Successive Approximation. Goldberg and Tarjan. Mathematics of Operations Research, Vol. 15, No. 3 (1990), pp. 430--466.
[2] An Efficient Implementation of a Scaling Minimum-Cost Flow Algorithm. Goldberg. Journal of Algorithms, Vol. 22 (1997), pp. 1--29.

Lagrang3 added 10 commits May 19, 2025 06:15
Add log information about the runtime of getroutes.

Changelog-None: askrene: add runtime of getroutes to the logs

Signed-off-by: Lagrang3 <[email protected]>
Move the feature tuning algorithm to mcf.c, ie. the loops for searching
a good mu and delay_feefactor to satisfy the problem constraints.

We are looking to set the stage for an execution logic that allows for
multiple choices of routing algorithms, mainly for experimenting without
breaking the default working code.

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
Prefer the following programming pattern:

do_getroutes(){
        do_something1();
        do_something2();
        do_something3();
}

rather than

get_routes(){
        do_something1();
        do_something2();
}

do_getroutes(){
        get_routes();
        do_something3();
}

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
Refactor MCF solver: remove structs linear_network and residual_network.
Prefer passing raw data to the helper functions.

Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
The single path solver uses the same probability cost and fee cost
estimation as minflow. Single path routes computed this way are
suboptimal with respect to the MCF solution but are still optimal among
all single paths. Computationally it is much faster than MCF, therefore
for some trivial payments it should be preferred.

Changelog-None.

Signed-off-by: Lagrang3 <[email protected]>
Changelog-Added: askrene: an optimal single-path solver has been added, it can be called using the developer option --dev_algorithm=single-path or by adding the layer "auto.no_mpp_support"

Signed-off-by: Lagrang3 <[email protected]>
Changelog-None.

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3
Collaborator Author

Lagrang3 commented Jun 4, 2025

Pruning the network for a payment of 1M sats on the compressed gossmap
"./tests/data/gossip-store-2024-09-22.compressed" yields 228018 arcs instead of 853328 arcs
without pruning.
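
A minimal sketch of the per-channel pruning rule, assuming each channel's linearization is given as a list of arcs with a capacity (in flow units) and a cost per unit. The `Arc` class and `prune_channel_arcs` are hypothetical names used for illustration, not the askrene code. Because the linearized cost is increasing, a minimum cost flow fills the cheaper arcs of a channel first, so any arc beyond the first 1M units of capacity can never carry flow.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    capacity: int  # capacity in flow units (hypothetical field)
    cost: int      # cost per flow unit (hypothetical field)


def prune_channel_arcs(arcs, max_units):
    """Keep the cheapest arcs of one channel until they cover max_units.

    The last kept arc is clamped so that the kept capacities sum to at
    most max_units; more expensive arcs are dropped entirely.
    """
    kept = []
    remaining = max_units
    for arc in sorted(arcs, key=lambda a: a.cost):
        if remaining <= 0:
            break  # cheaper arcs already cover every unit we can send
        cap = min(arc.capacity, remaining)
        kept.append(Arc(cap, arc.cost))
        remaining -= cap
    return kept


# One channel linearized into four arcs of increasing cost.
channel = [Arc(600, 1), Arc(600, 3), Arc(600, 9), Arc(600, 27)]
print(prune_channel_arcs(channel, max_units=1000))
```

With a cap of 1000 units, only the two cheapest arcs survive, and the second one is clamped to 400 units.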

To compute a payment route for 1M sats from node 3301 to the first 1000 nodes, it takes an average
of 586ms without pruning and 495ms with pruning. Here's a plot of the computation time distribution:

[Image: perf, plot of the computation time distribution]

The average doesn't seem to match the plot shown because there is a long tail of outliers
with runtimes that exceed 1 sec, in some cases going up to 30 sec.

Constrain the number of flow units to 1M and prune arcs that are
provably not used in the MCF computation.

Changelog-None.

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3 Lagrang3 force-pushed the askrene-goldberg-tarjan branch 4 times, most recently from 56f81f4 to 35e1428 Compare June 9, 2025 05:39
Changelog-None

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3 Lagrang3 force-pushed the askrene-goldberg-tarjan branch 2 times, most recently from 729c253 to 4c0f508 Compare June 9, 2025 07:24
Changelog-EXPERIMENTAL: askrene: add developer-only switch to compute optimal MCF routing using Goldberg-Tarjan's Cost Scaling algorithm.

Signed-off-by: Lagrang3 <[email protected]>
@Lagrang3 Lagrang3 force-pushed the askrene-goldberg-tarjan branch from 4c0f508 to 43e2639 Compare June 9, 2025 08:09
@Lagrang3 Lagrang3 changed the title [WIP] Askrene goldberg tarjan Askrene: add Goldberg-Tarjan's MCF solver Jun 9, 2025
@Lagrang3 Lagrang3 marked this pull request as ready for review June 9, 2025 08:35
@Lagrang3 Lagrang3 requested a review from cdecker as a code owner June 9, 2025 08:35
@Lagrang3
Collaborator Author

Lagrang3 commented Jun 9, 2025

The GT solver does not perform faster than the current SSP implementation.
See, for example, the time measurements below for 1000 successful payments:
[Image: perf-success]

There are at least 4 ways in which we could improve its runtime:

  1. by putting a hard bound on the cost function per arc,
  2. by stopping earlier in the main epsilon loop, ie. accepting close-to-optimal rather than optimal solutions,
  3. by implementing a price refinement heuristic [2], ie. at each iteration of the main loop determine the smallest epsilon for which the current state is epsilon-optimal,
  4. by using a dynamic tree or the first-active order instead of a queue to process nodes.
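
To make the epsilon-optimality test in point 3 concrete, here is a simplified sketch. It assumes the usual cost-scaling convention that the reduced cost of a residual arc (u, v) is cost + price[u] - price[v], and that a state is epsilon-optimal when every residual arc has reduced cost of at least -epsilon. The function name and data layout are hypothetical. Note that this only measures the smallest epsilon for the current prices; a full price refinement step would also try to adjust the prices to lower it.

```python
def epsilon_for_prices(residual_arcs, price):
    """Smallest epsilon >= 0 for which the given prices are epsilon-optimal.

    residual_arcs: iterable of (u, v, cost) for arcs with residual capacity.
    price: mapping from node to its current price (potential).
    """
    eps = 0
    for u, v, cost in residual_arcs:
        reduced = cost + price[u] - price[v]
        # An arc with reduced cost below -eps violates eps-optimality.
        eps = max(eps, -reduced)
    return eps


arcs = [(0, 1, 5), (1, 2, -3), (2, 0, 1)]
print(epsilon_for_prices(arcs, {0: 0, 1: 0, 2: 0}))
print(epsilon_for_prices(arcs, {0: 0, 1: 2, 2: 0}))
```

In this toy example, raising the price of node 1 shrinks the worst violation, which is the kind of improvement a price refinement pass looks for.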

@cdecker
Member

cdecker commented Jun 9, 2025

Very interesting change @Lagrang3, I'm not sure I grasp all of the consequences though. Specifically, approximating and pruning at the 1M size: does that have an impact on payments smaller than 1Msats? Or am I just not understanding the impact of this optimization?

Also, should we mark this as a draft until we have experimented and proven the alternative approaches to be more performant?

@Lagrang3 Lagrang3 marked this pull request as draft June 9, 2025 14:01
@Lagrang3
Collaborator Author

Lagrang3 commented Jun 9, 2025

Hi @cdecker. Yes, let's make this a draft for the moment.

The 1M magic number is the maximum number of units into which the payment is split.
For example: a payment of 10M sats will be computed as if we wanted to send 1M units of flow where each unit
is made up of 10sats. That doesn't mean there will be 1M routes: the final solution could be one route of 1M units (10Msats) or two routes (one with 999999 units and another with 1, corresponding to 9999990sats and 10sats respectively), etc.
For payment amounts too small to be split into 1M units we stop at 1msat per unit. So a payment of 1sat is considered
a problem in which we want to send 1000 units, each one corresponding to 1msat.
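
The arithmetic above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the rounding (unit size rounded up, so that the unit count never exceeds the cap) is an assumption, not necessarily what the PR implements.

```python
MAX_FLOW_UNITS = 1_000_000  # the "1M" cap discussed above


def flow_unit_msat(amount_msat, max_units=MAX_FLOW_UNITS):
    """Size of one flow unit in msat, never smaller than 1 msat."""
    # Ceiling division keeps the number of units at or below max_units
    # (assumed rounding rule).
    return max(1, -(-amount_msat // max_units))


def flow_units(amount_msat, max_units=MAX_FLOW_UNITS):
    """Number of flow units needed to cover the amount."""
    unit = flow_unit_msat(amount_msat, max_units)
    return -(-amount_msat // unit)


# 10M sats = 10_000_000_000 msat: 1M units of 10 sats (10_000 msat) each.
print(flow_unit_msat(10_000_000_000), flow_units(10_000_000_000))
# 1 sat = 1000 msat: the unit bottoms out at 1 msat, giving 1000 units.
print(flow_unit_msat(1_000), flow_units(1_000))
```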

Some Min. Cost Flow algorithms do not care about how many flow units we use, but the default
solver in askrene uses Successive Shortest Paths, which is pseudo-polynomial (in terms of the size of the graph)
and in the worst case performs as O(U SP(N,M)), where U is the number of flow units and SP(N,M) is the
runtime for solving a single shortest path on the graph. By limiting the number of flow units U, we limit
the accuracy of the partition of the payment, but we can also improve the runtime.
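
To illustrate where the O(U SP(N,M)) bound comes from, here is a generic textbook sketch of Successive Shortest Paths that uses Bellman-Ford for each shortest path and counts the rounds. It is not the askrene implementation; the function name, graph encoding and toy network are assumptions made for the example. Every round pushes at least one unit, so the number of shortest-path computations is at most U.

```python
def ssp_min_cost_flow(n, edges, source, sink, units):
    """Send `units` of flow at minimum cost; edges are (u, v, capacity, cost).

    Returns (total_cost, rounds), where rounds counts shortest-path runs.
    """
    # Residual arcs stored in pairs: arc e and arc e ^ 1 are mutual reverses.
    to, cap, cost = [], [], []
    adj = [[] for _ in range(n)]
    for u, v, c, w in edges:
        adj[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    inf = float("inf")
    total_cost, sent, rounds = 0, 0, 0
    while sent < units:
        rounds += 1
        # Bellman-Ford copes with the negative costs of reverse arcs.
        dist, prev_arc = [inf] * n, [-1] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_arc[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            raise ValueError("not enough capacity for the requested flow")
        # Find the bottleneck along the path, then augment.
        push, v = units - sent, sink
        while v != source:
            e = prev_arc[v]
            push = min(push, cap[e])
            v = to[e ^ 1]
        v = sink
        while v != source:
            e = prev_arc[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        total_cost += push * dist[sink]
        sent += push
    return total_cost, rounds


toy = [(0, 1, 2, 1), (0, 2, 2, 2), (1, 3, 1, 1), (1, 2, 1, 1), (2, 3, 3, 1)]
cost, rounds = ssp_min_cost_flow(4, toy, source=0, sink=3, units=3)
print(cost, rounds)
```

Fewer, larger flow units reduce U and therefore the worst-case number of rounds, at the price of a coarser split of the payment.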

Why does it matter?
Because, testing different payment queries on askrene using this gossmap: "./tests/data/gossip-store-2024-09-22.compressed", I have seen that it is not rare to encounter pathological cases in which
the default solver takes more than 30 seconds to compute a solution.

I am trying to stabilize the solver's runtime by first limiting U,
but the main idea of this PR was to try an alternative algorithm and see how the two compare.

@Lagrang3
Collaborator Author

Lagrang3 commented Jun 9, 2025

I think it would be better to split this PR in two,
removing the last two commits and leaving them for a future PR.

@Lagrang3 Lagrang3 closed this Jun 9, 2025
2 participants