Binary GCD #755

Open: erik-3milabs wants to merge 76 commits into master

erik-3milabs
Contributor

This PR introduces an implementation of Optimized Binary GCD. Ref: Pornin, Algorithm 2.

Upsides to this technique:

  • it is up to 27x faster than the gcd algorithm currently implemented in crypto_bigint (see the benchmark results below); really, it just has a different complexity bound.
  • does not need UNSAT_LIMBS
  • it is actually constant time (unlike the current implementation, which is sneakily vartime in the maximum of the bitsizes of the two operands).

Benchmark results

Word = u64

| limbs | gcd (vt)  | gcd (ct)  | new_gcd (ct) |
|-------|-----------|-----------|--------------|
| 2     | 10.687 µs | 20.619 µs | 3.6090 µs    |
| 4     | 29.121 µs | 56.433 µs | 7.1124 µs    |
| 8     | 99.819 µs | 195.02 µs | 16.184 µs    |
| 16    | 359.39 µs | 710.26 µs | 44.294 µs    |
| 32    | 1.6804 ms | 3.3097 ms | 136.49 µs    |
| 64    | 6.9717 ms | 13.028 ms | 494.16 µs    |
| 128   | 29.099 ms | 57.325 ms | 2.3335 ms    |
| 256   | 143.22 ms | 244.89 ms | 8.7722 ms    |

Uint::conditional_swap(&mut a, &mut b, do_swap);

// subtract b from a when a is odd
a = a.wrapping_sub(&Uint::select(&Uint::ZERO, &b, a_odd));
Contributor Author

@erik-3milabs erik-3milabs Feb 7, 2025

@tarcieri what do you think of this line? Previously, it was like this:

a = Uint::select(&a, &a.wrapping_sub(&b), a_odd);

The current code is 10-25% faster for Uints with few limbs (1, 2, 3, etc.).

Member

I'm surprised there's that much of a difference. Are you sure it's always faster or is it faster depending on a?

Contributor Author

@erik-3milabs erik-3milabs Feb 14, 2025

I'm pretty confident it is always faster.

I think this is faster because we are now selecting between a constant and a variable, instead of between two variables. Given that select is const and loves to be inlined, the compiler can now optimize the select operation.

Recall, Uint::select calls Limb::select, which in turn calls

impl ConstChoice {
    /// Return `b` if `self` is truthy, otherwise return `a`.
    #[inline]
    pub(crate) const fn select_word(&self, a: Word, b: Word) -> Word {
        a ^ (self.0 & (a ^ b))
    }
}

When a is the constant ZERO, this can be optimized as:

        self.0 & b

saving two XOR operations, or two thirds of this operation.
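
As a rough single-word illustration of the two forms discussed above (not the crate's code): the sketch below models the choice as a plain all-ones/all-zeros `Word` mask instead of `ConstChoice`, and the helper names `mask`, `sub_then_select` and `select_then_sub` are made up for this example. It only shows that both forms compute the same value; it says nothing about what the compiler actually emits.

```rust
type Word = u64;

/// All-ones mask when `flag` is true, zero otherwise.
/// Hypothetical stand-in for `ConstChoice`, for illustration only.
const fn mask(flag: bool) -> Word {
    (flag as Word).wrapping_neg()
}

/// Same formula as `select_word` above: `b` if `m` is all ones, else `a`.
const fn select_word(m: Word, a: Word, b: Word) -> Word {
    a ^ (m & (a ^ b))
}

/// Previous form: compute `a - b`, then select between `a` and `a - b`.
const fn sub_then_select(a: Word, b: Word, m: Word) -> Word {
    select_word(m, a, a.wrapping_sub(b))
}

/// Current form: select between 0 and `b`, then subtract.
/// With the constant 0 as first operand, the select collapses to `m & b`.
const fn select_then_sub(a: Word, b: Word, m: Word) -> Word {
    a.wrapping_sub(m & b)
}
```

Both functions return `a` when the mask is zero and `a - b` (wrapping) when it is all ones, so swapping one for the other does not change the result of the loop body.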

Returning to the gcd subroutine, this select is in the hot loop of this algorithm. In total, the loop executes:

  • Uint::is_odd (1 op)
  • Uint::lt (2 ops/word)
  • ConstChoice::and (1 op)
  • Uint::wrapping_sub (4 ops/word)
  • Uint::select (3 ops/word -> 1 op/word)
  • Uint::shr (3 ops/word)

So, there is a reduction from 12 to 10 ops/word, or a 17% improvement.
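
To make the loop body concrete, here is a minimal single-word sketch of the same sequence of steps (is_odd, lt, and, conditional swap, masked subtraction, shift) on `u64`. It is an illustration under stated assumptions, not the PR's implementation: the function name `bingcd_odd` is hypothetical, `b` is assumed to be odd, and nothing here has been vetted for constant-time behaviour (for instance, the compiler is free to lower `a < b` however it likes).

```rust
/// All-ones mask when `flag` is true, zero otherwise (illustrative helper).
fn mask(flag: bool) -> u64 {
    (flag as u64).wrapping_neg()
}

/// Binary GCD of `a` and an odd `b`, using a fixed iteration count.
/// Hypothetical single-word sketch; panics if `b` is even.
fn bingcd_odd(mut a: u64, mut b: u64) -> u64 {
    assert!((b & 1) == 1, "this sketch requires an odd `b`");

    // Each iteration shrinks bitlen(a) + bitlen(b) by at least one while
    // a != 0, so 2 * 64 - 1 iterations are enough for 64-bit inputs.
    for _ in 0..(2 * u64::BITS - 1) {
        let a_odd = mask((a & 1) == 1);
        let do_swap = a_odd & mask(a < b);

        // Conditional swap: afterwards a >= b whenever a is odd.
        let t = do_swap & (a ^ b);
        a ^= t;
        b ^= t;

        // Subtract b from a when a is odd (select between 0 and b).
        a = a.wrapping_sub(a_odd & b);

        // a is even at this point, so halving it is exact.
        a >>= 1;
    }

    // Once a reaches 0 it stays 0 and b holds the gcd.
    b
}
```

For example, `bingcd_odd(48, 9)` first halves `a` down to 3, swaps to get `(9, 3)`, and then reduces `a` to 0, leaving 3 in `b`.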

Member

> When a is the constant ZERO

Aah, ok

@erik-3milabs erik-3milabs mentioned this pull request Feb 14, 2025
@tarcieri
Member

@erik-3milabs I just set master to v0.7.0-pre in #765

This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like Gcd and InvMod to use bingcd instead

@erik-3milabs
Contributor Author

> @erik-3milabs I just set master to v0.7.0-pre in #765
>
> This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like Gcd and InvMod to use bingcd instead

@tarcieri While I could still modify the Gcd trait to use this algorithm, this PR does not yet introduce the tools necessary to replace InvMod.

Aside from that, what else would be required to see this PR merged?

@tarcieri
Member

tarcieri commented Mar 11, 2025

Aah, lack of invmod support would definitely be a problem. Is it something you plan on addressing eventually? My understanding is, like safegcd, that invmod is a big part of binary GCD's intended usage.

It seems a little weird to have multiple implementations of GCD algorithms which effectively do the same thing, though to completely replace safegcd in addition to invmod you'd also need support for boxed types (though with const_mut_refs stable it should be a lot easier to share an implementation).

Also seems it needs a rebase due to upstream changes.

@erik-3milabs
Contributor Author

> Aah, lack of invmod support would definitely be a problem. Is it something you plan on addressing eventually? My understanding is, like safegcd, that invmod is a big part of binary GCD's intended usage.

You're right. This PR only introduces the gcd algorithm; PR #761 extends the algorithm into xgcd. Stripping some things from the xgcd algorithm gives invmod. Given that I don't need invmod myself, I am not too keen on implementing it 🙈

> It seems a little weird to have multiple implementations of GCD algorithms which effectively do the same thing, though to completely replace safegcd in addition to invmod you'd also need support for boxed types (though with const_mut_refs stable it should be a lot easier to share an implementation).

I agree that having two algorithms is overkill. Let me see about implementing this for Boxed<X> as well.

> Also seems it needs a rebase due to upstream changes.

Yeah, you're right. Let me address that right away.

@kayabaNerve
Contributor

kayabaNerve commented Apr 2, 2025

I'm incredibly interested in this PR as someone who:

  1. Needs a constant-time GCD
  2. Has >50% of my runtime spent on safegcd::divsteps alone right now

I actually need an xgcd (which #761 solves); it's just that it is trivial to implement xgcd inefficiently given gcd (by manually calculating ((a / g) % (b / g))**-1 for the first coefficient and solving from there for the second). Accordingly, I'd love to see this merged ASAP and want to ask whether anything is currently a blocker that I can help with.

@tarcieri tarcieri closed this Apr 2, 2025
@tarcieri tarcieri reopened this Apr 2, 2025
@tarcieri
Member

tarcieri commented Apr 2, 2025

Whoops, fat fingered the close button trying to reply.

While this is probably in OK shape as is, I would prefer to see it as the only GCD implementation, rather than having two.

I think it’s fine to retain safegcd for inversions until this can be extended to support inversions as well, but I’d really prefer for there to be one algorithm used for GCD for both boxed and unboxed types.

@kayabaNerve
Contributor

Heard, especially if the goal of 0.7.0 is a long-lived release without continued maintenance of safegcd.

modinv should be trivial as just the x coefficient from xgcd, Some if gcd == 1, None otherwise, right?
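
A minimal sketch of that idea, assuming nothing about the xgcd from #761: the `xgcd` below is a plain textbook, variable-time extended Euclid on `i128`, used purely as a stand-in, and both function names are hypothetical. The inverse is the `x` coefficient reduced into `[0, m)`, and it exists only when the gcd is 1.

```rust
/// Textbook (variable-time) extended Euclid, for illustration only.
/// Returns (g, x, y) such that a*x + b*y == g == gcd(a, b).
fn xgcd(a: i128, b: i128) -> (i128, i128, i128) {
    let (mut r0, mut r1) = (a, b);
    let (mut x0, mut x1) = (1i128, 0i128);
    let (mut y0, mut y1) = (0i128, 1i128);
    while r1 != 0 {
        let q = r0 / r1;
        (r0, r1) = (r1, r0 - q * r1);
        (x0, x1) = (x1, x0 - q * x1);
        (y0, y1) = (y1, y0 - q * y1);
    }
    (r0, x0, y0)
}

/// Modular inverse from the `x` coefficient:
/// `Some` if gcd(a, m) == 1, `None` otherwise.
fn inv_mod(a: u64, m: u64) -> Option<u64> {
    if m == 0 {
        return None;
    }
    let (g, x, _) = xgcd(a as i128, m as i128);
    if g != 1 {
        return None;
    }
    // x may be negative, so reduce it into the range [0, m).
    Some(x.rem_euclid(m as i128) as u64)
}
```

With these definitions, `inv_mod(3, 7)` yields `Some(5)` since 3 * 5 = 15 ≡ 1 (mod 7), while `inv_mod(4, 6)` yields `None` because gcd(4, 6) = 2.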

Thanks for the quick reply and for letting me ensure I have context!

@tarcieri
Member

tarcieri commented Apr 7, 2025

@erik-3milabs do you want to look into making this the primary/only GCD implementation?

Otherwise I can potentially do that as a followup
