Binary GCD #755
* This reverts commit 0897439 * This adds further annotation
Uint::conditional_swap(&mut a, &mut b, do_swap);

// subtract b from a when a is odd
a = a.wrapping_sub(&Uint::select(&Uint::ZERO, &b, a_odd));
@tarcieri what do you think of this line? Previously, it was like this:
a = Uint::select(&a, &a.wrapping_sub(&b), a_odd);
The current code is between 10% and 25% faster for Uints with few limbs (1, 2, 3, etc.).
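To make the comparison concrete, here is a minimal single-word sketch of the two forms of the subtraction above. It is not the crate's code: `select`, `sub_then_select`, and `select_then_sub` are hypothetical helpers, and a plain `u64` mask (all ones or zero) stands in for the `a_odd` choice.

```rust
/// Hypothetical stand-in for `Uint::select` on one 64-bit word:
/// returns `b` when `mask` is all ones and `a` when `mask` is zero.
fn select(a: u64, b: u64, mask: u64) -> u64 {
    a ^ (mask & (a ^ b))
}

/// Previous form: compute `a - b`, then select between `a` and `a - b`.
fn sub_then_select(a: u64, b: u64, mask: u64) -> u64 {
    select(a, a.wrapping_sub(b), mask)
}

/// Current form: select the subtrahend (`0` or `b`), then subtract it.
fn select_then_sub(a: u64, b: u64, mask: u64) -> u64 {
    a.wrapping_sub(select(0, b, mask))
}
```

Both helpers return the same value for every input, including the wrapping case `a < b`; only the number of operations on the selected path differs.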
I'm surprised there's that much of a difference. Are you sure it's always faster, or is it faster depending on a?
I'm pretty confident it is always faster.
The reason I think this is faster is that we are now selecting between a constant and a variable, instead of between two variables. Given that select is const and loves to be inlined, the compiler can now optimize the select operation.
Recall that Uint::select calls Limb::select, which in turn calls
impl ConstChoice {
    /// Return `b` if `self` is truthy, otherwise return `a`.
    #[inline]
    pub(crate) const fn select_word(&self, a: Word, b: Word) -> Word {
        a ^ (self.0 & (a ^ b))
    }
}
When a is the constant ZERO, this can be optimized to:
self.0 & b
saving two XOR operations, or two-thirds of this operation.
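The simplification can be checked with a small standalone sketch. The free functions below are assumptions made for illustration: `select_word` copies the body quoted above but takes the mask as an explicit argument instead of `&self`, and `select_word_from_zero` is a hypothetical name for the folded form.

```rust
type Word = u64;

/// Same expression as the quoted `select_word`, with the mask passed
/// explicitly (assumption: `mask` is either all ones or zero).
const fn select_word(mask: Word, a: Word, b: Word) -> Word {
    a ^ (mask & (a ^ b))
}

/// The same selection with `a` fixed to zero:
/// 0 ^ (mask & (0 ^ b)) simplifies to mask & b.
const fn select_word_from_zero(mask: Word, b: Word) -> Word {
    mask & b
}
```

Whether the compiler actually performs this folding depends on `select` being inlined at the call site, as the comment argues; the sketch only shows that the two expressions agree.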
Returning to the gcd subroutine, this select is in the hot loop of this algorithm. In total, the loop executes:
* Uint::is_odd (1 op)
* Uint::lt (2 ops/word)
* ConstChoice::and (1 op)
* Uint::wrapping_sub (4 ops/word)
* Uint::select (3 ops/word -> 1 op/word)
* Uint::shr (3 ops/word)
So, there is a reduction from 12 to 10 ops/word, or a 17% improvement.
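For readers who want to see how these steps fit together, here is a rough single-word sketch of such a loop. It is an illustration, not the PR's implementation: `bingcd_u64` is a hypothetical name, the requirement that the first argument is odd and the fixed count of `2 * BITS - 1` iterations are assumptions of this sketch, and the `<` comparison is not constant time (the real code uses `Uint::lt`).

```rust
/// Fixed-iteration binary GCD on single 64-bit words (illustrative sketch).
/// Assumption: `odd` must be odd. Returns gcd(odd, other).
fn bingcd_u64(odd: u64, other: u64) -> u64 {
    debug_assert!(odd & 1 == 1);
    let (mut a, mut b) = (other, odd);
    // Each step shortens `a` by at least one bit or moves the longer value
    // into `a`, so 2 * 64 - 1 steps are enough to drive `a` to zero.
    for _ in 0..(2 * u64::BITS - 1) {
        // All-ones mask when `a` is odd, zero otherwise.
        let a_odd = (a & 1).wrapping_neg();
        // All-ones mask when a < b (not constant time in this sketch).
        let a_lt_b = ((a < b) as u64).wrapping_neg();
        let do_swap = a_odd & a_lt_b;
        // Conditional swap so that a >= b whenever `a` is odd.
        let t = do_swap & (a ^ b);
        a ^= t;
        b ^= t;
        // Subtract `b` from `a` when `a` is odd, then halve.
        a = a.wrapping_sub(a_odd & b) >> 1;
    }
    b
}
```

For example, `bingcd_u64(21, 14)` evaluates to 7. The subtraction line uses the `mask & b` form discussed above, which is the single-word analogue of selecting between ZERO and b.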
When a is the constant ZERO
Aah, ok
@erik-3milabs I just set This means you can now make breaking changes, such as removing the existing safegcd implementation and changing trait impls like
@tarcieri While I could still modify the Aside from that, what else would be required to see this PR merged?
Aah, lack of invmod support would definitely be a problem. Is it something you plan on addressing eventually? My understanding is, like safegcd, that invmod is a big part of binary GCD's intended usage. It seems a little weird to have multiple implementations of GCD algorithms which effectively do the same thing, though to completely replace safegcd in addition to invmod you'd also need support for boxed types (though with Also seems it needs a rebase due to upstream changes.
You're right. This PR only introduces the
I agree that having two algorithms is overkill. Let me see about implementing this for
Yeah, you're right. Let me address that right away.
I'm incredibly interested in this PR as someone who:
I actually need an
Whoops, fat fingered the close button trying to reply. While this is probably in OK shape as is I would preferably like to see it as the only GCD implementation, rather than having two. I think it’s fine to retain
Heard, especially if the goal of modinv should be trivial as just the Thanks for the quick reply and for letting me ensure I have context!
@erik-3milabs do you want to look into making this the primary/only GCD implementation? Otherwise I can potentially do that as a followup
This PR introduces an implementation of Optimized Binary GCD. Ref: Pornin, Algorithm 2.
Upsides to this technique:
gcd algorithm currently implemented in crypto_bigint (see below) (really, it just has a different complexity bound).
UNSAT_LIMBS
Benchmark results
Word = u64