New lint: `duplicate_map_keys` (#12575)
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @llogiq (or someone else) some time within the next two weeks. Please see the contribution instructions for more information.
As outlined in #11978, the main goal now is to also detect multiple inserts into a `HashMap` with the same key.
clippy_lints/src/hash_collision.rs (outdated)

```rust
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
///
/// ### Why is this bad?
/// This can lead to data loss.
```

Suggested change:

```rust
/// Warn when a `HashMap` is set up with the
/// same key appearing twice.
///
/// ### Why is this bad?
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
/// This can lead to data loss.
```
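The overwrite behavior the doc comment describes is easy to demonstrate; a minimal sketch (the `example_map` helper is just for illustration):

```rust
use std::collections::HashMap;

/// Builds the map from the lint's doc example. Because the key 5
/// appears twice, only one entry survives.
fn example_map() -> HashMap<i32, i32> {
    HashMap::from([(5, 1), (5, 2)])
}

fn main() {
    let example = example_map();
    // Entries are inserted in order, so the later value wins.
    assert_eq!(example.len(), 1);
    assert_eq!(example[&5], 2);
}
```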
clippy_lints/src/hash_collision.rs (outdated)

```rust
///
/// ### Why is this bad?
/// This can lead to data loss.
///
```

Suggested change (add a Known Problems section):

```rust
///
/// ### Known problems
/// False negatives: the lint only looks into
/// `HashMap::from([..])` calls.
```
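For contrast, a sketch of one such false negative, assuming the lint stays limited to `HashMap::from` (the `inserted_map` helper is hypothetical):

```rust
use std::collections::HashMap;

/// Duplicate keys introduced through `insert` calls. Per the Known
/// Problems note, only `HashMap::from([..])` is inspected, so this
/// pattern would not be linted even though data is still lost.
fn inserted_map() -> HashMap<i32, i32> {
    let mut map = HashMap::new();
    map.insert(5, 1);
    map.insert(5, 2); // silently overwrites the first insert
    map
}

fn main() {
    assert_eq!(inserted_map().len(), 1);
    assert_eq!(inserted_map()[&5], 2);
}
```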
clippy_lints/src/hash_collision.rs (outdated)

```rust
// Then check the keys
{
    // Put all keys in a vector
    let mut literals = Vec::new();
```

You could use `SpanlessEq` to correctly lint on arbitrary key expressions. Also, why not return a (possibly empty) set of duplicate key spans instead of a plain bool? That way we could better pinpoint the error.
Thanks for pointing out `SpanlessEq`. The function now returns `Option<Vec<(Expr, Expr)>>`, i.e. all pairs of duplicates found. I should probably change it someday to return each expression together with all of its duplicates instead of splitting them over multiple pairs.
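Away from the rustc internals, the pair-returning shape can be sketched over plain integer keys; `find_duplicate_pairs` is a hypothetical stand-in for the lint's helper, not its actual code:

```rust
/// Returns every pair of equal keys, or None when there are no
/// duplicates - mirroring the `Option<Vec<(Expr, Expr)>>` shape.
fn find_duplicate_pairs(keys: &[i32]) -> Option<Vec<(i32, i32)>> {
    let mut pairs = Vec::new();
    for i in 0..keys.len() {
        for j in i + 1..keys.len() {
            if keys[i] == keys[j] {
                pairs.push((keys[i], keys[j]));
            }
        }
    }
    if pairs.is_empty() { None } else { Some(pairs) }
}

fn main() {
    assert_eq!(find_duplicate_pairs(&[1, 2, 3]), None);
    // A key appearing three times is reported as three separate
    // pairs - the "split over multiple pairs" drawback.
    assert_eq!(find_duplicate_pairs(&[5, 5, 5]).unwrap().len(), 3);
}
```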
tests/ui/hash_collision.rs (outdated)

```rust
use std::collections::HashMap;

fn main() {
    let example = HashMap::from([(5, 1), (5, 2)]);
```

I'd like to see more tests. For example, one with a `()` key, one with a key expr containing a `return` statement, and one with a custom key type containing a bool where `eq` always returns true and `hash` is an empty function, then `HashMap::from([(Bad(true), 1), (Bad(false), 2)])`. That would constitute an (acceptable) false negative.
I've now written and tested all the tests you recommended, but I didn't fully understand what you meant by a "key expr containing a `return` statement".

```rust
let _ = HashMap::from([(return Ok(()), 1), (return Err(()), 2)]); // expect no lint
```

is what I did, but everything should already be caught by `clippy::diverging_sub_expression`, unless I missed or misunderstood something.
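The pathological key type from the review can be written out directly; `Bad` and the `bad_map` helper follow the description above (always-equal `eq`, empty `hash`):

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Key type where `eq` always returns true and `hash` writes nothing,
// so every key collides with and equals every other key.
struct Bad(bool);

impl PartialEq for Bad {
    fn eq(&self, _other: &Self) -> bool {
        true
    }
}
impl Eq for Bad {}

impl Hash for Bad {
    fn hash<H: Hasher>(&self, _state: &mut H) {}
}

fn bad_map() -> HashMap<Bad, i32> {
    HashMap::from([(Bad(true), 1), (Bad(false), 2)])
}

fn main() {
    // Bad(false) overwrites Bad(true)'s entry: the keys differ
    // syntactically, so a syntax-based lint cannot see this - the
    // acceptable false negative from the review.
    assert_eq!(bad_map().len(), 1);
}
```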
clippy_lints/src/hash_collision.rs (outdated)

````rust
/// let example = HashMap::from([(5, 1), (5, 2)]);
/// ```
#[clippy::version = "1.79.0"]
pub HASH_COLLISION,
````

"Hash collision" to me means different keys with the same hash, but this lint specifically looks for the same keys. (A hash collision also isn't bad in and of itself and won't lead to overwriting entries; a hashmap can deal with that.) Should this be renamed from `hash_collision` to something else? What about `duplicate_map_keys`?
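The parenthetical can be demonstrated: two distinct keys that deliberately share a hash still both end up in the map. `Colliding` and `colliding_map` are illustrative names:

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Distinct keys (different values, honest `eq`) with a constant hash,
// so every insertion is a hash collision.
#[derive(PartialEq, Eq)]
struct Colliding(u32);

impl Hash for Colliding {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Constant hash: all keys land in the same bucket.
        0u64.hash(state);
    }
}

fn colliding_map() -> HashMap<Colliding, i32> {
    HashMap::from([(Colliding(1), 1), (Colliding(2), 2)])
}

fn main() {
    // The hashmap resolves the collision via `eq`; nothing is
    // overwritten, both entries survive.
    assert_eq!(colliding_map().len(), 2);
}
```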
Thanks, I should have caught that. `duplicate_map_keys` is a good name; I've changed it in the latest commit.
```rust
expr: &'a rustc_hir::Expr<'_>,
) -> Option<Vec<(rustc_hir::Expr<'a>, rustc_hir::Expr<'a>)>> {
    // If the expression is a call to `HashMap::from`, check if the keys are the same
    if let ExprKind::Call(func, args) = &expr.kind
```

Suggested change:

```rust
if let ExprKind::Call(func, [arg]) = &expr.kind
```

Matching a 1-element slice would let you avoid the `args.len() == 1` check and the indexing later.
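The slice-pattern point generalizes beyond the HIR types; a sketch over a plain `&[i32]` (the `single` helper is illustrative):

```rust
/// Matching a one-element slice pattern replaces a length check plus
/// indexing with a single, infallible binding.
fn single(args: &[i32]) -> Option<&i32> {
    if let [arg] = args {
        Some(arg)
    } else {
        // Empty or multi-element slices don't match.
        None
    }
}

fn main() {
    assert_eq!(single(&[7]), Some(&7));
    assert_eq!(single(&[1, 2]), None);
    assert_eq!(single(&[]), None);
}
```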
```rust
let mut keys = Vec::new();

for arg in *args {
    if let ExprKind::Tup(args) = &arg.kind
        && !args.is_empty()
        // && let ExprKind::Lit(lit) = args[0].kind
    {
        keys.push(args[0]);
    }
}
```

Suggested change:

```rust
let keys = args.iter().filter_map(|arg| {
    if let ExprKind::Tup([key, ..]) = arg.kind {
        Some(key)
    } else {
        None
    }
}).collect::<Vec<_>>();
```
```rust
for i in 0..keys.len() {
    for j in i + 1..keys.len() {
```

This is a quadratic algorithm that will slow down considerably with larger sets (try a thousand elements). A better way would be to compute a `SpanlessHash` per item and add the hashes to a `HashSet`, which is linear.
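The linear approach can be sketched over plain keys, with ordinary hashing standing in for `SpanlessHash` (the `duplicate_indices` helper is illustrative):

```rust
use std::collections::HashSet;

/// Linear-time duplicate detection: one pass inserting each key (here
/// the key itself, standing in for a SpanlessHash digest) into a set,
/// flagging the indices of keys already seen.
fn duplicate_indices(keys: &[i32]) -> Vec<usize> {
    let mut seen = HashSet::new();
    keys.iter()
        .enumerate()
        // `insert` returns false when the key was already present.
        .filter_map(|(i, k)| if seen.insert(k) { None } else { Some(i) })
        .collect()
}

fn main() {
    assert_eq!(duplicate_indices(&[5, 1, 5, 2, 1]), vec![2, 4]);
    assert!(duplicate_indices(&[1, 2, 3]).is_empty());
}
```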
☔ The latest upstream changes (presumably #12107) made this pull request unmergeable. Please resolve the merge conflicts.

Are you still interested in following through with this PR? Do you need anything from our side?

Yes, I am interested, but I currently have little time; uni takes more of it than I thought. I hope to continue work within a month, as the hard part of the semester should be over by then. I don't currently need any help, thank you.

In that case I'd prefer to close this for now so it doesn't clutter my queue. Please feel invited to reopen as soon as you find the time to continue working on it.
Fixes #11978

changelog: new [`duplicate_map_keys`] lint