Skip to content

New lint: [duplicate_map_keys] #12575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5290,6 +5290,7 @@ Released 2018-09-13
[`get_first`]: https://rust-lang.github.io/rust-clippy/master/index.html#get_first
[`get_last_with_len`]: https://rust-lang.github.io/rust-clippy/master/index.html#get_last_with_len
[`get_unwrap`]: https://rust-lang.github.io/rust-clippy/master/index.html#get_unwrap
[`hash_collision`]: https://rust-lang.github.io/rust-clippy/master/index.html#hash_collision
[`host_endian_bytes`]: https://rust-lang.github.io/rust-clippy/master/index.html#host_endian_bytes
[`identity_conversion`]: https://rust-lang.github.io/rust-clippy/master/index.html#identity_conversion
[`identity_op`]: https://rust-lang.github.io/rust-clippy/master/index.html#identity_op
Expand Down
1 change: 1 addition & 0 deletions clippy_lints/src/declared_lints.rs
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,7 @@ pub(crate) static LINTS: &[&crate::LintInfo] = &[
crate::functions::TOO_MANY_ARGUMENTS_INFO,
crate::functions::TOO_MANY_LINES_INFO,
crate::future_not_send::FUTURE_NOT_SEND_INFO,
crate::hash_collision::HASH_COLLISION_INFO,
crate::if_let_mutex::IF_LET_MUTEX_INFO,
crate::if_not_else::IF_NOT_ELSE_INFO,
crate::if_then_some_else_none::IF_THEN_SOME_ELSE_NONE_INFO,
Expand Down
91 changes: 91 additions & 0 deletions clippy_lints/src/hash_collision.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
use clippy_utils::diagnostics::span_lint_and_note;
use clippy_utils::last_path_segment;
use clippy_utils::ty::is_type_diagnostic_item;
use rustc_hir::ExprKind;
use rustc_lint::{LateContext, LateLintPass};
use rustc_session::declare_lint_pass;
use rustc_span::symbol::sym;

declare_clippy_lint! {
/// ### What it does
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
///
/// ### Why is this bad?
/// This can lead to data loss.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
///
/// ### Why is this bad?
/// This can lead to data loss.
/// Warn when a `Hashmap` is set up with the
/// same key appearing twice.
///
/// ### Why is this bad?
/// When two items are inserted into a `HashMap` with the same key,
/// the second item will overwrite the first item.
/// This can lead to data loss.

///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
///
/// # Known Problems
/// False negatives: The lint only looks into
/// `HashMap::from([..])` calls.

/// ### Example
/// ```no_run
/// # use std::collections::HashMap;
/// let example = HashMap::from([(5, 1), (5, 2)]);
/// ```
#[clippy::version = "1.79.0"]
pub HASH_COLLISION,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Hash collision" to me means different keys with the same hash, but this lint specifically looks for same keys.
(A hash collision also isn't bad in and of itself and won't lead to overwriting entries, a hashmap can deal with that)

Should this be renamed from hash_collision to something else? What about duplicate_map_keys?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I should have caught that. duplicate_map_keys is a good name, I've changed it in the latest commit.

suspicious,
"`HashMap` with two identical keys loses data"
}

declare_lint_pass!(HashCollision => [HASH_COLLISION]);

impl<'tcx> LateLintPass<'tcx> for HashCollision {
fn check_expr(&mut self, cx: &LateContext<'tcx>, expr: &'tcx rustc_hir::Expr<'tcx>) {
if has_hash_collision(cx, expr) {
span_lint_and_note(
cx,
HASH_COLLISION,
expr.span,
"this `HashMap` has a hash collision and will lose data",
None,
"consider using a different keys for all items",
);
}
}
}

// TODO: Also check for different sources of hash collisions
// TODO: Maybe other types of hash maps should be checked as well?
fn has_hash_collision(cx: &LateContext<'_>, expr: &rustc_hir::Expr<'_>) -> bool {
// If the expression is a call to `HashMap::from`, check if the keys are the same
if let ExprKind::Call(func, args) = &expr.kind
// First check for HashMap::from
&& let ty = cx.typeck_results().expr_ty(expr)
&& is_type_diagnostic_item(cx, ty, sym::HashMap)
&& let ExprKind::Path(func_path) = func.kind
&& last_path_segment(&func_path).ident.name == sym::from
// There should be exactly one argument to HashMap::from
&& args.len() == 1
// which should be an array
&& let ExprKind::Array(args) = &args[0].kind
// Then check the keys
{
// Put all keys in a vector
let mut literals = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use SpanlessEq to correctly lint on arbitrary key expressions.

Also, why not return a (possibly empty) set of duplicate key spans instead of a plain bool? That way we could better pinpoint the error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for point out SpanlessEq. The function now returns Option<Vec<(Expr, Expr)>>, so all pairs of found duplicates. I should probably someday change it to return all Expressions with their duplicates instead of split over multiple pairs.


for arg in *args {
//
if let ExprKind::Tup(args) = &arg.kind
&& !args.is_empty()
&& let ExprKind::Lit(lit) = args[0].kind
{
literals.push(lit.node.clone());
}
}
// Debug: it gets here, but doesn't trigger the lint

// Check if there are any duplicate keys
let mut duplicate = false;
for i in 0..literals.len() {
for j in i + 1..literals.len() {
if literals[i] == literals[j] {
duplicate = true;
break;
}
}
if duplicate {
break;
}
}
return duplicate;
}
false
}
2 changes: 2 additions & 0 deletions clippy_lints/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,7 @@ mod from_raw_with_void_ptr;
mod from_str_radix_10;
mod functions;
mod future_not_send;
mod hash_collision;
mod if_let_mutex;
mod if_not_else;
mod if_then_some_else_none;
Expand Down Expand Up @@ -1129,6 +1130,7 @@ pub fn register_lints(store: &mut rustc_lint::LintStore, conf: &'static Conf) {
store.register_late_pass(|_| Box::new(zero_repeat_side_effects::ZeroRepeatSideEffects));
store.register_late_pass(|_| Box::new(manual_unwrap_or_default::ManualUnwrapOrDefault));
store.register_late_pass(|_| Box::new(integer_division_remainder_used::IntegerDivisionRemainderUsed));
store.register_late_pass(|_| Box::new(hash_collision::HashCollision));
// add lints here, do not remove this comment, it's used in `new_lint`
}

Expand Down
6 changes: 6 additions & 0 deletions tests/ui/hash_collision.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#![warn(clippy::hash_collision)]
use std::collections::HashMap;

fn main() {
let example = HashMap::from([(5, 1), (5, 2)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more tests. For example one with a () key, one with a key expr containing a return statement, and one with a custom key type containing a bool where eq always returns true and hash is an empty function, then HashMap::from([(Bad(true), 1), (Bad(false)), 2)]). That would constitute an (acceptable) false negative.

Copy link
Author

@SeseMueller SeseMueller Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now written and tested all tests you recommended, but I didn't fully understand what you meant with "key expr containg a return statement"

    let _ = HashMap::from([(return Ok(()), 1), (return Err(()), 2)]); // expect no lint 

is what I did, but everything should already be caught by clippy::diverging_sub_expression, unless I missed or misunderstood something.

}
12 changes: 12 additions & 0 deletions tests/ui/hash_collision.stderr
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
error: this `HashMap` has a hash collision and will lose data
--> tests/ui/hash_collision.rs:5:19
|
LL | let example = HashMap::from([(5, 1), (5, 2)]);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: consider using a different keys for all items
= note: `-D clippy::hash-collision` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::hash_collision)]`

error: aborting due to 1 previous error