Commit de67d62

Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic needles.

This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general.

The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search.

The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind        ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
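The terms used in the commit message (period, local period, critical position) can be made concrete with a small sketch. Everything below is illustrative and not taken from the commit: `smallest_period`, `local_period`, and `naive_rfind` are hypothetical helper names, and each uses a deliberately naive quadratic check, not the maximal suffix computation the commit relies on. The `main` function rebuilds the benchmark input from the message and checks `str::rfind` against the naive reverse scan.

```rust
/// Smallest period p of `x`: the least p >= 1 with x[i] == x[i + p]
/// wherever both indices are in range. Naive check, illustration only.
fn smallest_period(x: &[u8]) -> usize {
    (1..=x.len())
        .find(|&p| (0..x.len() - p).all(|i| x[i] == x[i + p]))
        .unwrap_or(0) // only reached for the empty needle
}

/// Local period at split position `l` (x = u v with |u| = l): the least
/// r >= 1 such that x[i] == x[i + r] for every i in [l - r, l) where
/// both indices fall inside `x`.
fn local_period(x: &[u8], l: usize) -> usize {
    let n = x.len();
    (1..=n)
        .find(|&r| (l.saturating_sub(r)..l).all(|i| i + r >= n || x[i] == x[i + r]))
        .unwrap_or(n)
}

/// Reference reverse search: last byte offset where `needle` occurs.
fn naive_rfind(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.len() > h.len() {
        return None;
    }
    (0..=h.len() - n.len()).rev().find(|&i| &h[i..i + n.len()] == n)
}

fn main() {
    // "abababab" is literally periodic with period 2.
    let x = b"abababab";
    let p = smallest_period(x);
    assert_eq!(p, 2);

    // A critical position is a split whose local period equals period(x).
    // Position 1 qualifies here, and |u| = 1 < period(x).
    assert_eq!(local_period(x, 1), p);

    // Benchmark input from the commit message.
    let needle = "ab".repeat(100);
    let haystack = format!("bb{}", needle).repeat(100);
    assert_eq!(haystack.rfind(needle.as_str()), naive_rfind(&haystack, &needle));
}
```

The naive scan is only a correctness reference; it says nothing about the running time the benchmark measures.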
2 parents e35fd74 + 01e8812 commit de67d62

2 files changed: +213 -64 lines changed

src/libcollectionstest/str.rs (+20)
@@ -85,6 +85,26 @@ fn test_find_str() {
     assert_eq!(data[43..86].find("ย中"), Some(67 - 43));
     assert_eq!(data[43..86].find("iệt"), Some(77 - 43));
     assert_eq!(data[43..86].find("Nam"), Some(83 - 43));
+
+    // find every substring -- assert that it finds it, or an earlier occurrence.
+    let string = "Việt Namacbaabcaabaaba";
+    for (i, ci) in string.char_indices() {
+        let ip = i + ci.len_utf8();
+        for j in string[ip..].char_indices()
+                             .map(|(i, _)| i)
+                             .chain(Some(string.len() - ip))
+        {
+            let pat = &string[i..ip + j];
+            assert!(match string.find(pat) {
+                None => false,
+                Some(x) => x <= i,
+            });
+            assert!(match string.rfind(pat) {
+                None => false,
+                Some(x) => x >= i,
+            });
+        }
+    }
 }
 
 fn s(x: &str) -> String { x.to_string() }
