Skip to content

byte_pattern: share the TwoWaySearcher between byte and str #135931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

folkertdev
Copy link
Contributor

tracking issue: #134149

An attempt to break up #134350 into more manageable pieces.

From what I can see, the TwoWaySearcher implementation does not have special logic for UTF8 boundaries, so it should work just as well on any &[u8]. So this PR just moves the TwoWaySearcher implementation to slice/byte_pattern.rs, and then uses it from str/pattern.rs. No functional changes, no additional API surface.

r? @BurntSushi

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 23, 2025
Comment on lines -1152 to +1164
SearchStep::Reject(a, mut b) => {
byte_pattern::SearchStep::Reject(a, mut b) => {
// skip to next char boundary
while !self.haystack.is_char_boundary(b) {
b += 1;
}
searcher.position = cmp::max(b, searcher.position);
SearchStep::Reject(a, b)
}
otherwise => otherwise,
byte_pattern::SearchStep::Match(a, b) => SearchStep::Match(a, b),
byte_pattern::SearchStep::Done => SearchStep::Done,
Copy link
Contributor Author

@folkertdev folkertdev Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I duplicated SearchStep because it is a public type and its documentation refers to the Searcher trait. The byte_pattern module will have it's own Searcher trait (or ByteSearcher maybe) and so that documentation would be misleading one way or the other.

Comment on lines 1000 to 1001
if let Some(result) = simd_contains(self, haystack) {
if let Some(result) = simd_contains(self.as_bytes(), haystack.as_bytes()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function just takes a slice of bytes now. From what I can see the implementation does not rely on the input being UTF8 at all.

@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from ea951c7 to 0b23d41 Compare January 23, 2025 12:54
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from 0b23d41 to f3cb4ca Compare January 23, 2025 13:58
@rust-log-analyzer

This comment has been minimized.

@folkertdev folkertdev force-pushed the byte-pattern-two-way-searcher branch from f3cb4ca to c631191 Compare January 23, 2025 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants