Skip to content

Add GetBlobRaw #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add GetBlobRaw #83

wants to merge 7 commits into from

Conversation

cgwalters
Copy link
Collaborator

This is needed for my "registry frontend for containers-storage" project, but should also be useful in general as we've discussed moving checksum verification into callers. Then there's no need for a driver etc.

Depends: containers/skopeo#2601

@cgwalters
Copy link
Collaborator Author

This is used by https://github.com/cgwalters/cstor-dist

Copilot

This comment was marked as outdated.

@cgwalters cgwalters requested a review from jeckersb May 28, 2025 17:25
Copy link
Contributor

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

I have a bit of a hard time understanding everything that's going on here. It might benefit from some splitting up with descriptive commit messages for each step along the way...

@cgwalters cgwalters force-pushed the get-raw-blob branch 2 times, most recently from 9ebf000 to a118c05 Compare May 29, 2025 14:19
@allisonkarlitskaya allisonkarlitskaya force-pushed the get-raw-blob branch 2 times, most recently from 5e2deb5 to 89695c5 Compare May 29, 2025 21:24
@allisonkarlitskaya
Copy link
Contributor

I've gone and made a mess out of the commit ordering now, of course. We can sort that out tomorrow....

Copy link
Collaborator Author

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed another commit on top of this that I think is closer to the best-of-both of what we were each trying to do. But I don't feel super strongly about it again, so feel free to push whatever else, or if you really want revert the top commit.

It was an interesting learning exercise at least.

@allisonkarlitskaya
Copy link
Contributor

I pushed another commit on top of this that I think is closer to the best-of-both of what we were each trying to do. But I don't feel super strongly about it again, so feel free to push whatever else, or if you really want revert the top commit.

It was an interesting learning exercise at least.

Ya! This was a fun one!

Your most recent commit resembles my first attempt. The reason I walked away from it was just a sort of "wow, this is a lot of code..." sort of feeling.

My approach is definitely getting close to the "magic" line, and maybe stepping over it a bit. Yours is very explicit, but you also pay a price for that in terms of the "lots of code" aspect:

    proxy: Reintroduce structures
 src/imageproxy.rs | 214 ++++++++++++++++++---------------
 1 file changed, 119 insertions(+), 95 deletions(-)

    imageproxy: make .impl_request() generic over fds
 src/imageproxy.rs | 272 +++++++++++++--------------------
 1 file changed, 109 insertions(+), 163 deletions(-)

I also think that we're not really losing any actual safety with my approach: the methods calling the .impl_request() function decide for themselves which number of fds are appropriate. We currently only have methods that take:

  • no fds
  • one fd and a pipe
  • two fds, no pipe

and that knowledge is encoded in the methods themselves. If we were to add a new sort of method in the future that wanted some other configuration of fds and pipe ids, we'd be able to do that without adding a new struct variant for that.

Your approach encodes the existing common patterns into structs that the methods can use. One of the strongest arguments for your approach, I think is that there are indeed some commonalities between APIs (the most established one being "read from this fd and then call FinishPipe") which can then be handled in a nicer way. In fact, it's even tempting to make this pattern a method on your FinishPipe struct, except that it also needs access to the socket to make the call...

I'm also pretty neutral on this. I prefer my approach (maybe with a few tweaks), but your most recent push is already worlds better than the first version of this patch and has an advantage or two over my approach (cleaner handling of the common "read+finish" case, less magic). I'd also be happy to go with it. Maybe we should let @jeckersb be the tie-breaker?

@allisonkarlitskaya
Copy link
Contributor

allisonkarlitskaya commented May 30, 2025

fwiw, I'd make at least this change to my code:

@@ -347,9 +347,7 @@ fn fixed_from_iterable<T, const N: usize>(
     iterable: impl IntoIterator<IntoIter: FusedIterator, Item = T>,
 ) -> Result<[T; N]> {
     let mut iter = iterable.into_iter();
-    // We make use of the fact that [_; N].map() returns [_; N].  That makes this a bit more
-    // awkward than it would otherwise be, but it's not too bad.
-    let collected = [(); N].map(|_| iter.next());
+    let collected = std::array::from_fn(|_| iter.next());
     // Count the Some() in `collected` plus leftovers in the iter.
     let actual = collected.iter().flatten().count() + iter.count();
     if actual == N {

or maybe even rewrite it like this (at the cost of an allocation):

// Consumes an iterable and tries to convert it to a fixed-size array.  Returns Ok([T; N]) if the
// number of items in the iterable was correct, else an error describing the mismatch.
fn fixed_from_iterable<T, const N: usize>(
    name: &str,
    iterable: impl IntoIterator<IntoIter: FusedIterator, Item = T>,
) -> Result<[T; N]> {
    let vec = iterable.into_iter().collect::<Vec<T>>();
    vec.try_into().map_err(|vec: Vec<T>| {
        Error::Other(format!("Expected {N} {name} but got {}", vec.len()).into())
    })
}

in nightly we could do:

    let mut iter = iterable.into_iter();
    std::array::try_from_fn(|_| iter.next()).ok_or(Error::Other(
        format!("Expected {N} {name} but got fewer").into(),
    ))

with the loss of knowledge of the exact number of items that were present in the failing case (only important for the error message) and it would also require an explicit check on the iter being empty at the end...

really, we just need impl TryFrom<Iterator<T>> for [T; N] in the stdlib. Maybe some day :)

@cgwalters
Copy link
Collaborator Author

I asked Gemini and it pointed me in the direction of https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.collect_array which I think we should probably use instead of rolling our own (though I used collect_tuple() instead actually since I think it's clearer). I don't always remember to look at itertools, but there's some very useful things there. Specifically the FinishPipe case can just use exactly_one().

What do you think about that latest commit?

@cgwalters cgwalters force-pushed the get-raw-blob branch 2 times, most recently from e5333c2 to 33c7dc1 Compare May 30, 2025 13:23
@cgwalters
Copy link
Collaborator Author

though I used collect_tuple() instead actually since I think it's clearer

Wait no it's not, for some reason I forgot it's as easy to destructure an array as a tuple. Back to using an array.

Copy link
Contributor

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loving this iteration. We're approaching something really nice, in my opinion. In particular, I think the array destructuring approach makes your code a lot nicer.

I don't like taking depends, and itertools isn't in our transitive dependencies yet. We could probably just directly implement the one thing we need, but this thing has like 6000+ rdeps so it's probably okay...

The _nofds variant is nice as well, although I'd maybe flip the naming on its head: request_impl -> request_with_fds(), and the other one just .request()?

@allisonkarlitskaya
Copy link
Contributor

For clarity (since I'm probably gonna disappear for the weekend now): I'm happy with the direction here and I'm happy to land whatever variant you guys manage to cook up. :)

cgwalters and others added 2 commits May 30, 2025 14:38
This is needed for my "registry frontend for containers-storage"
project, but should also be useful in general as we've discussed
moving checksum verification into callers. Then there's no
need for a driver etc.

Assisted-by: Gemini Code Assist (for unit tests)
Signed-off-by: Colin Walters <[email protected]>
Signed-off-by: Allison Karlitskaya <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
These can be expressed with `impl Trait` argument syntax, making things
a bit easier to read.

Signed-off-by: Allison Karlitskaya <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
allisonkarlitskaya and others added 5 commits May 30, 2025 14:38
It's always nicer to directly hint the return type or let it be inferred
by its surroundings.  There's only one case where we need to be a bit
more explicit about it, and even this explicitness is nicer.

Signed-off-by: Allison Karlitskaya <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
...and the presence of a pipe ID.

Create a new internal PipeId structure to keep track of pipe ids on the
server.  This is a very small wrapper around NonZeroU32.

Have .impl_request() return a Result<> of a 3-tuple:
 - a deserialized result, as before
 - a fixed-sized array of fds ([OwnedFd; N])
 - a fixed-sized array of PipeIds ([PipeId; M])

Introduce a bit of generics magic to make sure that we got the expected
number of fds and pipeids from the other side, or an error.

Then you can make calls like:

 - `let ((), [], []) = self.impl_request("CloseImage", [img.0]).await?;`
 - `let (digest, [datafd], [pipeid]) = self.impl_request("GetManifest", [img.0]).await?;`
 - `let (bloblen, [datafd, errfd], []) = self.impl_request("GetRawBlob", args).await?;`

and you'll get an Ok() with the statically-correct number of fds and
pipeids, or you'll get an Err() with a message why, like:

  Expected 1 OwnedFd but got 0

Of course, we can only ever have 0 or 1 PipeIds, but the fixed-sized
array syntax has rather nice ergonomics here and its lets us share logic
with the fds.

Signed-off-by: Allison Karlitskaya <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
Instead of a generic array, use a trait on different structs.
This ensures even more type safety for callers.

Signed-off-by: Colin Walters <[email protected]>
Most requests don't have fds, so let's optimize it a bit and
avoid the callers needing to have `()` spelled out.

Signed-off-by: Colin Walters <[email protected]>
@cgwalters cgwalters requested a review from Copilot June 4, 2025 00:05
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds a new get_raw_blob API to fetch blob data and its error stream without checksum verification, refactors request handling to generically support file-descriptor replies, and updates the example client to exercise the raw‐blob path.

  • Introduce FromReplyFds trait and two implementations (FinishPipe, DualFds) to parse returned FDs
  • Refactor impl_request* methods to return typed FDs and pipe IDs
  • Add ImageProxy::get_raw_blob and corresponding error type + tests
  • Update example CLI to support GetBlobRaw and the --raw-blobs flag
  • Bump itertools dependency

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/imageproxy.rs Core FD‐aware request refactor, get_raw_blob API
examples/client.rs New GetBlobRaw command, raw-blob demo implementation
Cargo.toml Added itertools for array utilities
Comments suppressed due to low confidence (4)

src/imageproxy.rs:77

  • [nitpick] The Other variant’s display string is just "error", which is too generic—consider including the inner message with #[error("other: {0}")] or similar to surface the cause.
#[error("error")]

src/imageproxy.rs:688

  • [nitpick] There aren’t any unit or integration tests covering the successful end-to-end path of get_raw_blob; consider adding a mock-based test to verify you get back the correct length, data FD, and error behavior.
pub async fn get_raw_blob(

src/imageproxy.rs:364

  • The syntax impl IntoIterator<IntoIter: FusedIterator, Item = OwnedFd> is invalid; it should be impl IntoIterator<Item = OwnedFd, IntoIter: FusedIterator> to constrain the associated iterator type correctly.
fn from_reply(
        iterable: impl IntoIterator<IntoIter: FusedIterator, Item = OwnedFd>,

examples/client.rs:109

  • tokio::fs::File does not have read in scope by default; add use tokio::io::AsyncReadExt; at the top so read(&mut buffer) is available.
let n = datafd.read(&mut buffer).await?;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants