Add GetBlobRaw #83

cgwalters · 2025-05-13T14:29:52Z

This is needed for my "registry frontend for containers-storage" project, but should also be useful in general as we've discussed moving checksum verification into callers. Then there's no need for a driver etc.

Depends: containers/skopeo#2601

cgwalters · 2025-05-13T14:30:02Z

This is used by https://github.com/cgwalters/cstor-dist

allisonkarlitskaya

Some initial comments.

I have a bit of a hard time understanding everything that's going on here. It might benefit from some splitting up with descriptive commit messages for each step along the way...

src/imageproxy.rs

allisonkarlitskaya · 2025-05-29T21:24:42Z

I've gone and made a mess out of the commit ordering now, of course. We can sort that out tomorrow....

cgwalters

I pushed another commit on top of this that I think is closer to the best-of-both of what we were each trying to do. But I don't feel super strongly about it again, so feel free to push whatever else, or if you really want revert the top commit.

It was an interesting learning exercise at least.

src/imageproxy.rs

allisonkarlitskaya · 2025-05-30T07:38:39Z

I pushed another commit on top of this that I think is closer to the best-of-both of what we were each trying to do. But I don't feel super strongly about it again, so feel free to push whatever else, or if you really want revert the top commit.

It was an interesting learning exercise at least.

Ya! This was a fun one!

Your most recent commit resembles my first attempt. The reason I walked away from it was just a sort of "wow, this is a lot of code..." sort of feeling.

My approach is definitely getting close to the "magic" line, and maybe stepping over it a bit. Yours is very explicit, but you also pay a price for that in terms of the "lots of code" aspect:

    proxy: Reintroduce structures
 src/imageproxy.rs | 214 ++++++++++++++++++---------------
 1 file changed, 119 insertions(+), 95 deletions(-)

    imageproxy: make .impl_request() generic over fds
 src/imageproxy.rs | 272 +++++++++++++--------------------
 1 file changed, 109 insertions(+), 163 deletions(-)

I also think that we're not really losing any actual safety with my approach: the methods calling the .impl_request() function decide for themselves which number of fds are appropriate. We currently only have methods that take:

no fds
one fd and a pipe
two fds, no pipe

and that knowledge is encoded in the methods themselves. If we were to add a new sort of method in the future that wanted some other configuration of fds and pipe ids, we'd be able to do that without adding a new struct variant for that.

Your approach encodes the existing common patterns into structs that the methods can use. One of the strongest arguments for your approach, I think is that there are indeed some commonalities between APIs (the most established one being "read from this fd and then call FinishPipe") which can then be handled in a nicer way. In fact, it's even tempting to make this pattern a method on your FinishPipe struct, except that it also needs access to the socket to make the call...

I'm also pretty neutral on this. I prefer my approach (maybe with a few tweaks), but your most recent push is already worlds better than the first version of this patch and has an advantage or two over my approach (cleaner handling of the common "read+finish" case, less magic). I'd also be happy to go with it. Maybe we should let @jeckersb be the tie-breaker?

allisonkarlitskaya · 2025-05-30T07:52:19Z

fwiw, I'd make at least this change to my code:

@@ -347,9 +347,7 @@ fn fixed_from_iterable<T, const N: usize>(
     iterable: impl IntoIterator<IntoIter: FusedIterator, Item = T>,
 ) -> Result<[T; N]> {
     let mut iter = iterable.into_iter();
-    // We make use of the fact that [_; N].map() returns [_; N].  That makes this a bit more
-    // awkward than it would otherwise be, but it's not too bad.
-    let collected = [(); N].map(|_| iter.next());
+    let collected = std::array::from_fn(|_| iter.next());
     // Count the Some() in `collected` plus leftovers in the iter.
     let actual = collected.iter().flatten().count() + iter.count();
     if actual == N {

or maybe even rewrite it like this (at the cost of an allocation):

// Consumes an iterable and tries to convert it to a fixed-size array.  Returns Ok([T; N]) if the
// number of items in the iterable was correct, else an error describing the mismatch.
fn fixed_from_iterable<T, const N: usize>(
    name: &str,
    iterable: impl IntoIterator<IntoIter: FusedIterator, Item = T>,
) -> Result<[T; N]> {
    let vec = iterable.into_iter().collect::<Vec<T>>();
    vec.try_into().map_err(|vec: Vec<T>| {
        Error::Other(format!("Expected {N} {name} but got {}", vec.len()).into())
    })
}

in nightly we could do:

    let mut iter = iterable.into_iter();
    std::array::try_from_fn(|_| iter.next()).ok_or(Error::Other(
        format!("Expected {N} {name} but got fewer").into(),
    ))

with the loss of knowledge of the exact number of items that were present in the failing case (only important for the error message) and it would also require an explicit check on the iter being empty at the end...

really, we just need impl TryFrom<Iterator<T>> for [T; N] in the stdlib. Maybe some day :)

cgwalters · 2025-05-30T13:15:57Z

I asked Gemini and it pointed me in the direction of https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.collect_array which I think we should probably use instead of rolling our own (though I used collect_tuple() instead actually since I think it's clearer). I don't always remember to look at itertools, but there's some very useful things there. Specifically the FinishPipe case can just use exactly_one().

What do you think about that latest commit?

cgwalters · 2025-05-30T13:23:45Z

though I used collect_tuple() instead actually since I think it's clearer

Wait no it's not, for some reason I forgot it's as easy to destructure an array as a tuple. Back to using an array.

allisonkarlitskaya

I'm loving this iteration. We're approaching something really nice, in my opinion. In particular, I think the array destructuring approach makes your code a lot nicer.

I don't like taking depends, and itertools isn't in our transitive dependencies yet. We could probably just directly implement the one thing we need, but this thing has like 6000+ rdeps so it's probably okay...

The _nofds variant is nice as well, although I'd maybe flip the naming on its head: request_impl -> request_with_fds(), and the other one just .request()?

allisonkarlitskaya · 2025-05-30T17:34:25Z

For clarity (since I'm probably gonna disappear for the weekend now): I'm happy with the direction here and I'm happy to land whatever variant you guys manage to cook up. :)

This is needed for my "registry frontend for containers-storage" project, but should also be useful in general as we've discussed moving checksum verification into callers. Then there's no need for a driver etc. Assisted-by: Gemini Code Assist (for unit tests) Signed-off-by: Colin Walters <[email protected]> Signed-off-by: Allison Karlitskaya <[email protected]> Signed-off-by: Colin Walters <[email protected]>

These can be expressed with `impl Trait` argument syntax, making things a bit easier to read. Signed-off-by: Allison Karlitskaya <[email protected]> Signed-off-by: Colin Walters <[email protected]>

It's always nicer to directly hint the return type or let it be inferred by its surroundings. There's only one case where we need to be a bit more explicit about it, and even this explicitness is nicer. Signed-off-by: Allison Karlitskaya <[email protected]> Signed-off-by: Colin Walters <[email protected]>

...and the presence of a pipe ID. Create a new internal PipeId structure to keep track of pipe ids on the server. This is a very small wrapper around NonZeroU32. Have .impl_request() return a Result<> of a 3-tuple: - a deserialized result, as before - a fixed-sized array of fds ([OwnedFd; N]) - a fixed-sized array of PipeIds ([PipeId; M]) Introduce a bit of generics magic to make sure that we got the expected number of fds and pipeids from the other side, or an error. Then you can make calls like: - `let ((), [], []) = self.impl_request("CloseImage", [img.0]).await?;` - `let (digest, [datafd], [pipeid]) = self.impl_request("GetManifest", [img.0]).await?;` - `let (bloblen, [datafd, errfd], []) = self.impl_request("GetRawBlob", args).await?;` and you'll get an Ok() with the statically-correct number of fds and pipeids, or you'll get an Err() with a message why, like: Expected 1 OwnedFd but got 0 Of course, we can only ever have 0 or 1 PipeIds, but the fixed-sized array syntax has rather nice ergonomics here and its lets us share logic with the fds. Signed-off-by: Allison Karlitskaya <[email protected]> Signed-off-by: Colin Walters <[email protected]>

Instead of a generic array, use a trait on different structs. This ensures even more type safety for callers. Signed-off-by: Colin Walters <[email protected]>

Signed-off-by: Colin Walters <[email protected]>

Most requests don't have fds, so let's optimize it a bit and avoid the callers needing to have `()` spelled out. Signed-off-by: Colin Walters <[email protected]>

Copilot

Pull Request Overview

Adds a new get_raw_blob API to fetch blob data and its error stream without checksum verification, refactors request handling to generically support file-descriptor replies, and updates the example client to exercise the raw‐blob path.

Introduce FromReplyFds trait and two implementations (FinishPipe, DualFds) to parse returned FDs
Refactor impl_request* methods to return typed FDs and pipe IDs
Add ImageProxy::get_raw_blob and corresponding error type + tests
Update example CLI to support GetBlobRaw and the --raw-blobs flag
Bump itertools dependency

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
src/imageproxy.rs	Core FD‐aware request refactor, `get_raw_blob` API
examples/client.rs	New `GetBlobRaw` command, raw-blob demo implementation
Cargo.toml	Added `itertools` for array utilities

Comments suppressed due to low confidence (4)

src/imageproxy.rs:77

[nitpick] The Other variant’s display string is just "error", which is too generic—consider including the inner message with #[error("other: {0}")] or similar to surface the cause.

#[error("error")]

src/imageproxy.rs:688

[nitpick] There aren’t any unit or integration tests covering the successful end-to-end path of get_raw_blob; consider adding a mock-based test to verify you get back the correct length, data FD, and error behavior.

pub async fn get_raw_blob(

src/imageproxy.rs:364

The syntax impl IntoIterator<IntoIter: FusedIterator, Item = OwnedFd> is invalid; it should be impl IntoIterator<Item = OwnedFd, IntoIter: FusedIterator> to constrain the associated iterator type correctly.

fn from_reply(
        iterable: impl IntoIterator<IntoIter: FusedIterator, Item = OwnedFd>,

examples/client.rs:109

tokio::fs::File does not have read in scope by default; add use tokio::io::AsyncReadExt; at the top so read(&mut buffer) is available.

let n = datafd.read(&mut buffer).await?;

This was referenced May 13, 2025

Simplified fetching API #79

Open

Get dependency patches merged cgwalters/cstor-dist#2

Open

cgwalters force-pushed the get-raw-blob branch from f30bd93 to 890c452 Compare May 15, 2025 15:58

cgwalters mentioned this pull request May 15, 2025

proxy: Add GetRawBlob containers/skopeo#2601

Merged

cgwalters force-pushed the get-raw-blob branch from 890c452 to 58fedf9 Compare May 16, 2025 18:58

cgwalters marked this pull request as ready for review May 28, 2025 14:33

cgwalters requested review from allisonkarlitskaya and Copilot May 28, 2025 14:33

This comment was marked as outdated.

Sign in to view

cgwalters requested a review from jeckersb May 28, 2025 17:25

allisonkarlitskaya reviewed May 29, 2025

View reviewed changes

cgwalters force-pushed the get-raw-blob branch 2 times, most recently from 9ebf000 to a118c05 Compare May 29, 2025 14:19

allisonkarlitskaya force-pushed the get-raw-blob branch 2 times, most recently from 5e2deb5 to 89695c5 Compare May 29, 2025 21:24

cgwalters commented May 29, 2025

View reviewed changes

src/imageproxy.rs Outdated Show resolved Hide resolved

src/imageproxy.rs Outdated Show resolved Hide resolved

cgwalters commented May 29, 2025

View reviewed changes

src/imageproxy.rs Show resolved Hide resolved

cgwalters force-pushed the get-raw-blob branch 2 times, most recently from e5333c2 to 33c7dc1 Compare May 30, 2025 13:23

cgwalters force-pushed the get-raw-blob branch from 33c7dc1 to eb33264 Compare May 30, 2025 15:10

allisonkarlitskaya reviewed May 30, 2025

View reviewed changes

cgwalters and others added 2 commits May 30, 2025 14:38

imageproxy: drop generic parameters from impl_request()

a0ab1e1

These can be expressed with `impl Trait` argument syntax, making things a bit easier to read. Signed-off-by: Allison Karlitskaya <[email protected]> Signed-off-by: Colin Walters <[email protected]>

allisonkarlitskaya and others added 5 commits May 30, 2025 14:38

proxy: Reintroduce structures

4487174

Instead of a generic array, use a trait on different structs. This ensures even more type safety for callers. Signed-off-by: Colin Walters <[email protected]>

Use itertools to reduce manual iterator parsing

d5dc71a

Signed-off-by: Colin Walters <[email protected]>

Cleanly split fd-vs-nofds request APIs

6f6622b

Most requests don't have fds, so let's optimize it a bit and avoid the callers needing to have `()` spelled out. Signed-off-by: Colin Walters <[email protected]>

cgwalters force-pushed the get-raw-blob branch from bc006cd to 6f6622b Compare May 30, 2025 18:38

cgwalters requested a review from Copilot June 4, 2025 00:05

Copilot AI reviewed Jun 4, 2025

View reviewed changes

Add GetBlobRaw #83

Are you sure you want to change the base?

Add GetBlobRaw #83

Uh oh!

Conversation

cgwalters commented May 13, 2025

Uh oh!

cgwalters commented May 13, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

allisonkarlitskaya left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

allisonkarlitskaya commented May 29, 2025

Uh oh!

cgwalters left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

allisonkarlitskaya commented May 30, 2025

Uh oh!

allisonkarlitskaya commented May 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cgwalters commented May 30, 2025

Uh oh!

cgwalters commented May 30, 2025

Uh oh!

allisonkarlitskaya left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

allisonkarlitskaya commented May 30, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

allisonkarlitskaya commented May 30, 2025 •

edited

Loading

allisonkarlitskaya left a comment •

edited

Loading