Skip to content

Make fetching source tarball by git commit more robust #4680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Oct 11, 2024

(created using eb --new-pr)

I had a crate that was supposed to fetch a git commit from https://github.com/astral-sh/reqwest-middleware. However the specified git commit 21ceec9a5fd2e8d6f71c3ea2999078fecbd13cbe was not available for git checkout after the clone.

My understanding is, that this commit is not reachable from the default (main) branch that is fetched, possibly after same rebase.

The approach done here is to add a separate git fetch call to fetch the specified commit. That resolves the issue in my case.

We could go even further and don't use git clone at all:

mkdir repo
cd repo
git init
git fetch --depth=1 https://github.com/... sha/branch/tag [--recursive]
git checkout sha/branch/tag

@boegel boegel added the bug fix label Oct 12, 2024
@Micket
Copy link
Contributor

Micket commented Nov 7, 2024

I also think this checkout only works until github garbage collects. This dangling commit it just held on by reflog, and will disappear with a git gc. So, 90 days or so and it's gone i think.

I am in favor of just fetching what we need for the requested checkout, but it wouldn't actually solve the underlying issue; UV devs are throwing away their old patches. I think the devs here should tag/branch their commit and not leave it detached, and the old method would still work (I think). So the cost here would be that we wouldn't detect this issue until the source is truly forever gone.

Note: I think you meant reqwest-middleware as that's the onethat has the commit you mention here:
astral-sh/reqwest-middleware@21ceec9

(edit: I could be wrong, maybe github never GC's? But i doubt that)

@Flamefire
Copy link
Contributor Author

I also think this checkout only works until github garbage collects. This dangling commit it just held on by reflog, and will disappear with a git gc. So, 90 days or so and it's gone i think.

Not sure: I locally created a commit, amended and ran git gc --aggressive --force --prune=now but was still able to access the original commit. So maybe it does work even after any GC/Prune?

Note: I think you meant reqwest-middleware as that's the onethat has the commit you mention here: astral-sh/reqwest-middleware@21ceec9

Yes, mis-pasted. Fixed in the description

@Micket
Copy link
Contributor

Micket commented Nov 8, 2024

I think it does go out, unless github sets up their reflog expiry date to something infinite.
https://git-scm.com/docs/git-gc#Documentation/git-gc.txt-gcreflogExpire

Testing it out:

mkdir test-gc
cd test-gc
git init
echo "First version" > file.txt
git add file.txt
git commit -m "First version"
echo  "Modified version" > file.txt
git add file.txt
git commit --amend -m "Modified version"
git reflog   # both show up and i can checkout either

git reflog expire --expire=now --all  # default is 90 days but i don't want to wait
git gc
git switch ca00a7a  # now gives me "fatal: invalid reference: ca00a7a"

Just gc isn't enough, the reglof need to expire before that's cleaned up.

@Flamefire
Copy link
Contributor Author

Ah ok, so the reflog needs to be GCed.

I guess in that case we cannot do anything. At least the change makes the checkouts faster and gives us 90 days more time.

@Micket
Copy link
Contributor

Micket commented Nov 8, 2024

Yes the UV devs should at least just add a git tag "reqwests-uv-2.x" on their patch commit, that would prevent it from expiring.

@Flamefire
Copy link
Contributor Author

It isn't actually uv but a dependency of it. And that repo doesn't has an issue tracker so I could complain. Also the commit is currently only in a branch, not somewhere in main, i.e. not merged yet. So can't help that for now.

@Micket
Copy link
Contributor

Micket commented Nov 8, 2024

But it's the UV devs patching their dependencies is it not? (on several repos under https://github.com/astral-sh)

@Flamefire
Copy link
Contributor Author

Ah it's the same GitHub org, haven't seen that. I opened an issue in the uv repo

@Flamefire Flamefire force-pushed the 20241011131511_new_pr_TIItVuOUVw branch from 2d2e09d to 18600e0 Compare March 20, 2025 13:20
A `git fetch <repo> <hash>` requires a full hash and fails with a short hash with
> could not find remote ref

Disable the depth=1 optimization and remove the fetch call for those cases.
@Flamefire Flamefire force-pushed the 20241011131511_new_pr_TIItVuOUVw branch from 18600e0 to caf4537 Compare March 20, 2025 16:54
@boegel boegel added this to the 5.x milestone Apr 9, 2025
@boegel
Copy link
Member

boegel commented Apr 9, 2025

@lexming You know this part of the code well, can you take a look at this one?

@Micket
Copy link
Contributor

Micket commented Apr 9, 2025

My understanding to the suggested changes:

  1. Upside; we can just fetch what we need. Slightly smaller downloads and tarballs i suppose.
  2. Downside; we might miss that these might be commits that are orphaned and will be permanently removed from the internet once github's GC runs. This was the case for the commits that triggered this very PR that are/were only available because they live in reflog.

@Flamefire
Copy link
Contributor Author

Flamefire commented Apr 9, 2025

My understanding to the suggested changes:

1. Upside; we can just fetch what we need. Slightly smaller downloads and tarballs i suppose.

2. Downside; we might miss that these might be commits that are orphaned and will be permanently removed from the internet once github's GC runs. This was the case for the commits that triggered this very PR that are/were only available because they live in reflog.

That is possible. I wasn't really sure why that commit was missing. Especially for the rust crates this is going to be an issue I guess: It really asks for a specific commit and if that is permanently gone we can't do anything I guess. Probably patching the commit if we still know a similar enough commit.
In this case there is a tag attached to that commit though. So I guess it won't expire.

@Flamefire
Copy link
Contributor Author

I double checked: That tag was created on my request and didn't exist back then which is why it failed.

So we have 2 cases:

  • Commit is "loose" at the time of writing the EC
  • Commit becomes "loose" e.g. after a rebase, which is likely what happened here

We want to detect the first, i.e. not do the explicit fetch. And I think it is worth to mitigate the 2nd. How about guarding the explicit fetch behind a command line option like --fetch-loose-git-commits, purposely long and ugly and with a verbose warning when used. The (logic) change in the code is literally 2 lines + the option handling if we want that, so cheap in terms of effort.

BTW; We have a bug when neither a commit nor a tag is given: We'd do git checkout which IMO errors.

So my proposal:

  • check CLI option and issue verbose warning if set
  • clone as before
  • try the checkout
  • if checkout fails and a long commit is given and the CLI option is set try the fetch.
  • show a verbose warning with text depending on the success or failure of that fetch and with a hint to open an issue in the git repository about it, we already have the URL for that, so lo effort for users
  • retry the checkout on success of the fetch

@lexming lexming self-assigned this Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants