Build cache ERROR: failed to solve: Internal: not found #5784

Open
bhperry opened this issue Feb 26, 2025 · 19 comments

@bhperry

bhperry commented Feb 26, 2025

Contributing guidelines and issue reporting guide

Well-formed report checklist

  • I have found a bug that the documentation does not cover
  • I have found a bug with no related open or closed issues
  • I have provided version/information about my environment and done my best to provide a reproducer

Description of bug

Lately I have started to see Docker builds in GitHub Actions that use the GHA build cache fail regularly when exporting the cache at the end of a build. Re-running the build will often cause it to succeed, but we get the failures almost every day now.

We have been running with the GHA cache for a while now with no issues.

#9 exporting to GitHub Actions Cache
#9 preparing build cache for export
#9 preparing build cache for export 0.4s done
#9 DONE 0.4s
ERROR: failed to solve: Internal: not found

Reproduction

It happens with a variety of different image builds, some of which are very thin wrappers around existing images, so any basic Dockerfile should work.

We run the builds from within a GitHub Action, automated by Python scripts, but the end result is to run commands like this:

docker buildx build --cache-from type=gha,scope=build-cache-daily-${NAME} --cache-to type=gha,scope=build-cache-daily-${NAME},mode=max,ghtoken=${GH_TOKEN},repository=${REPO}  --tag test:latest -f Dockerfile .
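
For a standalone attempt at reproducing it, something along these lines should be close (a sketch with placeholder names and values; it assumes the ACTIONS_* cache env vars are already exposed to the step, since the gha cache backend needs them):

  # Placeholder values -- the real ones come from our automation scripts.
  NAME=base
  REPO=myorg/private-repo
  GH_TOKEN=ghp_xxxxx   # placeholder: token with access to the repo's Actions cache

  # gha cache export needs a docker-container (BuildKit) builder, not the default docker driver.
  docker buildx create --name repro --driver docker-container --use

  # Any basic Dockerfile should do.
  printf 'FROM alpine:latest\nRUN echo hello > /hello.txt\n' > Dockerfile

  docker buildx build \
    --cache-from type=gha,scope=build-cache-daily-${NAME} \
    --cache-to type=gha,scope=build-cache-daily-${NAME},mode=max,ghtoken=${GH_TOKEN},repository=${REPO} \
    --tag test:latest -f Dockerfile .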

Version information

ubuntu-20.04 GitHub Actions runner

Docker-Buildx 0.21.1
Docker Client 26.1.3

@tonistiigi
Member

Might be related to GitHub moving to the v2 version of their API: https://github.blog/changelog/2024-09-16-notice-of-upcoming-deprecations-and-changes-in-github-actions-services/

@bhperry
Author

bhperry commented Feb 27, 2025

I did see some other mentions of upcoming cache changes. The fact that it generally does not happen on retry made me think it wasn't that. The brownout dates are certainly interesting, but it happened this morning, which is not one of the listed dates.

@tonistiigi
Member

We already have the v2 implementation in BuildKit v0.20.0. As far as I can see, most repositories are already using v2 when runners start, although at least initially only some repositories had support. The deadlines are for shutting down v1.

@tonistiigi
Member

If you have a link to your GitHub Actions run, you can post it for more details.

@bhperry
Author

bhperry commented Feb 27, 2025

It says deprecation of v1-v2 in that article, which was my previous understanding as well: only v3-v4 supported in future. Ah, I understand now, v2 of the API does not map to the versioning of the action.

This is on a private repository unfortunately, so I can't link it.

@Link-

Link- commented Feb 27, 2025

@bhperry - We'd be very grateful if you would reach out to the GitHub support team, open a ticket and drop the link to your run in the private repository. We'd like to take a look to understand this behaviour. Feel free to mention this comment as well and ask the team to escalate to engineering.

@bhperry
Author

bhperry commented Feb 27, 2025

@Link- ticket submitted

@crazy-max
Member

crazy-max commented Mar 14, 2025

We run the builds from within a GitHub Action, automated by Python scripts, but the end result is to run commands like this:

docker buildx build --cache-from type=gha,scope=build-cache-daily-${NAME} --cache-to type=gha,scope=build-cache-daily-${NAME},mode=max,ghtoken=${GH_TOKEN},repository=${REPO} --tag test:latest -f Dockerfile .

@bhperry Does this command run within a run block? If so, ACTIONS_* envs are not exposed. But I guess they do get exposed, since you see exporting to GitHub Actions Cache in the logs.

Also, does it change anything if you remove these attributes: ,ghtoken=${GH_TOKEN},repository=${REPO}?
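
That is, something like this stripped-down invocation (same placeholders as in your command), just to rule those attributes out:

docker buildx build --cache-from type=gha,scope=build-cache-daily-${NAME} --cache-to type=gha,scope=build-cache-daily-${NAME},mode=max --tag test:latest -f Dockerfile .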

Can you show your workflow as well please?

And what is the output of docker buildx ls?

@bhperry
Author

bhperry commented Mar 17, 2025

@crazy-max Yes it does run within a run block. I have a local action exporting ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN, similar to your ghaction-github-runtime action.
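
Roughly, what that action ends up doing is this (a sketch only; these variables come from the Actions runtime and are not normally visible to plain run steps, which is why they need re-exporting):

  # Sketch (assumption based on how ghaction-github-runtime behaves): write the
  # runner-provided cache endpoint and token into GITHUB_ENV so that later
  # `run:` steps (where docker buildx is invoked) can see them.
  echo "ACTIONS_CACHE_URL=${ACTIONS_CACHE_URL}" >> "$GITHUB_ENV"
  echo "ACTIONS_RUNTIME_TOKEN=${ACTIONS_RUNTIME_TOKEN}" >> "$GITHUB_ENV"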

I'm going to try to make a simpler reproduction of this, because there's a lot of scripts internal to my organization obfuscating what's actually happening in the workflow.

@bhperry
Author

bhperry commented Mar 17, 2025

Also, I found this in the Post Install buildx logs for a run where this happened. It may be useful.

  time="2025-03-17T12:06:38Z" level=debug msg="upload cache chunk https://acghubeus1.actions.githubusercontent.com/q6vtekagspNLqH78DOKwfuMOEZTI5zlE4kT4LEf5zTyqc5iiVg/_apis/artifactcache/caches/43777, range 0-182"
  time="2025-03-17T12:06:38Z" level=debug msg="commit cache https://acghubeus1.actions.githubusercontent.com/q6vtekagspNLqH78DOKwfuMOEZTI5zlE4kT4LEf5zTyqc5iiVg/_apis/artifactcache/caches/43777, size 183"
  time="2025-03-17T12:06:38Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Internal desc = not found" spanID=ce1c5a176200d9a0 traceID=2040d7ff2f039c5ccee3e9bb8903e611
  Internal: not found
  7 3da6d96 buildkitd --debug --allow-insecure-entitlement=network.host
  github.com/moby/buildkit/solver/bboltcachestorage.(*Store).Load.func1
  	/src/solver/bboltcachestorage/storage.go:131
  go.etcd.io/bbolt.(*DB).View
  	/src/vendor/go.etcd.io/bbolt/db.go:917
  github.com/moby/buildkit/solver/bboltcachestorage.(*Store).Load
  	/src/solver/bboltcachestorage/storage.go:124
  github.com/moby/buildkit/solver.(*exporter).ExportTo
  	/src/solver/exporter.go:115
  github.com/moby/buildkit/solver.(*mergedExporter).ExportTo
  	/src/solver/exporter.go:246
  github.com/moby/buildkit/solver/llbsolver.NewProvenanceCreator.func1
  	/src/solver/llbsolver/provenance.go:413
  github.com/moby/buildkit/solver/llbsolver.(*ProvenanceCreator).Predicate
  	/src/solver/llbsolver/provenance.go:463
  github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.2
  	/src/solver/llbsolver/solver.go:236
  github.com/moby/buildkit/solver/llbsolver.(*Solver).recordBuildHistory.func1.3
  	/src/solver/llbsolver/solver.go:274
  golang.org/x/sync/errgroup.(*Group).Go.func1
  	/src/vendor/golang.org/x/sync/errgroup/errgroup.go:78
  runtime.goexit
  	/usr/local/go/src/runtime/asm_amd64.s:1700
  
  7 3da6d96 buildkitd --debug --allow-insecure-entitlement=network.host
  main.unaryInterceptor
  	/src/cmd/buildkitd/main.go:728
  google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1
  	/src/vendor/google.golang.org/grpc/server.go:1203
  github.com/moby/buildkit/api/services/control._Control_Solve_Handler
  	/src/api/services/control/control_grpc.pb.go:289
  google.golang.org/grpc.(*Server).processUnaryRPC
  	/src/vendor/google.golang.org/grpc/server.go:1392
  google.golang.org/grpc.(*Server).handleStream
  	/src/vendor/google.golang.org/grpc/server.go:1802
  google.golang.org/grpc.(*Server).serveStreams.func2.1
  	/src/vendor/google.golang.org/grpc/server.go:1030
  runtime.goexit
  	/usr/local/go/src/runtime/asm_amd64.s:1700
  
  7 3da6d96 buildkitd --debug --allow-insecure-entitlement=network.host
  github.com/moby/buildkit/solver.init
  	/src/solver/cachestorage.go:13
  runtime.doInit1
  	/usr/local/go/src/runtime/proc.go:7291
  runtime.doInit
  	/usr/local/go/src/runtime/proc.go:7258
  runtime.main
  	/usr/local/go/src/runtime/proc.go:254
  runtime.goexit
  	/usr/local/go/src/runtime/asm_amd64.s:1700
  
  time="2025-03-17T12:06:38Z" level=debug msg="session finished: <nil>" spanID=4af174f077a69337 traceID=2040d7ff2f039c5ccee3e9bb8903e611

@bhperry
Author

bhperry commented Mar 17, 2025

@crazy-max Here is the output of docker buildx ls within a run (which did happen to hit the failure)

NAME/NODE                                           DRIVER/ENDPOINT                   STATUS    BUILDKIT   PLATFORMS
builder-aa3e09ee-fcf5-4a2b-afbb-a818612f33cb*       docker-container
 \_ builder-aa3e09ee-fcf5-4a2b-afbb-a818612f33cb0    \_ unix:///var/run/docker.sock   running   248ff7c    linux/amd64 (+3), linux/386
default                                             docker                                                 
 \_ default                                          \_ default                       running   v0.13.2    linux/amd64 (+3), linux/386

I noticed that ghaction-github-runtime ends up exporting some extra envs, including ACTIONS_CACHE_SERVICE_V2=True, so I thought perhaps that was causing the issue. But after switching to it from my local action, the next run actually hit Internal: not found again. I probably should be using that action anyway, though, to future-proof.
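
For reference, this is roughly how I check which cache-service variables end up exposed to the build step (a sketch; the variable names are just the ones I have seen the runtime action export):

  env | grep -E '^ACTIONS_(CACHE_URL|RESULTS_URL|RUNTIME_TOKEN|CACHE_SERVICE_V2)' | sed 's/TOKEN=.*/TOKEN=<redacted>/'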

Unfortunately I have not yet been able to repro this on a different workflow. With it being a non-deterministic failure, it's difficult to know for sure whether it won't happen or just hasn't yet.

@bhperry
Author

bhperry commented Mar 20, 2025

Any ideas on what to look at to figure out what could be causing this? Looking at the line that is throwing the error, it seems to be more related to internal cache storage than to the GHA cache. I'm not familiar enough with the BuildKit codebase to understand why the bucket could be missing.

func (s *Store) Load(id string, resultID string) (solver.CacheResult, error) {
    var res solver.CacheResult
    if err := s.db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte(resultBucket))
        if b == nil {
            return errors.WithStack(solver.ErrNotFound)
        }
        b = b.Bucket([]byte(id))
        if b == nil {
            return errors.WithStack(solver.ErrNotFound)
@tonistiigi
Member

@bhperry This link is for the default local cache using the bolt database. Remote backends like GHA do not go through it.

@bhperry
Author

bhperry commented Mar 20, 2025

@tonistiigi That is what the stack trace I posted above (#5784 (comment)) points to, and I see it in the buildx container logs every time this error happens.

I thought maybe the cache data is stored locally in bbolt during the build and then uploaded to GHA. Is it a red herring?

@tonistiigi
Member

Based on the stack trace, I think we should consider this unrelated to the GHA changes for now. It still needs a reproduction for further debugging.

@aaronlehmann
Collaborator

We hit this issue and had to roll back to buildkitd v0.19. Some images consistently fail to push with this message.

@tonistiigi
Member

@aaronlehmann Can you bisect? Are you using GHA?

@aaronlehmann
Collaborator

Bisecting wouldn't be easy; it takes several steps and ~1 hour to deploy this service. We are not using GHA.

@bhperry changed the title from "GHA build cache ERROR: failed to solve: Internal: not found" to "Build cache ERROR: failed to solve: Internal: not found" on Apr 1, 2025
@bhperry
Author

bhperry commented Apr 1, 2025

We hit this issue and had to roll back to buildkitd v0.19. Some images consistently fail to push with this message.

Failing to push sounds like it may be a different issue than what I'm seeing. My builds push successfully but fail on the cache step. It seems to be a pretty generic error that gets used in many places.
