Parsing layer blob: Broken pipe #657
Comments
@cgwalters, the problem came up again in MicroShift CI (see this log as an example).
@cgwalters, our CI is in great pain because of this issue. Is there a known workaround?
Not currently, other than retrying; we should probably add some defaults to do that automatically...
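In the meantime, a crude retry wrapper in CI is about the best available stopgap. A minimal sketch, assuming a generic `bootc switch` invocation; the image reference, attempt count, and delay below are placeholders, not a vetted workaround:

```bash
#!/usr/bin/env bash
# Stopgap retry wrapper for a flaky image switch/pull in CI.
# Image reference, attempt count, and delay are placeholders.
set -euo pipefail

attempts=5
for i in $(seq 1 "$attempts"); do
    if bootc switch quay.io/example/my-bootc-image:latest; then
        exit 0
    fi
    echo "attempt $i failed; retrying shortly" >&2
    sleep 10
done
echo "all $attempts attempts failed" >&2
exit 1
```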
@cgwalters, this problem is causing a lot of grief in MicroShift CI. It shows up all over, whether booting VMs or running tests that pull a new layer. Is there a chance the retry workaround could land sooner rather than later in CentOS 9, and also be backported to RHEL 9.4? Once it's implemented, we will immediately provide feedback on whether it made a difference.
I can "easily" reproduce this problem on a local VM by running the following command to alternate between the images (new -> original -> new -> etc.) without rebooting. Is there any additional debug information I can provide to give a clue on why this is happening?
On the registry server side, I'm seeing the following messages in the log when the problem occurs. Looks like the issue is on the client.
I've been trying to reproduce this today without much success. I managed to do it one time by switching back and forth, and then I started poking at things with
As far as tweaking timing goes, one thing to try is strace fault injection to add random delays (especially to read and write), something like:
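For example, attaching to the process and delaying every few read/write calls; the target PID, delay, and injection step below are placeholders:

```bash
# Delay entry into read/write by ~10ms (10000 µs), starting at the 3rd
# call and then every 5th call. All values here are illustrative only.
strace -f -p "$TARGET_PID" \
  -e trace=read,write \
  -e inject=read,write:delay_enter=10000:when=3+5
```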
(Fine-tune the injection step and delay as desired.)
I let it run for a bit more on Wednesday, no luck. Yesterday I took the opportunity to rebuild my dev setup with a fresh podman-machine and the latest podman-bootc from git. Then I let it run all day trying to reproduce. More precisely, what I did was:
Couldn't get it to reproduce; I ran it for about 6 hours like that. Then I ran it again, except removing all of the
At this point it's driving me absolutely mad that I was able to reproduce it the one time within like 5 minutes of trying, and then never again 😠
@jeckersb, I used the following commands to reproduce the error and I'm attaching the
The error was the "usual" one.
This one is like my enemy! I have a tracker for it over here too: coreos/rpm-ostree#4567
Discoveries so far:
More generally it's definitely a race condition; I can sometimes reproduce this by doing
ostree refs --delete ostree/container
and then re-running the rebase.
Also of note: kola defaults to a uniprocessor VM, which I think is more likely to expose this race.
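A rough version of that delete-then-rebase loop, with a placeholder image reference rather than the actual one, might look like:

```bash
# Placeholder image; substitute the container image the host is rebased to.
IMAGE=quay.io/example/my-image:latest

# Rebase once so the ostree/container refs exist, then drop them and
# re-run the rebase so the layers are parsed again; repeating this
# can occasionally hit the race.
rpm-ostree rebase ostree-unverified-registry:"$IMAGE"
ostree refs --delete ostree/container
rpm-ostree rebase ostree-unverified-registry:"$IMAGE"
```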
I'm quite certain it has something to do with the scheduling of us closing the pipe vs. calling FinishPipe.