Extend -net-disconnect-ok capability to unix domain sockets #10897

Open
cweld510 opened this issue Sep 11, 2024 · 1 comment
Labels
type: enhancement New feature or request

Comments

cweld510 commented Sep 11, 2024

Description

We're running into problems checkpointing containers that have open connections on a host Unix domain socket mounted into the container. The checkpoint fails with the following error (expected, since SCMConnectedEndpoint objects aren't saveable):

encoding error: runtime error: invalid memory address or nil pointer dereference:
goroutine 109 [running]:
gvisor.dev/gvisor/pkg/state.safely.func1()
	pkg/state/state.go:309 +0x179
panic({0x1179260?, 0x34acff0?})
	GOROOT/src/runtime/panic.go:770 +0x132
gvisor.dev/gvisor/pkg/sentry/socket/unix/transport.(*SCMConnectedEndpoint).StateTypeName(0x34b11b0?)
	<autogenerated>:1 +0x9
gvisor.dev/gvisor/pkg/state.lookupNameFields({0x15ca848, 0x11d3880})
	pkg/state/types.go:119 +0xbc
gvisor.dev/gvisor/pkg/state.(*typeEncodeDatabase).Lookup(0xc0007d51a8, {0x15ca848, 0x11d3880})
	pkg/state/types.go:135 +0x4c
gvisor.dev/gvisor/pkg/state.(*encodeState).findType(0xc0007d5188, {0x15ca848, 0x11d3880})
	pkg/state/encode.go:559 +0x45
gvisor.dev/gvisor/pkg/state.(*encodeState).findType(0xc0007d5188, {0x15ca848, 0x130a960})

In production, we rely on the -net-disconnect-ok flag so that checkpointing a container closes any TCP connections open at that moment instead of failing. This is critical for us: we run arbitrary user code, so we can't easily guarantee that no connections are open when we checkpoint.

If possible, we'd like this flag (or a new flag) to also apply to open Unix domain sockets backed by host FDs. We mount some host Unix domain sockets into the container for IPC between in-sandbox processes and our agent code running on the host, and in practice we can't guarantee that those sockets are closed inside gVisor when we checkpoint. As a result, we can't checkpoint certain workloads for some customers. The only workaround I can think of is to have gVisor close the sockets itself, and there is precedent for this, since gVisor already does so for TCP connections.

I'm happy to attempt this myself if needed.

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

cweld510 added the type: enhancement label Sep 11, 2024
kevinGC (collaborator) commented Sep 11, 2024

I think this would be reasonable to bundle with --net-disconnect-ok if you want to take a shot at it. The need / use case for it seems more or less the same.
