
Running filebench in gVisor #10901

Open
Anjali05 opened this issue Sep 12, 2024 · 13 comments
Labels
type: enhancement New feature or request

Comments

@Anjali05

Anjali05 commented Sep 12, 2024

Description

I am trying to run some filesystem metadata stress tests using Filebench (https://github.com/filebench/filebench) on gVisor. For the experiments, I need to turn off address randomization using echo 0 > /proc/sys/kernel/randomize_va_space. I understand that I cannot access or modify this from gVisor. This has been an issue in Filebench for some time (filebench/filebench#163 and filebench/filebench#112) and I am not sure if they are going to fix it. I was wondering if there is any workaround within gVisor to make this work?

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

@Anjali05 Anjali05 added the type: enhancement New feature or request label Sep 12, 2024
@milantracy
Contributor

I don't see randomize_va_space at

func (fs *filesystem) newSysDir(ctx context.Context, root *auth.Credentials, k *kernel.Kernel) kernfs.Inode {
	return fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
		"kernel": fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
			"cap_last_cap": fs.newInode(ctx, root, 0444, newStaticFile(fmt.Sprintf("%d\n", linux.CAP_LAST_CAP))),
			"hostname":     fs.newInode(ctx, root, 0444, &hostnameData{}),
			"overflowgid":  fs.newInode(ctx, root, 0444, newStaticFile(fmt.Sprintf("%d\n", auth.OverflowGID))),
			"overflowuid":  fs.newInode(ctx, root, 0444, newStaticFile(fmt.Sprintf("%d\n", auth.OverflowUID))),
			"random": fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
				"boot_id": fs.newInode(ctx, root, 0444, newStaticFile(randUUID())),
			}),
			"sem":    fs.newInode(ctx, root, 0444, newStaticFile(fmt.Sprintf("%d\t%d\t%d\t%d\n", linux.SEMMSL, linux.SEMMNS, linux.SEMOPM, linux.SEMMNI))),
			"shmall": fs.newInode(ctx, root, 0444, ipcData(linux.SHMALL)),
			"shmmax": fs.newInode(ctx, root, 0444, ipcData(linux.SHMMAX)),
			"shmmni": fs.newInode(ctx, root, 0444, ipcData(linux.SHMMNI)),
			"msgmni": fs.newInode(ctx, root, 0444, ipcData(linux.MSGMNI)),
			"msgmax": fs.newInode(ctx, root, 0444, ipcData(linux.MSGMAX)),
			"msgmnb": fs.newInode(ctx, root, 0444, ipcData(linux.MSGMNB)),
			"yama": fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
				"ptrace_scope": fs.newYAMAPtraceScopeFile(ctx, k, root),
			}),
		}),
		"fs": fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
			"nr_open": fs.newInode(ctx, root, 0644, &atomicInt32File{val: &k.MaxFDLimit, min: 8, max: kernel.MaxFdLimit}),
		}),
		"vm": fs.newStaticDir(ctx, root, map[string]kernfs.Inode{
			"max_map_count":     fs.newInode(ctx, root, 0444, newStaticFile("2147483647\n")),
			"mmap_min_addr":     fs.newInode(ctx, root, 0444, &mmapMinAddrData{k: k}),
			"overcommit_memory": fs.newInode(ctx, root, 0444, newStaticFile("0\n")),
		}),
		"net": fs.newSysNetDir(ctx, root, k),
	})
}

Would you like to implement that, if you are interested?

As a workaround, you can simply implement a fake entry to skip the step.

@Anjali05
Author

Anjali05 commented Sep 12, 2024

> I don't see randomize_va_space at newSysDir [...] Would you like to implement that, if you are interested?
>
> As a workaround, you can simply implement a fake entry to skip the step.

Yes, it's not currently there. I think adding it as a static file should work?

@milantracy
Contributor

Since filebench writes to the file, implementing the interface at

type WritableDynamicBytesSource interface {

would be better than a static file.

@EtiennePerot
Contributor

I'm not sure I understand the request.

Does Filebench absolutely require running on a host that has ASLR turned off, and you are looking to see how that could be enabled from within gVisor?
Or is Filebench just trying to write to /proc/sys/kernel/randomize_va_space as part of its initialization process, and that code is failing in gVisor because the file doesn't exist, and you're looking to stub out this file just to let Filebench start up?

@Anjali05
Author

> I'm not sure I understand the request.
>
> Does Filebench absolutely require running on a host that has ASLR turned off, and you are looking to see how that could be enabled from within gVisor? Or is Filebench just trying to write to /proc/sys/kernel/randomize_va_space as part of its initialization process, and that code is failing in gVisor because the file doesn't exist, and you're looking to stub out this file just to let Filebench start up?

So far I have not been able to successfully run Filebench unless /proc/sys/kernel/randomize_va_space contains 0. If the file is not present or contains something other than 0, Filebench starts but does not finish execution and abruptly terminates. I am not sure what it's doing internally.

@EtiennePerot
Contributor

Can you simply mount a file containing the string "0" there?

$ echo 0 > /tmp/just_zero.txt
$ docker run --rm --runtime=runsc \
    -v /tmp/just_zero.txt:/proc/sys/kernel/randomize_va_space \
    ubuntu cat /proc/sys/kernel/randomize_va_space
0

@Anjali05
Author

@EtiennePerot Hmm, that might work. Let me try it

@Anjali05
Author

@EtiennePerot It did not work, so there might be some other issue; I'm not sure.

@nixprime
Member

filebench shares a filebench_shm_t object, the pointee of ipc.h:filebench_shm, between multiple worker processes using a file that is mapped into all processes using MAP_SHARED. *filebench_shm contains pointers into itself; for this to work between processes, its address (filebench_shm) must be consistent between them. Thus, as currently written, filebench needs to disable ASLR to work; tricking it into thinking ASLR is disabled is insufficient.
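This constraint can be demonstrated with a small standalone program (illustrative only, not filebench code): map one file MAP_SHARED twice to stand in for two processes whose ASLR bases differ, store an absolute pointer into the region through one mapping, and observe that the stored value only makes sense relative to the first mapping's base.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

func main() {
	// Back a shared region with a temp file, analogous to filebench's
	// shared-memory file.
	f, err := os.CreateTemp("", "shm")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if err := f.Truncate(4096); err != nil {
		panic(err)
	}

	// Map the same file twice. The kernel picks two distinct addresses,
	// standing in for two processes with different randomized bases.
	prot := syscall.PROT_READ | syscall.PROT_WRITE
	m1, err := syscall.Mmap(int(f.Fd()), 0, 4096, prot, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	m2, err := syscall.Mmap(int(f.Fd()), 0, 4096, prot, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}

	// "Process 1" stores an absolute pointer to its own mapping inside
	// the shared region, like the self-pointers in *filebench_shm.
	binary.LittleEndian.PutUint64(m1, uint64(uintptr(unsafe.Pointer(&m1[0]))))

	// "Process 2" reads the pointer back through its own mapping: the
	// value refers to mapping 1's base, not mapping 2's, so dereferencing
	// it there would be wrong unless both mappings share one base address.
	stored := binary.LittleEndian.Uint64(m2)
	fmt.Println(stored == uint64(uintptr(unsafe.Pointer(&m1[0]))),
		stored == uint64(uintptr(unsafe.Pointer(&m2[0])))) // prints "true false"
}
```

With ASLR disabled (and filebench's fixed mapping address), every process sees the region at the same base, so the absolute pointers stay valid everywhere.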

@Anjali05
Author

@nixprime Is it possible to disable ASLR in gvisor currently?

@ayushr2
Collaborator

ayushr2 commented Sep 12, 2024

IIUC ASLR in gVisor is implemented here:

	maxRand := hostarch.Addr(maxMmapRand64)
	if topDownMin < preferredTopDownBaseMin {
		// Try to keep TopDownBase above preferredTopDownBaseMin by
		// shrinking maxRand.
		maxAdjust := maxRand - minMmapRand64
		needAdjust := preferredTopDownBaseMin - topDownMin
		if needAdjust <= maxAdjust {
			maxRand -= needAdjust
		}
	}
	rnd := mmapRand(uint64(maxRand))
	l := MmapLayout{
		MinAddr: min,
		MaxAddr: max,
		// TASK_UNMAPPED_BASE in Linux.
		BottomUpBase:     (max/3 + rnd).RoundDown(),
		TopDownBase:      (max - gap - rnd).RoundDown(),
		DefaultDirection: defaultDir,
		// We may have reduced the maximum randomization to keep
		// TopDownBase above preferredTopDownBaseMin while maintaining
		// our stack gap. Stack allocations must use that max
		// randomization to avoid eating into the gap.
		MaxStackRand: uint64(maxRand),
	}

We don't have a way to disable that as of right now.
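For intuition, here is a standalone sketch of what the snippet above computes (with made-up constants, not gVisor's actual values): the randomization offset rnd shifts both mapping bases, and forcing rnd to zero, which is roughly what a "disable ASLR" knob would do, makes the layout deterministic.

```go
package main

import "fmt"

const (
	max      = uint64(0x7fff_ffff_f000) // illustrative top of the user address space
	gap      = uint64(0x0100_0000)      // illustrative stack gap
	pageMask = ^uint64(0xfff)           // RoundDown to a 4K page boundary
)

// layout mirrors the shape of the MmapLayout computation above: rnd == 0
// yields fixed bases; any nonzero rnd shifts both bases.
func layout(rnd uint64) (bottomUp, topDown uint64) {
	bottomUp = (max/3 + rnd) & pageMask // like BottomUpBase
	topDown = (max - gap - rnd) & pageMask
	return
}

func main() {
	b0, t0 := layout(0)
	b1, t1 := layout(0x1234000) // an arbitrary page-aligned random offset
	fmt.Printf("no randomization: bottomUp=%#x topDown=%#x\n", b0, t0)
	fmt.Printf("randomized:       bottomUp=%#x topDown=%#x\n", b1, t1)
}
```

A hypothetical flag could plumb "use rnd = 0" into this code path, but as noted, no such option exists in gVisor today.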

@EtiennePerot
Contributor

EtiennePerot commented Sep 12, 2024

FYI, gVisor already has filesystem stress tests based on the fio tool. For example, to test runsc I/O performance with KVM+DirectFS, you could do:

# Build runsc:
$ mkdir bin
$ make copy TARGETS=runsc DESTINATION=bin/

# Install Docker runtime with KVM + DirectFS enabled:
$ sudo bin/runsc install --runtime=runsc-bench -- --platform=kvm --directfs=true
$ sudo systemctl restart docker

# Run `fio` benchmark:
$ make RUNTIME=runsc-bench BENCHMARKS_TARGETS=test/benchmarks/fs:fio_test run-benchmark
BenchmarkFioWrite/operation.write/ioEngine.sync/jobs.1/blockSize.4K/directIO.false/filesystem.bindfs-20                    19762           1800039 ns/op         604614656 bandwidth.bytes_per_second           147611 io_ops.ops_per_second
[...]

To compare with unsandboxed, use RUNTIME=runc instead.

@Anjali05
Author

Anjali05 commented Sep 13, 2024

@EtiennePerot Thank you. Yeah, I have no problem running fio. I wanted to try Filebench as it has better and more flexible filesystem metadata stress tests such as creating and deleting large numbers of files etc. As far as I know, Fio is recommended mostly for testing raw I/O performance.

5 participants