Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation/clarification on SeccompCmpArgLen #59

Open
boustrophedon opened this issue Sep 10, 2023 · 7 comments
Open

Documentation/clarification on SeccompCmpArgLen #59

boustrophedon opened this issue Sep 10, 2023 · 7 comments

Comments

@boustrophedon
Copy link
Contributor

Is it basically always okay to use SeccompCmpArgLen::Qword on 64 bit systems? I'm doing so in my tests and I don't seem to see any issues, but I also don't have anything close to exhaustive tests of all syscalls.

Libseccomp seems to always do the full 64 bit comparison on 64 bit systems: https://github.com/seccomp/libseccomp/blob/main/src/db.c#L1665

I don't have any plans currently to support 32 bit systems but I'm just not sure when I might encounter issues. Does it exist basically so that the backend can generate either 32 or 64 bit ebpf regardless of what the host architecture is? i.e. When actually running the ebpf, you should always use Qword on 64 bit systems and Dword on 32 bit systems?

@alindima
Copy link
Collaborator

Hi. Have a look here to see why this arg len was introduced: firecracker-microvm/firecracker#1227
The issue linked in that PR description has a lot of details.

Let me know if this solves the issue. I think you're right, we should better document this enum

@boustrophedon
Copy link
Contributor Author

Thanks for the link! I've now read the thread and am able to reproduce the issue in both libseccomp and libseccomp-rs. I think to summarize (mostly the same as serban300's summary):

glibc's ioctl definition takes a c_ulong for its second argument, whereas musl takes a c_int. man 2 ioctl says they are 32 bit constants, and so does posix, but glibc is a pain as always.

# in rust's libc crate
[linux_like (main)]$ rg 'pub fn ioctl' | rg 'gnu|musl'
linux/musl/mod.rs:    pub fn ioctl(fd: ::c_int, request: ::c_int, ...) -> ::c_int;
linux/gnu/mod.rs:    pub fn ioctl(fd: ::c_int, request: ::c_ulong, ...) -> ::c_int;

While you can cast x as libc::c_int or x as libc::c_ulong depending on what libc you're building for, the issue is that since musl treats it as signed, it does a sign-extending mov into the kernel's 64 bit argument register:

# excerpt from `objdump -d /usr/lib/musl/lib/libc.a`
Disassembly of section .text.ioctl:

0000000000000000 <ioctl>:
   0:   48 83 ec 58             sub    $0x58,%rsp
   4:   48 63 ff                movslq %edi,%rdi
   7:   48 63 f6                movslq %esi,%rsi                     # rsi is the second parameter of a syscall
   a:   48 89 54 24 30          mov    %rdx,0x30(%rsp)
   f:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  16:   00 00
  18:   48 89 44 24 18          mov    %rax,0x18(%rsp)
  1d:   31 c0                   xor    %eax,%eax
  1f:   48 8d 44 24 60          lea    0x60(%rsp),%rax
  24:   c7 04 24 10 00 00 00    movl   $0x10,(%rsp)
  2b:   48 89 44 24 08          mov    %rax,0x8(%rsp)
  30:   48 8d 44 24 20          lea    0x20(%rsp),%rax
  35:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  3a:   b8 10 00 00 00          mov    $0x10,%eax
  3f:   0f 05                   syscall

So when a 32 bit int greater than 0x8000_0000 gets passed as a signed 32 bit parameter, the sign bit gets extended to the upper 32 bits.

On the seccomp/BPF side, struct seccomp_data (see <linux/seccomp.h>) has the arguments uniformly in an array of unsigned 64 bit ints, so there's no way to tell whether it was originally "supposed to be" a 32 bit integer or not. You have to know when creating the filter itself, so you can know whether the argument is "supposed to be 32 bit" and you should only compare the lsb, or whether you should compare the entire parameter.


libseccomp and libseccomp-rs repros
https://gist.github.com/boustrophedon/aa46c0c6c28b38a542a73d2205471c07

@boustrophedon
Copy link
Contributor Author

boustrophedon commented Sep 12, 2023

So I'm wondering if the best thing to do would be to create a "quirks file" either in seccompiler or in extrasafe that simply keeps track of which syscalls have 32 bit arguments. The only risk I can think of is that a syscall expanding the type of a parameter, which seems unlikely, but even that can probably be mitigated by requiring that the msb all be 0 or 1 in the filter so that nothing gets through.

The alternative is that this work is done in each project that ues seccomp individually.

Do you know if anyone spoke to the libseccomp people or the seccomp maintainers about this issue?

@boustrophedon
Copy link
Contributor Author

Ah, I think they are aware because I found this section in the manpage:


       It is recommended that whenever possible developers avoid using
       the SCMP_CMP() and SCMP_A{0-5}() macros and use the variants
       which are explicitly 32 or 64-bit.  This should help eliminate
       problems caused by an unwanted sign extension of negative datum
       values.

@alindima
Copy link
Collaborator

alindima commented Sep 14, 2023

I don't think maintaining in seccompiler a list of syscalls that use 32-bit and 64-bit params is feasible.
Every project that wants to use secomp is supposed to audit their filter and decide which argument length to use depending on whether they use musl/glibc and their respective versions.

This is quite a confusing issue, so maybe having it better documented somewhere would help. Both in the readme and in the rust docs of SeccompCmpArgLen
Since you got a pretty good understanding of it, would you like to contribute it? 😄

@boustrophedon
Copy link
Contributor Author

Every project that wants to use secomp is supposed to audit their filter and decide which argument length to use depending on whether they use musl/glibc and their respective versions.

I think this is why seccomp hasn't seen wider adoption and why people praise openbsd's pledge and unveil over it - it's just simpler to use. 99.99% of developers have no idea whether they're using an ioctl parameter with the 32nd bit set or not when they develop a web application with a directory traversal vulnerability.

I'll experiment with a quirks list inside extrasafe.

Regarding the documentation, I can maybe write a PR but I have a bunch of other things I also need to do so it might take some time.

@alindima
Copy link
Collaborator

it's just simpler to use

Maybe, for basic use-cases, but I believe seccomp is arguably more flexbile for an advanced user.
AFAICT, pledge is a higher-level primitive than seccomp, one that probably can have an equivalent wrapper over seccomp, on linux, at least to some extent.

Now, system call filtering is a very difficult thing to get right for an application, simply because most of them are not calling system calls directly and don't have control over which system calls their dependencies are using.
My opinion is that if you want a simple but secure sandbox, use light-weight virtualization. IMO seccomp is better as a last-resort mechanism (to simply limit the attack surface of the kernel in the unlikely event that a sandbox escape does happen).

Anyway, the vast majority of use-cases shouldn't be concerned IMO with restricting parameters of system calls. For the majority of them, I think simple filtering on the syscall number is probably enough.

Regarding the documentation, I can maybe write a PR but I have a bunch of other things I also need to do so it might take some time.

same here. I'll leave this issue open to track the documentation change, in case anybody gets some time to write it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants