steamrolled 3 days ago

The main reason system() exists is that people want to execute shell commands; some confused novice developers might mix it up with execl(), but this is not a major source of vulnerabilities. The major source of vulnerabilities is "oh yeah, I actually meant to execute shell".

So if you just take away the libcall, people will make their own version by just doing execl() of /bin/sh. If you want this to change, I think you have to ask why do people want to do this in the first place.

And the answer here is basically that because of the unix design philosophy, the shell is immensely useful. There are all these cool, small utilities and tricks you can use in lieu of writing a lot of extra code. On Windows, command-line conventions, filesystem quirks, and escaping gotchas are actually more numerous. It's just that there's almost nothing to call, so you get fewer bugs.

The most practical way to make this class of bugs go away is to make the unix shell less useful.

1
rcxdude 1 day ago

most calls to system() that I've seen could be replaced with exec without much difficulty. There's relatively few that actually need the shell functionality.

oguz-ismail 1 day ago

system() involves fork()ing, setting up signal handlers, exec()ing and wait()ing. You won't be replacing it with exec, most of the time you'll be reimplementing it for absolutely no reason.

kragen 1 day ago

Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell. This is much better; this is the library function you should be using instead of system() most of the time. Unfortunately I don't think glibc has an equivalent!

    strace -o tmp.spawnlp -ff python3 -c 'import os; os.spawnlp(os.P_WAIT, "true", "true")' 
In parent:

    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdc03233310) = 225954
    wait4(225954, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 225954
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=225954, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
In child:

    set_robust_list(0x7fdc03233320, 24)     = 0
    gettid()                                = 225954
    clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=322829153}) = 0
    clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=323030718}) = 0
    execve("/usr/local/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = -1 ENOENT (No such file or directory)
    execve("/usr/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = 0
Here, I think strace shows clone() rather than fork() because glibc's fork() is a library function that invokes clone(), rather than a real system call.

oguz-ismail 1 day ago

> Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell.

Good. How do you pipeline commands with these?

kragen 1 day ago

These functions can't do it. In Python you have to use the subprocess module if you want to pipeline commands without the bugs introduced by the shell. From https://docs.python.org/3.7/library/subprocess.html#replacin...:

    p1 = Popen(["dmesg"], stdout=PIPE)
    p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
    output = p2.communicate()[0]
Of course, now, nobody has an hda, and dmesg is root-only. A more modern example is in http://canonical.org/~kragen/sw/dev3/whereroot.py:

    p1 = subprocess.Popen(["df"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["grep", "/$"], stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    return p2.communicate()[0]
Note that the result here is a byte string, so if you want to print it out safely without the shell-like bugginess induced by Python's default character handling (what happens if the device name isn't valid UTF-8?), you have to do backflips with sys.stdout.buffer or UTF-8B.

Python got a lot of things wrong, and it gets worse all the time, but for now spawning subprocesses is one of the things it got right. Although, unlike IIRC Tcl, it doesn't raise an exception by default if one of the commands fails.

Apart from the semantics of the operations, you could of course desire a better notation for them. In Python you could maybe achieve something like

    (cmd(["df"]) | ["grep", "/$"]).output()
but that is secondary to being able to safely handle arguments containing spaces and pipes and whatnot.

oguz-ismail 1 day ago

Dunno, so much work to achieve so little. I'm even more inclined to stick with shell scripts now

kragen 1 day ago

The Bourne shell is definitely less work unless you want your code to correctly or reliably handle user input. Then it's more work.

oguz-ismail 1 day ago

Not in my experience. Any concrete examples off the top of your head, where it's more work than setting up pipes in python manually?

baobun 12 hours ago

Fill in the blank to run a docker container which opens the file with user-provided path in (say) vim.

docker run --rm -it ...?

Now run a container doing the exact same thing ("docker-in-docker").

docker run --rm -it -v $DOCKER_HOST:/var/run/docker.sock ...?

oguz-ismail 11 hours ago

> Fill in the blank to run a docker container which opens the file with user-provided path in (say) vim.

Never used docker before, but this seems to work:

    docker run --rm -it debian bash -c 'vim -- "$1"' _ "$user_provided_path"

kragen 52 minutes ago

Looks relatively safe to me, though it doesn't seem to work because debian:latest doesn't have vim in it (so I'm skeptical of your implicit claim of having tried it), and, if $user_provided_path is empty, it defaults to browsing the filesystem. But there are a lot of characters there that are specifically there to avoid footguns; without them, it would seem to work, but it would fail when $user_provided_path contained special characters.

The version I tested was

    docker run --rm -it debian bash -c 'apt update; apt install -y vim; vim -- "$1"' _ "$user_provided_path"

oguz-ismail 44 minutes ago

> your implicit claim of having tried it

I tried printing positional parameters, they looked fine. (And already uninstalled docker. What's the point of containerization if you need superuser privileges to use it?)

> if $user_provided_path is empty, it defaults to browsing the filesystem

That's what

    vim -- ""
does.

> But there are a lot of characters there that are specifically there to avoid footguns

What are those characters? --? That's not a lot

hollerith 23 hours ago

The Python code Kragen gave is more characters to type, but fewer footguns.

Shell scripts are much higher in footguns per character than most programming languages.

It is possible for a coder to understand bash so well that he never shoots his own foot off, but it requires more learning hours than the same feat in another language requires, and unless I've also put in the (many) learning hours, I have no way of knowing whether a shell script written by someone I don't know contains security vulnerabilities or fragility when dealing with unusual inputs that will surface in unpredictable circumstances.

The traditional Unix shell might be the most overrated tool on HN.

oguz-ismail 13 hours ago

> The Python code Kragen gave is more characters to type, but fewer footguns.

What are the footguns in this?

    df | grep /$

hollerith 4 hours ago

There are none AFAIK.

Allow me to correct my previous comment: using Python's os.spawnlp or Popen to invoke Unix commands has fewer footguns than using the shell. (Kragen's code only addresses your question of how to set up a pipeline in Python.)

panzi 17 hours ago

There is posix_spawn(). Some operating systems even implement that as a system call (not Linux). Implementing that as a system call has the advantage that spawning a new process from a process that has huge memory mapping is fast, because the memory mappings don't need to be copied (yes, I know the memory is copy on write, but the mappings themselves have to be correctly copied with the information needed for copy on write).