Item 44238106

Galanwe • 2 days ago

I don't understand what you are complaining about. I don't understand what the article is complaining about either.

The exec* functions are not "better replacements" for the shell; they just serve different use cases.

The whole article could be summarized in 3 bullet points:

1) Sanitize your inputs

2) If you want to execute a specific program, exec it after 1), no need for the shell

3) Allow the shell if there is no injection risk

jcranmer • 2 days ago

The article spends a lot of time dancing around its central points rather than addressing them directly, but the basic problems with shell boil down to this:

There are two ways to think of "running a command":

1. A list of strings containing an executable name (which may or may not be a complete path) and its arguments (think C's const char **argv).

2. A single string which is a space-separated list of arguments, with special characters in arguments (including spaces) requiring quoting to represent correctly.

Conversion between these two forms is non-trivial. The basic problem is that a lot of tools incorrectly convert the former to the latter by just concatenating all of the arguments into a single string and inserting spaces. Part of the problem is that shell script itself makes doing the conversion difficult, but the end effect is that if you have to work with commands whose inputs have special characters (including, but not limited to, spaces), you end up just going slowly insane trying to figure out how to get the quoting right to work around the broken tools.

In my experience, the world is so much easier if your own tools just break everything up into the list-of-strings model and you never try to use an API that requires the single-string model.
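A minimal sketch of the difference between the two models, assuming Python 3.8+ (for `shlex.join`) and a made-up `grep` invocation. Joining with spaces loses the argument boundaries; a quoting-aware join preserves them when the string is parsed again with POSIX shell rules.

```python
import shlex

# Hypothetical argument vector; two of the arguments contain spaces.
argv = ["grep", "-r", "hello world", "my dir"]

# The broken conversion: concatenate everything and insert spaces.
naive = " ".join(argv)

# A quoting-aware conversion to the single-string form.
quoted = shlex.join(argv)

print(naive)
print(quoted)

# Splitting with POSIX shell rules only recovers the original vector
# from the properly quoted string.
print(shlex.split(naive) == argv)
print(shlex.split(quoted) == argv)
```

The naive string splits back into six words instead of four, which is exactly the kind of silent corruption described above.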

What GP is referring to is the fact that that solution doesn't work as well on Windows, because the OS's native idea of a command line isn't list-of-strings but rather a single-string, and how that single string is broken up into a list-of-strings is dependent on the application being invoked.

1 reply

theamk • 2 days ago

I think the "non-trivial" and "slowly going insane" parts only happen if you don't have the right tools, or aren't using a POSIX-compatible system.

In Python you have "shlex.quote" and "shlex.join". In bash, you have "${env@Q}". I've found those to work wonderfully for me, and I did crazy things like quote arguments, embed them into a shell script, quote the script again for ssh, and quote a 3rd time to produce an executable .sh file.
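A rough sketch of that kind of layered quoting in Python, assuming Python 3.8+. The host name `host.example` and the file name are placeholders, and nothing is actually executed; the code only checks that each layer can be peeled off again with POSIX word-splitting rules.

```python
import shlex

# Innermost layer: the argument vector we ultimately want to run remotely.
argv = ["touch", "it's a file.txt"]

# Layer 1: the command line the remote shell will parse.
remote_cmd = shlex.join(argv)

# Layer 2: the command line the local shell will parse to start ssh.
# "host.example" is a placeholder host name.
local_cmd = shlex.join(["ssh", "host.example", remote_cmd])

# Layer 3: a line that could be written into a generated .sh file.
script_line = "sh -c " + shlex.quote(local_cmd)

# Peel the layers off again to confirm nothing was lost on the way.
layer3 = shlex.split(script_line)
layer2 = shlex.split(layer3[2])
layer1 = shlex.split(layer2[2])
print(layer1 == argv)
```

Each layer is quoted exactly once, for exactly one parser; mixing up which parser sees which string is where hand-written quoting usually goes wrong.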

In other languages... yeah, you are going to have a bad time. Especially on Windows, where I'd just give up and move to WSL.

1 reply

jcranmer • 2 days ago

To be honest, I've never heard of Bash's @Q solution before today--I can't find it in https://tldp.org/LDP/abs/html/, which is my usual go-to guide for "how do I do $ADVANCED_FEATURE in bash?"

2 replies

o11c • 2 days ago

To be fair that's missing a lot. I'm not sure how much is just showing its age and how much it never had. The actual bash manual is quite informative.

In particular, failure to mention `printf -v` is horrible. Not only is it better performing than creating a whole process for command substitution, it also avoids the nasty newline problem.

1 reply

LukeShu • 11 hours ago

`printf -v` was added in Bash 3.1 (2005). I think some revisions of ABS predate that, but ABS has certainly been updated since then (last in 2014), and has no excuse for not including it.

LukeShu • 11 hours ago

@Q was added in Bash 4.4 (2016), ABS was last updated in 2014.

panzi • 2 days ago

I'd say: Don't use the shell if what you want to do is to execute another program.

You don't need to handle any quoting with exec*(). You still need to handle options, yes. But under Windows you always have to handle the quoting yourself, and it is both more difficult than for the POSIX shell and program-dependent. Without knowing what program is executed you can't know what quoting syntax you have to use, and as such a standard library cannot provide a generic interface to pass arguments to another process in a safe way under Windows.

I just felt it sounded like POSIX is particularly bad in that context, while in fact it is better than Windows here. Still, the system() function is a mistake. Use posix_spawn(). (Note: Do not use _spawn*() under Windows. That just concatenates the arguments with a space between and no quoting whatsoever.)
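To illustrate the "no quoting needed" point, here is a small sketch using Python's `subprocess.run` with an argument list, which does not go through a shell. The helper name `echo_argument` and the hostile string are made up for the example, and the child process is simply another Python interpreter that writes back its first argument.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Start a child process without a shell and return what it saw as argv[1]."""
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.argv[1])",
        value,  # passed as one argument, never parsed by a shell
    ]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout


# Shell metacharacters are just ordinary characters here.
hostile = "$(echo pwned); `id` 'single' \"double\" *"
print(echo_argument(hostile) == hostile)
```

Options still need care (a value starting with `-` may be read as a flag by the child), but no quoting layer is involved.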

1 reply

oguz-ismail • 2 days ago

>Still, the system() function is a mistake. Use posix_spawn().

They are entirely different interfaces though. If you'd implemented system() using posix_spawn() it'd be just as bad as system()

1 reply

panzi • 2 days ago

Why would you implement system() at all?

2 replies

theamk • 2 days ago

parse commands from config file? command-line arguments for hooks?

https://news.ycombinator.com/item?id=44239036

1 reply

panzi • 2 days ago

I understand that it is convenient for running small snippets like that, but I don't really think it's worth the risk. And putting it into a config file is different, IMO. You don't get tempted to do some bad string interpolation there, because you can't, unless the config file format has support for that, but then I criticize that. If you need to pass things to such a snippet, do it via environment variables or standard IO, not string interpolation.
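A sketch of passing data to a user-supplied snippet through the environment rather than by interpolating it into the script text. It assumes a POSIX `sh` on the PATH; the variable name `HOOK_FILE`, the helper `run_hook`, and the snippet itself are invented for the example.

```python
import os
import subprocess

# Hypothetical hook snippet as it might appear in a config file.
# It refers to its input by name; the value is never pasted into the script.
HOOK = 'printf "%s" "$HOOK_FILE"'


def run_hook(snippet: str, filename: str) -> str:
    """Run a shell snippet, handing it the file name via the environment."""
    env = dict(os.environ, HOOK_FILE=filename)
    result = subprocess.run(
        ["sh", "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The value contains shell syntax, but the shell only expands the variable
# inside double quotes and does not re-parse its contents.
name = "a b; $(echo pwned) `id` 'x'"
print(run_hook(HOOK, name) == name)
```

The snippet author still has to quote `"$HOOK_FILE"` correctly, but the program invoking the hook never has to escape anything.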

If you say you don't make such mistakes: Yeah, but people do. People that write the code that runs on your system.

1 reply

theamk • 1 day ago

But if you want a command-line option for a hook, what are the alternatives?

Force the user to always create a wrapper script? That's just extra annoyance, and if the user is bad at quoting, they'll have the same problems with a script.

Disable hooks altogether? That's a bad functionality regression.

Ask for multiple arguments? This makes command-line parsing much more awkward... I have not seen any good solutions for that.

(The only exception is writing a command wrapper that takes exactly 1 user command, like "timeout" or "xargs"... but those already use an argument vector instead of parsing.)
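For the wrapper case, one hypothetical convention is to treat everything after a `--` separator as the command's argument vector, so the user's shell does the only round of parsing. The function name `split_wrapper_args` and the `--retries` option are assumptions made for this sketch; tools like "timeout" and "xargs" use their own conventions.

```python
def split_wrapper_args(argv):
    """Split a wrapper's arguments at the first '--'.

    Everything before '--' belongs to the wrapper itself; everything after
    it is the command to run, kept as a list and never re-joined.
    """
    if "--" not in argv:
        raise ValueError("expected '--' before the command to run")
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


# Hypothetical invocation: wrapper --retries 3 -- rsync -a "my dir/" backup/
options, command = split_wrapper_args(
    ["--retries", "3", "--", "rsync", "-a", "my dir/", "backup/"]
)
print(options)
print(command)
```

The resulting `command` list can be handed to something like `subprocess.run(command)` unchanged, so no quoting step is needed.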

1 reply

frumplestlatz • 19 hours ago

You define a config file format that supports only the minimal syntax required to specify a multi-argument command (e.g. spaces separate arguments, arguments with spaces in them may be quoted or use backslashes to escape them).

Then, you parse that out into a proper argument array and pass it to exec*/posix_spawn.
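A sketch of what such a minimal parser could look like, in Python rather than C. The function name `parse_command` and the exact rules are assumptions made for illustration: whitespace separates arguments, double quotes group characters, and a backslash makes the next character literal everywhere. That last rule is simpler than POSIX, where a backslash inside double quotes is only special before a few characters, so this is not claimed to match the shell in every corner.

```python
def parse_command(line: str) -> list:
    """Parse a config-file command line into an argument list."""
    args = []
    current = []        # characters of the argument being built
    in_arg = False      # True once an argument has started (allows "")
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # Backslash: take the next character literally.
            if i + 1 >= len(line):
                raise ValueError("trailing backslash")
            current.append(line[i + 1])
            in_arg = True
            i += 2
            continue
        if ch == '"':
            # Quotes toggle grouping and are not part of the argument.
            in_quotes = not in_quotes
            in_arg = True
        elif ch in " \t" and not in_quotes:
            # Unquoted whitespace ends the current argument, if any.
            if in_arg:
                args.append("".join(current))
                current = []
                in_arg = False
        else:
            current.append(ch)
            in_arg = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote")
    if in_arg:
        args.append("".join(current))
    return args


# Hypothetical config value for a hook command.
print(parse_command('notify-send "build done" path\\ with\\ spaces ""'))
```

The resulting list can go straight to an exec-style API such as `os.execvp(args[0], args)` or `subprocess.run(args)`; there is no expansion, globbing, or command substitution to worry about because the format has none.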

1 reply

account42 • 9 hours ago

So instead of well-known (i.e. POSIX) quoting semantics and existing tool support, you want to introduce your own ad-hoc format? No thanks.

1 reply

frumplestlatz • 4 hours ago

A correct parser for the syntax I described can be written in fewer than 100 lines of code, even in C. It’s a strict subset of the shell command language defined by POSIX, and it’s sufficiently expressive to support specifying any argument array unambiguously.

To correctly escape arbitrary shell syntax, not only do you need to handle the full POSIX syntax (which is quite complex) …

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V...

… but you must also cover any bugs and undocumented/underspecified extensions implemented by the actual shell providing /bin/sh on every platform and platform version to which your code will be deployed.

That’s not just difficult — it’s impossible, and everyone that has tried has failed, repeatedly. Leading to security bugs, repeatedly.

https://gist.github.com/Zenexer/40d02da5e07f151adeaeeaa11af9...

There’s a reason why we use parameterized queries instead of escaping to prevent SQL injection, and SQL syntax and parsing behavior is far more rigorously specified than the shell.
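For readers unfamiliar with the SQL side of that analogy, a minimal sketch using Python's built-in `sqlite3` module. The table and the hostile value are made up; the point is only that the value travels separately from the query text, much as an argument vector travels separately from the program name.

```python
import sqlite3

# In-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# A value that would break a query built by string concatenation.
hostile = "Robert'); DROP TABLE users;--"

# The '?' placeholder keeps the value out of the SQL text entirely,
# so no escaping rules are involved.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(rows == [(hostile,)])
```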

oguz-ismail • 2 days ago

Because I don't want to implement a shell???

1 reply

panzi • 2 days ago

If you want to run a shell script, run a shell script. I.e. a text file with the executable bit set and a shebang. If you want to generate a shell script on the fly to then run it, take a step back and think about what you're doing.