A different UNIX Part II: A better shell language

2006-10-05 13:38

One of the things people study when they "learn unix" is shell scripting and usage, because every system has a shell, and if you learn to use it interactively, you are halfway to automating system tasks!

Let's consider that for a moment... what are the odds that the same language can be good for interactive use and for programming? I say slim.

Not to mention that learning shell as a way to learn unix is like going to a school that teaches TV production, and studying the remote. While useful, it is not really the important tool (OK, that analogy doesn't work at all. But it sounds neat, doesn't it?).

The first thing is that today's Linux domination of the unixsphere has caused a serious monoculture in shell scripting: everyone uses bash. The more enlightened ones may check that their scripts work on some other Bourne-style shell.

There are no important distributions (or proprietary unixes) that use csh or anything like it. Debian has a policy that things should work without bashisms. That's about as good as it gets.

Writing a dozen pages on how shell sucks would be trivial. But uninteresting.

So, let's think it over, and start from the top.

What should a shell scripting language be like?

What doesn't matter?

Let's tackle these things. I invite anyone to add extra ideas in the comments section.

What should a shell scripting language be like?

Interpreted (obvious)
Dynamic typing (you will be switching ints to strs and vice versa all the time).
Easy incorporation of other programs as functions/methods/whatever.

That pretty much is what makes it a shell. ls should be indistinguishable from something written using the shell itself.
Pipes. This is a must. Unix has a bazillion tools meant to be used in command pipelines. You can implement an RDBMS using that kind of thing (check out nosql). Leverage that.

But even here, where it is strongest, the shell is not perfect. Why can't I easily pipe stderr and stdout to different processes? Why can't I pipe the same thing to two processes at the same time? (Yes, I know how to do it with a neat trick ;-)
Globbing. *.txt should give you a list of files. This is one of the obvious things where sh is broken. *.txt may be a string or a list, depending on context... and a list is just a string of words separated by blanks. That is one of the bazillion things that make writing shell scripts (at least good ones) hard:
```
[ralsina@monty ralsina]$ echo *out
a.out
[ralsina@monty ralsina]$ echo *outa
*outa
```
A list data type. No, writing strings separated with spaces is not OK. Maybe a Python-style dictionary as well?
Functions (obvious)
Libraries (and ok, the shell source mechanism seems good enough)
Standalone. It shouldn't spawn sh for any reason ;-)
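
Two of the complaints in the list above can be made concrete in plain Bourne syntax. This is only a sketch: `count_txt`, `emit` and `split_streams` are names made up for illustration, and the descriptor juggling is one known way to split the streams, not necessarily the neat trick alluded to above.

```shell
# Sketch only: count_txt, emit and split_streams are hypothetical names.

# 1. Globbing: an unmatched pattern reaches the loop as a literal string,
#    so every loop over a glob needs an existence check.
count_txt() {
    n=0
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # skip the literal "*.txt" when nothing matched
        n=$((n + 1))
    done
    echo "$n"
}

# 2. Pipes: send stdout and stderr of one command to different filters
#    by parking stdout on file descriptor 3 while stderr takes the pipe.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

split_streams() {
    { emit 2>&1 1>&3 | sed 's/^/ERR: /' >&2; } 3>&1 | sed 's/^/OUT: /'
}
```

Calling `split_streams` sends the line prefixed with `OUT:` to stdout and the line prefixed with `ERR:` to stderr. Both functions work, but neither is something you would guess from using the shell interactively, which is rather the point.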

What doesn't matter?

Performance. OK, it matters that a five-liner doesn't take 50 minutes unless it has to. But one second or two seconds? Not that important.
Object orientation. I don't see it being too useful. Shell scripts are old-fashioned :-)
Compatibility with current shells. Come on. Why be like something that sucks? ;-)

Now, the example

Let's consider a typical piece of shell script and a rewrite in a more reasonable syntax.

This is bash (no, it doesn't work on any other shell, I think):

```
DAEMONS=( syslog network cron )

# Start daemons
for daemon in "${DAEMONS[@]}"; do
    if [ "$daemon" = "${daemon#!}" ]; then
        if [ "$daemon" = "${daemon#@}" ]; then
            /etc/rc.d/$daemon start
        else
            stat_bkgd "Starting ${daemon:1}"
            (/etc/rc.d/${daemon:1} start) &>/dev/null &
        fi
    fi
done
```

And since DAEMONS is something the admin writes, this script lets you shoot yourself in the foot in half a dozen ways, too.
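
To name two of those ways: because `/etc/rc.d/$daemon` is unquoted, an entry containing a space is split into several words, and an entry containing `../` points outside `/etc/rc.d` altogether. Here is a minimal sketch, in portable sh, of the kind of check the script would need before trusting an entry. `classify_daemon` and the set of allowed characters are assumptions made up for illustration, not part of the original script.

```shell
# classify_daemon ENTRY (hypothetical helper): decide what the loop
# should do with one DAEMONS entry before anything gets executed.
classify_daemon() {
    name=$1
    mode=foreground
    case $name in
        '!'*) echo "skip"; return ;;               # disabled entry
        '@'*) mode=background; name=${name#@} ;;   # strip the marker
    esac
    case $name in
        # Reject empty names, names with characters outside the assumed
        # safe set (so no spaces and no slashes), and names starting with a dot.
        ''|*[!A-Za-z0-9_.-]*|.*) echo "invalid"; return ;;
    esac
    echo "$mode $name"
}
```

With this, `classify_daemon "@network"` prints `background network`, while `classify_daemon "@../evil"` and `classify_daemon "my daemon"` both print `invalid`.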

How about this:

```
DAEMONS=["syslog","network","cron"]

# Start daemons
for daemon in DAEMONS {
    if ( daemon[0] != "!" ) {
        if ( daemon[0] == "@" ) {
            stat_bkgd ("Starting "+daemon[1:])
            /etc/rc.d/+daemon[1:] ("start") &> /dev/null &
        } else {
            /etc/rc.d/+daemon ("start")
        }
    }
}
```

Of course the syntax is something I just made up as I was writing, but isn't it nicer already?

Michal / 2006-10-05 18:24:

Your syntax is actually similar to YCP, a scripting language used by SUSE YAST. The language is however not used anywhere else.

Roberto Alsina / 2006-10-05 18:52:

Didn't know about it.

And here's a classical program, in YCP:

http://www.99-bottles-of-be...

Henry Miller / 2006-10-05 23:05:

This is the wrong approach.

As a dedicated csh user (mostly because when I first started in unix we didn't have so many good choices, it was sh or csh, or find room in your tiny home directory to compile your own - and csh was default), I can state with confidence that ALL shell scripts should be written in bourne. Use your favorite shell for interactive use. Switch to your heart's content. You can even write some personal scripts for that shell if you feel like it.

HOWEVER IF IT IS A PUBLIC SHELL SCRIPT IT MUST BE BOURNE! There is no other choice.

If you do not wish to use bourne, then write in python (ruby, tcl, perl, scheme, add a dozen more programming languages and pick your favorite). Beware though that you have just created a requirement that users need to install the language of your choice.

We do not write in bourne because we like it. We write in it because it is a least common denominator that EVERYONE has. Therefore when you write in bourne you are writing something easy for everyone; when you write in anything else you are making it easy for yourself at the expense of forcing your users to install what you want. (This isn't a bad thing, if the script is at all complex you should use something better, but for 100 line scripts - which covers many scripts - bourne works well enough and you avoid the problems of installing other languages.)

Face it, bourne shell will always be with us as the least common denominator. There are many better languages out there, but none are universal. (I don't have bash on my personal FreeBSD machine)

skierpage / 2006-10-06 00:41:

Agreed, stick with Bourne for distributed scripts.

Having first-class arrays instead of white-space separated strings is essential; I ran into so many script bugs with Windows and Dreamweaver users accidentally creating "Copy of webpage.html" files on a Web server that I wrote a script just to remove all files with spaces in them (which itself is really hard to get right with xargs quoting).

But the problem just resurfaces when you try to parse program output. ls -l and find dump the spaces in file names, so you're back to trying to figure out where elements in strings begin and end in order to turn them into arrays. At that point it's easier to find a Perl library that does what the command-line tool does but returns hashes.
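
One way to sidestep the quoting problem, sketched under the assumption of a POSIX `find` and `sh`: let `find` hand each name to a child shell as a separate argument, so the names never pass through whitespace splitting or `xargs` quoting. `list_spaced` is a hypothetical helper, and it only prints the names rather than removing anything.

```shell
# list_spaced DIR (hypothetical helper): print the base name of every
# regular file under DIR whose name contains a space, one per line.
list_spaced() {
    find "$1" -type f -name '* *' -exec sh -c '
        # Each matching path arrives as its own argument, spaces intact.
        for path in "$@"; do
            printf "%s\n" "${path##*/}"
        done
    ' sh {} +
}
```

Inside the child shell every file is a single element of `"$@"`, which is about as close as Bourne gets to a real list. It does nothing for parsing the output of tools like `ls -l`, though.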

Microsoft have an interesting idea in Monad (now "PowerShell") outputting objects with keys, so you can unambiguously access "Name" from directory listing output without having to guess word boundaries. You can magically pipe the objects output from one command into another (you can tell I've never used it :-) ). But that's a whole lot of utilities to rewrite.

Roberto Alsina / 2006-10-06 00:43:

No, bourne is not good enough for 100 line scripts. Bourne is not good enough for many 10-line scripts.

Bourne bites you in the ass when you least expect it :-(

This post was pretty much a thought experiment, and a motivation to write a parser (which is something I never did, and want to learn about).

Most traditional languages suck when you try to use them as shell script replacements, because they are not tailored for that use.

Shell is the granddaddy of all domain-specific languages, and there is no reason why, say, a distro can't bring another interpreter to the table, and write its own scripts using it.

Axel Liljencrantz / 2006-10-06 13:05:

Hi. Nice article, some interesting ideas. A lot of the features you want exist in a commandline shell called fish, available at http://www.fishshell.org.

As in your article, globbings with no matches do not expand to the original argument.

As in your article, spaces are not used to create extremely fragile 'poor man's arrays'.

In fish, all variables are actually arrays of strings. So the integer data type you wanted does not exist, but the list datatype is really nice. You can use negative indexing to index from the back. You can specify multiple indexes inside a single set of brackets to perform slicing, for example, the first three elements of path are '$PATH[(seq 3)]'. The 'set' command, which is used for all types of variable assignment, allows you to assign to or remove slices of arrays. A cool subcase is that you can treat command substitutions as an array as well. So if you for example want to write the fourth and fifth line of the file foo.txt to standard out, simply use 'echo (cat foo.txt)[4 5]'.

Some important unique things fish has that you don't mention:

No implicit subshells. In other shells, if you write a function and use it inside a pipeline, that function will silently be executed in another process. That means that if your function alters a global variable, for example, you simply _can't_ use it in a pipeline. Fish never ever forks off subshells, not even for command substitutions.

Universal variables: Universal variables are variables whose values are shared between all of the user's shells, and the values are preserved across reboots and logouts. This is extremely handy for configuration options, for example. Just change a value, and it will be updated at once, for all shells, and in the future too.

Sane scoping rules. In fish, when you create a new variable inside a function, it will by default be a local variable. The syntax for specifying variable scope in fish is simple, use the switches -U/--universal, -g/--global or -l/--local when using the set builtin to specify the scope you want the variable to live in. Local variables have precedence over global ones, and global ones have precedence over universal ones.

Code validation. The fish syntax is designed so that it can be validated before executing whenever possible. That means a lot more syntax errors are caught early.

Autoloaded functions. Fish has something much nicer than the source command for writing libraries. You can give fish a path list of directories containing definitions of functions, one function in each file, with each file named after its function. Whenever you need a specific function, fish will automatically load the file, and if the file changes, it will be reloaded. This is needed by fish itself, since fish contains many thousands of lines of shell script, so fish would be a memory hog if it didn't autoload functions. But it also means that you can safely import a large number of libraries without worrying about slow startup or m

Roberto Alsina / 2006-10-06 14:20:

Great stuff about fish. Will have to investigate it :-)

Axel Liljencrantz / 2006-10-06 15:05:

Hi, again. I just noticed my comment got truncated. Sorry about writing so long a post.

The remainder of my post consisted of some more features of fish and an invitation to help in the development.

If you want to make a better shell, I hope you'll join the fish effort by subscribing to the fish mailing list at https://lists.sourceforge.n....

I'll be happy to discuss the design decisions that have been made in fish as well as how fish could be further improved.

Roberto Alsina / 2006-10-06 16:04:

I have been reading the Ars Technica article, and I really like it.

However, I have a hidden agenda here, which is learning to write parsers.

I don't expect fish is ready to be turned into a python-interpreted language, which is about all I have figured out so far :-)

Thanks for the comments, and while fish may not have won a developer (besides, I am not very good at that ;-) it sure has won a fan.

Kevin / 2006-10-06 19:13:

I second the approach that Microsoft is taking with PowerShell. There are some really nice things in there: passing objects via pipes rather than text as was already mentioned, standard command line argument parsing (yay!), interop with .NET. All good things. It looks a little funky for interactive use, but excellent as scripting glue.

Axel Liljencrantz / 2006-10-07 22:13:

Roberto, if you want to write a parser and make the commandline a better place, and do it in Python, I have _just_ the project for you.

Create a tool that parses man pages and produces command completions from them. You would need a parser, it would be very useful on the commandline, you can do it in a high level language, and there is even a preexisting software called doclifter, written in python, which converts man pages to docbook format.

Ralsina.Me — El sitio web de Roberto Alsina
