Tuesday, January 12, 2010

Why does sort have a -o flag?

Most Unix/Linux commands use the ancient convention of standard input, standard output, and standard error. In general, when a command needs input, it reads from standard input, and when a command produces output, it writes to standard output. The user can specify the source of the input and the target of the output using the redirection operators < and >, as in:


grep 'xxx' < /tmp/myFile.txt > /tmp/output.txt


Many commands follow this pattern: cat, grep, sed, awk, uniq, etc.

There are a few special commands which use the -o flag to specify where to put their output: cc, as, and ld in particular. Note that these commands:

  • produce binary output, which makes no sense to display in a terminal window (and which usually corrupts your terminal session if you try)

  • write their output to a specially named file (a.out), not to stdout, if the -o flag is not specified



Given all this, the sort command is really weird, as it has an optional -o flag which specifies where to put the output, and if the -o flag is not specified sort sends its output to standard output.
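To make the overlap concrete, here is a small sketch showing that -o and > produce the same result when the output file is different from the input file. The directory and file names (workdir, in.txt, and so on) are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'b\nc\na\n' > "$workdir/in.txt"

# Same sort, two ways of naming the output file.
sort -o "$workdir/with_flag.txt" "$workdir/in.txt"
sort "$workdir/in.txt" > "$workdir/with_redirect.txt"

# cmp -s is silent and succeeds only if the two files are byte-identical.
if cmp -s "$workdir/with_flag.txt" "$workdir/with_redirect.txt"; then
    result=identical
else
    result=different
fi
echo "$result"
```

As long as the input and output are different files, the flag looks redundant, which is exactly what makes it puzzling.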

I can't see any reason why sort has this flag, when the > redirection operator works just fine, and is used for this purpose in almost every other Unix command I know.

Does anybody know why sort has a -o flag, what purpose that flag serves that the output redirection operator doesn't serve, or whether there are any other Unix commands which follow this pattern?

Update: My co-worker Joe says that sort's implementation of -o is special, because it is legal to specify the same file as both the input and the output. Apparently, when given -o, sort writes all of its output to a file with a temporary name (so the input is not overwritten during the sort) and, once the sort is finished, renames the temporary file to the name given to -o. This lets you sort a file back into itself, which plain redirection cannot do, because the shell truncates the output file before the command starts reading it. It's still not clear to me why sort is the only command which decided to have this behavior.
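Here is a minimal sketch of the difference, assuming a POSIX sort (the POSIX description of -o says the output file may be the same as one of the input files). The file names are hypothetical, and the sketch only shows the visible effect, not how any particular sort implements it internally.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Sort a file back into itself with -o: the contents end up sorted.
printf 'pear\napple\nfig\n' > "$workdir/fruits.txt"
sort -o "$workdir/fruits.txt" "$workdir/fruits.txt"
sorted=$(cat "$workdir/fruits.txt")
echo "$sorted"

# Try the same thing with redirection: the shell truncates the file
# before sort starts reading, so sort sees empty input and the data is lost.
printf 'pear\napple\nfig\n' > "$workdir/clobbered.txt"
sort "$workdir/clobbered.txt" > "$workdir/clobbered.txt"
if [ -s "$workdir/clobbered.txt" ]; then
    clobbered=no
else
    clobbered=yes
fi
echo "clobbered: $clobbered"
```

Without -o, the usual workaround is to redirect to a temporary file and then mv it over the original by hand.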

2 comments:

  1. Interesting question!

    I can't think of many *nix utilities that produce an output file which can "replace" the original file.

    I.e. when grep'ing, cut'ing, etc. the output file contains a subset of the input, so you probably want to keep the input file.

    sort is a bit different, as (with the default options) it produces a file which in most cases (those in which the original order of the file is not "useful") can replace the original file.

    I can't think of other commands like this, except iconv (not in the traditional chain), which does provide the -o switch.

  2. Oh, old fart moment. The -o switch reduces the amount of storage required from 3*O(N) to 2*O(N). Absent that switch the program needs to allocate O(N) for its working copy.
    - ben (who once got taken aside for a talking to about using pipes rather than temp files on the pdp-11)
