Wednesday, 26 March 2008

Frequency list bash function

In addition to command aliases (see an earlier post), you can add your own functions to the bash shell. Here is a simple but useful command line sequence:

function freq() {
    sort $* | uniq -c | sort -rn
}

Put it in ~/.bashrc, open a new shell (or run source ~/.bashrc), and you will have a freq command for creating frequency lists:
freq <FILES>
sorts the lines of the input file(s), counts identical lines, and lists them in order of descending frequency. This is useful in many situations, not least for checking that files that are supposed to contain only unique lines actually do so.
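For the uniqueness check in particular, a small helper can turn the same idea into a yes/no answer. This is only a sketch: the name all_unique is made up for illustration, and it relies on uniq -d, which prints one copy of each repeated line.

```shell
# Hypothetical helper (not part of the original post): succeeds if every
# line in the given files (or on standard input) occurs exactly once.
all_unique() {
    # uniq -d prints only lines that are repeated in the sorted input;
    # grep -q '^' succeeds if at least one such line exists, and the
    # leading ! inverts that, so "no duplicates" means success.
    ! sort -- "$@" | uniq -d | grep -q '^'
}
```

It could then be used as: all_unique words.txt && echo "no duplicates" (words.txt being any file of your own).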

(I'm not too sure about bash function syntax, but the function above seems to do its work.)
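One thing worth knowing about the syntax: an unquoted $* is split on whitespace, so a file name containing spaces would be passed to sort as several separate names. A slightly more defensive variant, as a sketch, quotes "$@" instead. The name freq_safe is only chosen here so that it does not clash with freq above.

```shell
# Sketch of a more defensive variant; freq_safe is a hypothetical name.
freq_safe() {
    # "$@" expands to each argument as its own word, so file names with
    # spaces stay intact. The -- stops option parsing, so a file name
    # starting with a dash is not mistaken for an option. With no
    # arguments at all, sort reads standard input.
    sort -- "$@" | uniq -c | sort -rn
}
```

This works both as freq_safe "my file.txt" and at the end of a pipeline, e.g. cut -f1 data.tsv | freq_safe (the file names are placeholders).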

If you're not familiar with the different commands of the pipeline above, there is plenty to read (e.g., egrep for linguists).

3 comments:

Tristam MacDonald said...

Thanks for that, very handy snippet.

Gabe said...

Cool stuff! What's the "sort $*" for? Thanks!

Nikolaj Lindberg said...

Gabe,

"uniq" wants its input sorted, because it only merges identical lines that are next to each other. "sort $*" sorts the lines of all input files. "$*" holds the arguments passed to the function (the input files in this case). Hope this helped.
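A quick way to see the difference for yourself (a small sketch using made-up input, not taken from the post):

```shell
# Without sorting, the two "a" lines are not adjacent, so uniq reports
# three separate groups, each with a count of 1.
printf 'a\nb\na\n' | uniq -c

# After sorting, the "a" lines are adjacent and are counted together,
# giving two groups.
printf 'a\nb\na\n' | sort | uniq -c
```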