
Bashing those script bugs

20 November 2014

Recently a user reported that they had accidentally started copying the complete filesystem of a compute node into their home directory. Fortunately, we had set up quotas on our users' home space, so the copy stopped once their quota was full. The cause turned out to be a Bash variable that was not defined. Consider the following command:

cp -r $NOTDEFINED/* $HOME

If $NOTDEFINED is not defined, it expands to an empty string, so the shell actually runs:

cp -r /* $HOME

This then copies the whole filesystem to $HOME. Fortunately for them it was not the following:

rm -fr $NOTDEFINED/*

This would try to delete every file the user has write permission for. In the end it was just an annoyance for the user, since they had to delete the accidental copy before any further work could be run.
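A harmless way to see this expansion for yourself is to let the shell expand the glob without handing the result to cp or rm. The sketch below is only an illustration: the names first_path and count are made up for this example, and it assumes the -u option is off (the default).

```shell
# Make sure -u is off (the default) so the bug can be reproduced,
# and leave NOTDEFINED deliberately unset.
set +u
unset NOTDEFINED

# 'set --' stores the expanded words as positional parameters inside the
# command substitution's subshell, so nothing is copied or deleted.
first_path=$(set -- $NOTDEFINED/*; printf '%s\n' "$1")
count=$(set -- $NOTDEFINED/*; printf '%s\n' "$#")

echo "The glob expanded to $count paths, starting with $first_path"
```

Every path reported starts with /, which is exactly what cp or rm would have been given.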

Prevention is better than a cure

So what can be done to avoid those little mistakes which can have a big impact? A colleague in a previous job showed me some tips (mainly because someone in the past had run the rm -fr $NOTDEFINED/* example while having write access through many group permissions).

The main tip is to use the shell options set -eu. The set command changes the behaviour of Bash. The -e option makes the script exit when any command returns a non-zero exit code that is not handled in some way. The -u option treats any use of an undefined variable as an error. Both make your Bash scripts less error prone and force the writer to think more clearly about their writing style.
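The phrase "not handled in some way" is worth a quick illustration. In this minimal sketch, each snippet runs in a child bash so that a failure cannot end the calling shell. The variable names unhandled and handled are made up for the example, and false stands in for any failing command.

```shell
# Unhandled failure: with -e the child exits at 'false', before the echo.
unhandled=0
bash -c 'set -e; false; echo "not reached"' || unhandled=$?

# Handled failure: the if statement tests the exit code itself,
# so -e lets the child carry on.
handled=0
bash -c 'set -e; if ! false; then echo "failure was handled"; fi' || handled=$?

echo "unhandled: $unhandled, handled: $handled"
```

The first child should stop with a non-zero status, while the second prints its message and finishes with status 0.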

Let's look at some examples and how to use them.

  1. Exit when non-zero exit code is given in a command.

    WDPATH=/temp/my_work_dir
    mkdir -p $WDPATH
    cd $WDPATH
    echo "my job is running in here" > output.txt
    cat output.txt
    pwd
    

    Without the set -e option the script will produce the following:

    mkdir: cannot create directory `/temp': Permission denied
    -bash: cd: /temp/my_work_dir: No such file or directory
    my job is running in here
    /home/username
    

    Because of my incorrect directory location, the script could not make or change to the directory, and therefore created the file in the directory it started in. This could be bad if you are running a job which creates a lot of data. Let's add set -e to the top of the script, which now produces:

    mkdir: cannot create directory `/temp': Permission denied
    

    Immediately the script has exited at the command which produced the first error. This is much better.

  2. Exit when undefined variable is used.

    set -e
    WDPATH=/tmp/my_work_dir
    MESSAGE="my job is running in here"
    mkdir -p $WDPATH
    cd $WDPATH
    echo $MESSGE > output.txt
    cat output.txt
    pwd
    

    This script is similar, but the message is now a variable (and the script includes set -e), and I misspelt the variable name (MESSGE rather than MESSAGE). The script writes a blank line instead of the message. None of the commands failed, so what caused the problem? This is where set -u is useful; with set -eu at the top, the script will instead produce:

    test.sh: line 6: MESSGE: unbound variable
    

    Immediately the error is caught and can be fixed. Putting it all together, we have the following script:

    set -eu
    WDPATH=/tmp/my_work_dir
    MESSAGE="my job is running in here"
    mkdir -p $WDPATH
    cd $WDPATH
    echo $MESSAGE > output.txt
    cat output.txt
    pwd
    

    This produces the correct result, with minimal effort spent finding the initial bugs.
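To compare the two options side by side, the misspelt script can be run once with -e and once with -eu. This is a rough sketch rather than the original test setup: it assumes a scratch directory from mktemp instead of /tmp/my_work_dir, and the names WORKDIR, body, status_e and status_eu are invented for the example.

```shell
# Hypothetical scratch directory, so nothing is written to a real work area.
WORKDIR=$(mktemp -d)

# The misspelt part of the script; $1 is the directory to work in.
# 'unset MESSGE' guards against the name existing in the environment.
body='
unset MESSGE
MESSAGE="my job is running in here"
cd "$1"
echo $MESSGE > output.txt
cat output.txt
'

# -e alone: no command fails, so the typo slips through as a blank line.
status_e=0
bash -c "set -e; $body" demo "$WORKDIR" || status_e=$?

# -eu: expanding the unset MESSGE is an error, and the script stops there.
status_eu=0
bash -c "set -eu; $body" demo "$WORKDIR" || status_eu=$?

echo "with -e: $status_e, with -eu: $status_eu"
rm -r "$WORKDIR"
```

The first run should print a blank line and exit with status 0, while the second reports the unbound variable and exits with a non-zero status.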

A bit more bashing…

I hope this post highlights some useful additions to scripts that make sure errors are caught early and in an obvious fashion. This will minimise the disruption to future runs of a script and also allow others to learn from it when it is passed around (as all scripts tend to be after a while).

In further posts I will cover how to handle commands where a non-zero exit status might be expected but needs to be treated carefully when using set -eu.