Recently a user reported a problem: they had managed to start copying the complete filesystem of a compute node into their home directory. Fortunately for us, we had set up quotas on our users' home space, so the copy stopped once it filled their quota. It turned out to be caused by a Bash variable that was not defined when the following command was run:
cp -r $NOTDEFINED/* $HOME
Because $NOTDEFINED is not defined, it expands to an empty string, so the shell instead sees:
cp -r /* $HOME
This then copies the whole filesystem to $HOME. Fortunately for them, it was not the following:
rm -fr $NOTDEFINED/*
That would try to delete every file the user has write access to. In the end it was just an annoyance for the user, since they had to delete the accidental copy before any further work could be run.
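To see the expansion without risking any files, the argument can be printed rather than passed to cp. This is a minimal sketch, assuming NOTDEFINED is unset and the shell is not already running with set -u; the name source_arg is only for illustration.

```shell
# NOTDEFINED is deliberately unset for this demonstration.
unset NOTDEFINED

# Quote the expansion so the glob stays literal instead of being
# expanded into a listing of the root directory.
source_arg="$NOTDEFINED/*"

# Print the command that cp would effectively receive; nothing is copied.
printf 'cp -r %s $HOME\n' "$source_arg"
```

The printed line shows the source argument collapsing to the root directory glob, which is exactly what the user's cp ended up with.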
Prevention is better than a cure
So what can be done to avoid those little mistakes that can have a big impact? A colleague in a previous job showed me some tips (mainly because someone in the past had run the rm -fr $NOTDEFINED/* example while having write access through many group permissions).
The main tip is to use the shell options set -eu. The set command in Bash allows the behaviour of Bash to be changed. The -e option makes the script exit when any command returns a non-zero exit code that is not handled in some way. The -u option makes any use of an undefined variable an error. Both of these make your Bash scripts less error-prone and force the writer to think more clearly about their writing style.
Let's look at some examples and how to use them.
- Exit when a command returns a non-zero exit code.
- Exit when an undefined variable is used.
WDPATH=/temp/my_work_dir
mkdir -p $WDPATH
cd $WDPATH
echo "my job is running in here" > output.txt
cat output.txt
pwd
Without the set -e option, the script will produce the following:
mkdir: cannot create directory `/temp': Permission denied
-bash: cd: /temp/my_work_dir: No such file or directory
my job is running in here
/home/username
Because of my incorrect directory location, the script could neither make nor change to the directory, and therefore created the file in the directory it started in. This could be bad if you are instead running a job that creates a lot of data. Let's add set -e to the top of the script, which now produces:
mkdir: cannot create directory `/temp': Permission denied
Immediately the script has exited at the command which produced the first error. This is much better.
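The comparison above can be reproduced safely in a throwaway directory. The following is only a sketch under some assumptions: the names sandbox, blocker, job.sh and job_e.sh are made up for illustration, and a regular file stands in for the unwritable /temp location so that mkdir fails regardless of who runs it.

```shell
# Throwaway sandbox so nothing outside it is touched.
sandbox=$(mktemp -d)

# A regular file called "blocker" makes "blocker/my_work_dir" impossible
# to create, standing in for the unwritable /temp path.
touch "$sandbox/blocker"

# The job script, with WDPATH pointing at the bad location.
cat > "$sandbox/job.sh" <<'EOF'
WDPATH=blocker/my_work_dir
mkdir -p $WDPATH
cd $WDPATH
echo "my job is running in here" > output.txt
EOF

# The same script with set -e added at the top.
{ echo 'set -e'; cat "$sandbox/job.sh"; } > "$sandbox/job_e.sh"

# Run 1: no set -e, so the script carries on after mkdir and cd fail.
(cd "$sandbox" && bash job.sh) 2>/dev/null
[ -f "$sandbox/output.txt" ] && without_e="stray file written" || without_e="no stray file"
rm -f "$sandbox/output.txt"

# Run 2: set -e stops the script at the failing mkdir.
(cd "$sandbox" && bash job_e.sh) 2>/dev/null && status=0 || status=$?
[ -f "$sandbox/output.txt" ] && with_e="stray file written" || with_e="no stray file"

echo "without set -e: $without_e"
echo "with set -e:    $with_e (exit status $status)"

# Clean up the sandbox.
rm -rf "$sandbox"
```

Running each variant in a child bash keeps the failure contained, so the calling shell can inspect the exit status and check whether a stray output.txt was left behind.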
set -e
WDPATH=/tmp/my_work_dir
MESSAGE="my job is running in here"
mkdir -p $WDPATH
cd $WDPATH
echo $MESSGE > output.txt
cat output.txt
pwd
The above script is similar, but the message is now a variable (and the script includes set -e). However, I misspelt a variable name (MESSGE rather than MESSAGE), so the script will instead produce a blank line. None of the commands failed, so what caused the problem? This is where set -u can be useful; with it, the script will instead produce:
test.sh: line 6: MESSGE: unbound variable
Immediately the error is caught and can be fixed. Putting it all together, we can have the following script:
set -eu
WDPATH=/tmp/my_work_dir
MESSAGE="my job is running in here"
mkdir -p $WDPATH
cd $WDPATH
echo $MESSAGE > output.txt
cat output.txt
pwd
This will produce the correct result, with minimal effort spent finding the initial bugs.
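The effect of set -u on the misspelt variable can be checked the same way. This is a rough sketch rather than the exact script from the post: the names sandbox, test.sh and test_u.sh are hypothetical, the script is cut down to the two relevant lines, and LC_ALL=C is set only so that Bash reports its error in English.

```shell
# Throwaway sandbox for the two script variants.
sandbox=$(mktemp -d)

# A cut-down script containing the misspelt MESSGE.
cat > "$sandbox/test.sh" <<'EOF'
MESSAGE="my job is running in here"
echo $MESSGE
EOF

# The same script with set -u added at the top.
{ echo 'set -u'; cat "$sandbox/test.sh"; } > "$sandbox/test_u.sh"

# Without -u the typo silently expands to nothing, so echo prints only
# a blank line (which the command substitution strips to an empty string).
plain_out=$(LC_ALL=C bash "$sandbox/test.sh")

# With -u the expansion is an error and the script stops.
strict_err=$(LC_ALL=C bash "$sandbox/test_u.sh" 2>&1) && strict_status=0 || strict_status=$?

echo "without set -u: output='$plain_out'"
echo "with set -u:    status=$strict_status message='$strict_err'"

# Clean up the sandbox.
rm -rf "$sandbox"
```

The first run succeeds while quietly doing the wrong thing; the second fails loudly and names the offending variable, which is the behaviour you want from a job script.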
A bit more bashing…
I hope this post highlights some useful additions to scripts to make sure errors are caught early and in an obvious fashion. This will minimise the disruption to future runs of the script and also allow others to learn from it when it is passed around (as all scripts tend to be after a while).
In further posts I will cover how to handle commands where a non-zero exit status might be expected but need to be carefully treated when using