Problems with errexit in bash

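When I create a bash script, I want it to be robust. I have used the errexit option (set -e) in a lot of my scripts for a long time. It makes the code a lot easier to read and maintain, because the script stops at the first failing command instead of needing an explicit check after every call.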

A while back, I was creating a script for checks that I wanted to run on the system. While writing test cases for this script, I hit a strange problem. All of a sudden, it seemed that errexit didn’t work anymore. I could reproduce the problem faithfully.

In the called function, I rely on errexit to fail everything when something goes wrong. Calling this function directly worked great.

However, calling it as part of an or chain (||) or as the condition of an if construction disabled errexit within the called function. Strange. The way the function is called thus determines how the function works. Weird and very unexpected! It’s almost as if the shape of an orange changes when you put it on the table instead of in a bowl. Hey, quantum physics in bash 😉!
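To make the problem concrete, here is a minimal sketch of it. The function name check and the messages are made up for illustration; they are not taken from my actual script.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical check function that relies on errexit to stop at the first failure.
check() {
    false                     # with errexit honoured, the function would stop here
    echo "reached the end"    # only runs when errexit is being ignored
}

# In an || chain, errexit is ignored inside check: the echo still runs,
# and because echo succeeds, check returns 0 and the fallback is skipped.
check || echo "check failed"

# As an if condition, the same thing happens.
if check; then
    echo "if saw success"
fi

# Called directly (uncomment to try), the script exits at 'false':
# check
```

In both calls the function runs past the failing command, so the caller never finds out that something went wrong.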

Searching the internet turned up a Stack Overflow question and a post to the bash mailing list with an explanation. The recommendation not to use set -e doesn’t sit well with me. It was finally an option that gave some sanity to bash programming (or so I thought).

A lot of experimentation later, I found a workaround.

First, what didn’t work:

  • Using set -E or other options didn’t solve the problem.
  • I wasn’t able to get a working solution using traps (ERR or DEBUG or EXIT).

What did work was calling the function in the background and waiting until it finished. That way, the wait command could be used in the context where the function was used (the || chain or the if), but the code that disabled errexit was not triggered for the function itself!
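A minimal sketch of the background-and-wait workaround, again with a made-up check function. It assumes the function can safely run in a subshell, so any variables it sets are lost to the caller.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical check function that relies on errexit to stop at the first failure.
check() {
    false                     # errexit stops the background subshell here
    echo "reached the end"    # should not run any more
}

# || chain: only wait is part of the list, so errexit stays active
# inside the backgrounded function.
check &
wait $! || echo "check failed"

# if condition: the same idea, with wait as the condition.
check &
if wait $!; then
    echo "check passed"
else
    echo "check failed in if"
fi
```

wait returns the exit status of the background job, so the caller still gets a usable result. The price is an extra process for every call.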

You can find some examples in the Stack Overflow post.

I can only draw two conclusions from this episode:

  • Don’t use bash for very complex things as there are hidden snags.
  • Hurray for testing the code since this gave me the opportunity to learn about and work around one of those snags.

PS: my opinion on the whole problem? I don’t see a big problem in not following POSIX in this case. Bash already deviates from it in several other areas, so why not here? Especially because the current behaviour makes such an interesting option a nightmare, hmm, pain to use.

