Changing optimisation levels causes different application behaviour

August 23, 2017 12:37
Updated

I have a working application, but when I change the optimisation level my program behaves differently. What could be going on?

Changing the optimisation level can alter the memory layout, the application timing, and the way the code itself is compiled.

Missing "volatile"

One of the most valuable concepts in embedded programming is that of volatility. If you don't use the volatile keyword on data that can change asynchronously, you're sure to come unstuck when you compile in release mode or with optimizations enabled. The most common form of this is waiting for an event to happen, such as a test on a counter or a flag, where the counter or flag has not been declared volatile. In this case your code is likely to run in Debug mode but not in Release mode.

Remember to correctly declare objects that can be modified asynchronously as volatile. That is, declare global flags and counters as volatile, declare ring buffer pointers in structures as volatile, and certainly declare any GPIO or special register as volatile.
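As a minimal sketch of the flag case, the fragment below uses a standard C signal handler as a stand-in for an interrupt routine so that it can be tried on a desktop machine. The names data_ready, on_event and wait_for_event are hypothetical and chosen only for illustration; on a real target the handler would be an interrupt service routine.

```c
#include <signal.h>

/* Hypothetical flag shared between an asynchronous handler and the main code.
   volatile forces the waiting loop to re-read it on every iteration. */
static volatile sig_atomic_t data_ready = 0;

/* Stand-in for an interrupt service routine. */
static void on_event(int signum)
{
    (void)signum;
    data_ready = 1;
}

/* Installs the handler, triggers it, then waits for the flag.
   Returns 1 once the flag has been observed. */
int wait_for_event(void)
{
    void (*previous)(int);

    data_ready = 0;
    previous = signal(SIGINT, on_event);
    raise(SIGINT);              /* the handler runs before raise() returns */
    while (!data_ready) { }     /* flag is re-read each time round the loop */
    signal(SIGINT, previous);   /* put the earlier handler back */
    return 1;
}
```

Without volatile on data_ready, an optimizing compiler would be entitled to read the flag once and never look at it again inside the loop.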

Altered memory layout

Changing the optimisation level will alter the memory layout, and the program may use more or less memory as a result.

Common problems caused by change of memory layout are:

Insufficient stack allocated - if the program uses more stack at the new optimisation level, its stack usage may exceed the space reserved for the stack.
Insufficient heap allocated - if the program uses more heap at the new optimisation level, its heap usage may exceed the space reserved for the heap.
Highlighting an existing bug - as the memory layout changes, the behaviour of an existing bug may change or become more visible.

For example, if the program is corrupting a particular area of memory, the area being corrupted may be more or less important as the memory layout changes: in one memory layout it may be an area of unused RAM, in another it may be the stack.
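One way to make this kind of corruption visible regardless of layout is to place a known pattern after a buffer and check it later. This guard-word technique is not described in the article; the sketch below is only an illustration, and every name in it (arena, BUF_SIZE, GUARD_SIZE, GUARD_BYTE, buggy_fill, guard_intact) is hypothetical. The "buffer" is the first BUF_SIZE bytes of arena and the guard occupies the bytes that follow, so the deliberate overrun stays inside one array and the demonstration remains well defined.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE   8u
#define GUARD_SIZE 4u
#define GUARD_BYTE 0xA5u

/* Buffer followed by guard bytes, both inside a single array. */
static unsigned char arena[BUF_SIZE + GUARD_SIZE];

/* Clear the buffer and write the guard pattern after it. */
void arena_init(void)
{
    memset(arena, 0, BUF_SIZE);
    memset(arena + BUF_SIZE, GUARD_BYTE, GUARD_SIZE);
}

/* Models the bug: len is not checked against BUF_SIZE.
   It is capped to the arena size only to keep the demo well defined. */
void buggy_fill(size_t len)
{
    if (len > sizeof arena)
        len = sizeof arena;
    memset(arena, 0xFF, len);
}

/* Returns 1 if every guard byte still holds the pattern, 0 otherwise. */
int guard_intact(void)
{
    size_t i;

    for (i = 0; i < GUARD_SIZE; ++i)
        if (arena[BUF_SIZE + i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

Calling arena_init(), then buggy_fill(BUF_SIZE + 1), makes guard_intact() report the overrun whether or not the overwritten bytes happened to matter in that particular build.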

Altered application timing

Changing the optimisation level may produce slower or faster code. If there is a time-related bug in the program, such as a race condition, the behaviour of the program may change with different optimisation levels.
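A race condition of this kind can be reproduced deterministically by calling the "interrupt" by hand at the worst possible moment. The sketch below is illustrative only; ticks, isr_tick and lost_update_demo are hypothetical names. In a real program the width of the window between the read and the write depends on the code the compiler generates, which is why the bug may appear or disappear as the optimisation level changes.

```c
/* Hypothetical counter shared between main code and an interrupt routine. */
static volatile unsigned ticks;

/* Stand-in for an interrupt service routine that counts events. */
static void isr_tick(void)
{
    ++ticks;
}

/* Simulates an interrupt arriving in the middle of a read-modify-write.
   Returns the final counter value. */
unsigned lost_update_demo(void)
{
    unsigned snapshot;

    ticks = 0;
    snapshot = ticks;      /* main code reads the counter              */
    isr_tick();            /* "interrupt" fires between read and write */
    ticks = snapshot + 1;  /* main code writes back a stale value      */
    return ticks;          /* 1, not 2: the handler's update was lost  */
}
```

Note that declaring ticks volatile does not prevent the lost update; the read-modify-write has to be protected by other means, for instance by disabling the interrupt around it.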

Comments

5 comments

Steve

November 09, 2009 11:49
Hi Paul,

That's a fairly sweeping statement about variables that are not declared volatile causing problems. What sort of mechanism causes the problem? Am I to assume that all global variables should be declared as volatile?

Paul Curtis

November 09, 2009 11:58
Not all globals should be declared volatile. You should declare variables volatile if they can, in essence, be changed by hardware or by interrupt routines running asynchronously. "volatile" means "may change outside control of the compiler" and is a flag to the optimizer not to apply various common optimizations to the variable.

More interesting is the question "What does volatile not mean?" In general, using volatile does not mean:
- Atomic load, atomic store, atomic modification.
- All accesses to the variable are "as a whole".
- Each read and/or write of the variable must generate code to read and/or write the variable.
If you don't understand what the above points mean, then you don't understand fully what a compiler will or will not do for you.

Steve

November 09, 2009 12:24
You mention "common optimizations" but what are they? How can I avoid the side effects of these optimizations if I don't know what they are? I am beginning to think that I don't know what a compiler can do for me. Somehow I got away with it for 25 years and now it's time to read a book. What would you suggest?

Paul Curtis

November 09, 2009 13:16
There are just so many different optimizations with different names, I'm not sure that telling you what they are is productive. However, you can broadly divide optimizations into control-flow optimizations and dataflow optimizations. Both kinds are affected by volatile.

As an example, consider the following:

int status;

Now, assume for the sake of argument, that status happens to map to a device register and that we want to use status like this:

while (status & 1) { } // wait for bit 0 to be clear

The compiler can legitimately transform this to:

if (status & 1) for (;;) { }

Why? Well, think about it. Because status is not volatile, the compiler can assume that it knows absolutely everything about the way status is accessed and the above transformation is correct. status does not get updated in the while, hence if bit 0 of status is 1, it will always be 1 as it's not changed in the loop. If it's zero, I drop straight through the while.

In this case, sprinkling a volatile onto status means that the compiler cannot make any assumptions on the value of status and therefore its value must be checked each time round the loop.
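A minimal sketch of the volatile version is shown below. Here status is an ordinary variable standing in for the device register, and the helper wait_bit0_clear with its max_polls budget is a hypothetical addition (not part of the discussion above) so that the loop can be exercised on a desktop machine without hanging.

```c
#include <stdbool.h>

/* Stand-in for a memory-mapped device register. */
volatile unsigned status;

/* Hypothetical helper: polls until bit 0 of *reg is clear or the
   polling budget runs out. Returns true if the bit was seen clear. */
bool wait_bit0_clear(const volatile unsigned *reg, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if ((*reg & 1u) == 0)   /* volatile: read again on each pass */
            return true;
    }
    return false;
}
```

Because the pointer is to a volatile object, the compiler cannot hoist the read out of the loop, so the if-then-loop-forever transformation described above is no longer permitted.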

Another common problem is that of busy-delay loops:

void delay(unsigned us)
{
  int count;
  while (us-- > 0)
    for (count = 0; count < 10; ++count) // tuned for 1us delay
      ;
}

This is a nightmare on so many levels, yet you see it everywhere. First off, the compiler may elect to put count in a register or keep it in memory depending upon code generation settings (such as optimization for speed, "in registers", or optimization for code size, "keep in memory"). So, the 1us delay in the loop won't be 1us. That's bad. Secondly, this code does nothing other than waste processor cycles with no effect on the virtual C machine state, so the compiler knows it can eliminate both loops entirely--and it will with aggressive data flow optimizations. If you want to keep the loop, you must declare count volatile so that each time round the loop count is written.
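A sketch of the loop with count declared volatile might look like the following. The return value is a hypothetical addition used only to check that the inner loop really ran; the figure of 10 iterations is taken from the example above and is not a calibrated value.

```c
/* Busy-wait sketch: count is volatile, so each increment and comparison
   must actually be performed and the loops cannot be removed.
   Returns the number of inner-loop iterations executed (for checking only). */
unsigned long delay(unsigned us)
{
    volatile int count;
    unsigned long spins = 0;

    while (us-- > 0)
        for (count = 0; count < 10; ++count)  /* 10: from the original example */
            ++spins;
    return spins;
}
```

Even in this form the time taken per iteration still depends on the compiler, its settings and the clock speed, so volatile keeps the loop alive but does not make the delay accurate.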

These are two very easy "why I must use volatile" lessons. Now for something more complex.

We know that hardware registers that change outside of our control must be declared volatile; but now, how to use it correctly?

For a start, let's consider quickly filling up a UART's FIFO by writing to the transmit buffer several times:

volatile unsigned char TXBUF; // a device register in my UART

void blast_4_zeroes(void)
{
  TXBUF = TXBUF = TXBUF = TXBUF = 0;
}

Is this right or is this wrong? This is a paradigm for "setting all these variables to zero", e.g. "x = y = z = 0;". You may not write it like this, but just for the moment, let's assume you have because it's legitimate C code.

We expect four writes to TXBUF, yes? Well, not necessarily! It can turn out to be four writes and three reads! Why is that? Well, if you take a look at the C semantics, you find that the value of the expression TXBUF = 0 is the value of TXBUF after the store, not the value that's stored into it, so the compiler is allowed to read TXBUF back to obtain that value! In effect the compiler can rewrite this to:

void blast_4_zeroes(void)
{
  TXBUF = 0;
  TXBUF = TXBUF;
  TXBUF = TXBUF;
  TXBUF = TXBUF;
}

Is that good? No, especially when most UARTs happen to use the same address for both the transmit and receive buffer. In our case, attempting to write four zeroes to the UART has the effect of writing a zero followed by three indeterminate bytes to the transmit buffer and emptying three bytes from the receive buffer! If you don't believe me, try it out!
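A straightforward way to avoid the read-back is to write each store as its own statement with a constant on the right-hand side. The sketch below assumes TXBUF is an ordinary volatile variable standing in for the UART register so that it can be compiled anywhere; on real hardware it would be mapped to the device address.

```c
/* Stand-in for the UART transmit register. */
volatile unsigned char TXBUF;

/* Four separate stores of a constant: no statement reads TXBUF,
   so the compiler has no reason to generate a read of the register. */
void blast_4_zeroes(void)
{
    TXBUF = 0;
    TXBUF = 0;
    TXBUF = 0;
    TXBUF = 0;
}
```

Since each statement is a volatile write, the compiler must also keep all four stores rather than merging them into one.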

Do you want me to go on? I could, but you get the picture. You really need to know what you're doing with your favourite language... And I haven't even covered sequence points or the effect of volatile on data storage when using setjmp and longjmp.

Steve

November 09, 2009 16:05
Many thanks for the tutorial professor.