Changing optimisation levels causes different application behaviour

  • Steve

    Hi Paul,

    That's a fairly sweeping statement about variables that are not declared volatile causing problems. What sort of mechanism causes the problem? Am I to assume that all global variables should be declared as volatile?

  • Paul Curtis

    Not all globals should be declared volatile.  You should declare a variable volatile if it can, in essence, be changed by hardware or by interrupt routines running asynchronously.  "volatile" means "may change outside the control of the compiler" and is a flag telling the optimizer not to apply various common optimizations to the variable.
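    To make the interrupt case concrete, here is a minimal sketch. It assumes a hosted C environment, so an ISO C signal handler stands in for the interrupt routine; data_ready, on_data, install_data_handler and wait_for_data are hypothetical names. volatile sig_atomic_t is the type ISO C provides for a flag shared with a signal handler; on a microcontroller the flag would typically be a plain volatile variable written by the ISR.

```c
#include <signal.h>

/* Hypothetical flag written asynchronously by the handler.  The handler
   stands in for an interrupt routine so that the sketch runs on a host. */
static volatile sig_atomic_t data_ready = 0;

static void on_data(int sig)
{
    (void)sig;
    data_ready = 1;   /* changed "outside the control of the compiler" */
}

void install_data_handler(void)
{
    signal(SIGINT, on_data);   /* install the stand-in "interrupt routine" */
}

/* Returns once the handler has set the flag, then clears it.  Because
   data_ready is volatile, the loop re-reads it on every iteration instead
   of testing a cached copy once. */
void wait_for_data(void)
{
    while (!data_ready) {
        /* spin */
    }
    data_ready = 0;
}
```

    Usage: call install_data_handler(), let the event arrive (raise(SIGINT) simulates it on a host), and wait_for_data() returns once the flag is set. Without volatile, the compiler would be free to read data_ready once and spin forever.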

    More interesting is the question "What does volatile not mean?"  In general, using volatile does not mean:

    • Atomic load, atomic store, atomic modification
    • All accesses to the variable are performed "as a whole".
    • Each read and/or write of the variable must generate code to read and/or write the variable.

    If you don't understand what the above points mean, then you don't understand fully what a compiler will or will not do for you.
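    The first two points can be sketched in code. This assumes a C11 compiler with <stdatomic.h> and lock-free atomics on the target (on many small microcontrollers you would disable interrupts around the update instead); events_plain, events_atomic, count_event, ticks_hi, ticks_lo and read_ticks are all hypothetical. The tick counter models a 32-bit count that an interrupt routine maintains as two 16-bit halves, as it might on a 16-bit CPU.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical event counters, assumed to be bumped from both main code
   and an interrupt routine. */
volatile uint32_t events_plain;       /* ++ is a load, an add and a store */
atomic_uint_least32_t events_atomic;  /* C11 atomic read-modify-write */

void count_event(void)
{
    events_plain++;                        /* an interrupt between the load and
                                              the store can lose a count */
    atomic_fetch_add(&events_atomic, 1u);  /* indivisible: no update is lost */
}

/* Hypothetical 32-bit tick count kept as two 16-bit halves.  Each half is
   volatile, but nothing makes the pair read "as a whole". */
volatile uint16_t ticks_hi, ticks_lo;

uint32_t read_ticks(void)
{
    uint16_t hi, lo;

    do {
        hi = ticks_hi;
        lo = ticks_lo;
    } while (hi != ticks_hi);   /* high half changed mid-read: try again */

    return ((uint32_t)hi << 16) | lo;
}
```

    The re-read of ticks_hi is one common way to get a consistent multi-word value without disabling interrupts: if the high half changed while the low half was being read, the pair may be torn, so the read is retried.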

  • Steve

    You mention "common optimizations", but what are they? How can I avoid the side effects of these optimizations if I don't know what they are? I am beginning to think that I don't know what a compiler can do for me.  Somehow I got away with it for 25 years and now it's time to read a book. What would you suggest?

  • Paul Curtis

    There are so many different optimizations, with different names, that I'm not sure listing them is productive.  However, you can broadly divide optimizations into control-flow optimizations and dataflow optimizations, and both kinds interact with volatile.

    As an example, consider the following:

    int status;

    Now assume, for the sake of argument, that status happens to map to a device register and that we want to use it like this:

    while (status & 1)  { }   // wait for bit 0 to be clear

    The compiler can legitimately transform this to:

    if (status & 1)  for (;;) { }

    Why?  Well, think about it.  Because status is not volatile, the compiler can assume that it knows absolutely everything about the way status is accessed, so the above transformation is correct.  Nothing in the loop updates status, so if bit 0 of status is 1 it will stay 1 forever; if it's 0, execution drops straight through the loop.

    In this case, sprinkling a volatile onto status means that the compiler cannot make any assumptions about the value of status, so it must read it afresh each time round the loop.
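    A minimal sketch of the polling loop with the qualifier in place. It assumes a 32-bit register and, so that it can run on a host, takes the register's address as a parameter; wait_bit0_clear is a hypothetical name, and the address in the comment is only a placeholder.

```c
#include <stdint.h>

/* Spins until bit 0 of the status register is clear and returns the last
   value read.  The pointer is to volatile, so every iteration performs a
   fresh read and the compiler may not turn the loop into
   "if (status & 1) for (;;) { }".  On real hardware you might pass a
   fixed address such as (volatile uint32_t *)0x40000000u (placeholder). */
uint32_t wait_bit0_clear(const volatile uint32_t *status)
{
    uint32_t value;

    do {
        value = *status;      /* one volatile read per iteration */
    } while (value & 1u);

    return value;
}
```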

    Another common problem is that of busy-delay loops:

    void delay(unsigned us)
    {
      int count;
      while (us-- > 0)
        for (count = 0; count < 10; ++count)  // tuned for 1us delay
          ;
    }

    This is a nightmare on so many levels, yet you see it everywhere.  First off, the compiler may elect to put count in a register or keep it in memory depending on code generation settings (for example, optimizing for speed favours keeping it "in registers", while optimizing for code size may "keep it in memory").  So the 1us delay in the loop won't be 1us.  That's bad.  Secondly, this code does nothing other than waste processor cycles; it has no effect on the state of the C abstract machine, so the compiler knows it can eliminate both loops entirely, and it will with aggressive dataflow optimizations.  If you want to keep the loop, you must declare count volatile so that count is really written each time round the loop.
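    Here is the loop with that fix applied, as a sketch. LOOPS_PER_US is a hypothetical calibration constant standing in for the hand-tuned 10; the right value depends on the clock rate, the compiler and its optimization settings.

```c
/* Hypothetical calibration constant: must be re-tuned per target, compiler
   and optimization level. */
#define LOOPS_PER_US 10u

void delay(unsigned us)
{
    volatile unsigned count;   /* every store to and load of count must happen */

    while (us-- > 0)
        for (count = 0; count < LOOPS_PER_US; ++count)
            ;                  /* empty body: the volatile accesses are the work */
}
```

    Note that volatile only stops the compiler from deleting the loop; the cost of each iteration still varies between targets and optimization levels, so the first problem above remains.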

    These are two very easy "why I must use volatile" lessons.  Now for something more complex.

    We know that hardware registers that change outside of our control must be declared volatile; but how do we use them correctly?

    For a start, let's consider quickly filling up a UART's FIFO by writing to the transmit buffer several times:

    volatile unsigned char TXBUF;   // a device register in my UART

    void blast_4_zeroes(void)
    {
      TXBUF = TXBUF = TXBUF = TXBUF = 0;
    }

    Is this right or is this wrong?  This is the familiar idiom for "set all these variables to zero", e.g. "x = y = z = 0;".  You may not write it like this, but for the moment let's assume you have, because it's legitimate C code.

    We expect four writes to TXBUF, yes?  Not necessarily!  It can just as well be four writes and three reads!  Why is that?  Well, if you look at the C semantics, you find that the value of the expression TXBUF = 0 is the value of TXBUF after the store, not the value that was stored into it, and the standard permits, but does not require, the compiler to obtain that value by reading TXBUF back, even though TXBUF is volatile.  In effect, the compiler can rewrite this as:

    void blast_4_zeroes(void)
    {
      TXBUF = 0;
      TXBUF = TXBUF;
      TXBUF = TXBUF;
      TXBUF = TXBUF;
    }

    Is that good?  No, especially since many UARTs use the same address for both the transmit and receive buffers.  In that case, attempting to write four zeroes to the UART writes a zero followed by three indeterminate bytes to the transmit buffer and drains three bytes from the receive buffer!  If you don't believe me, try it out!
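    One way to write the four zeroes without ever reading the shared data register is a separate store per byte. In this sketch TXBUF is an ordinary volatile object standing in for the UART register so that it runs on a host, and uart_send is a hypothetical helper; a real driver would also check that the FIFO has room before each write.

```c
#include <stddef.h>

/* Stand-in for the UART data register (shared by transmit and receive on
   many parts).  On real hardware this would be the memory-mapped register. */
volatile unsigned char TXBUF;

/* Writes n bytes to the transmit register: one volatile store per byte and
   no reads of the register, so nothing is drained from the receive side. */
void uart_send(volatile unsigned char *txbuf, const unsigned char *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        *txbuf = data[i];
}

void blast_4_zeroes(void)
{
    static const unsigned char zeroes[4] = {0, 0, 0, 0};

    uart_send(&TXBUF, zeroes, sizeof zeroes);
}
```

    Writing four plain statements, TXBUF = 0; repeated four times, has the same effect; the point is that no assignment expression's value is ever used.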

    Do you want me to go on?  I could, but you get the picture.  You really need to know what you're doing with your favourite language...  And I haven't even covered sequence points or the effect of volatile on data storage when using setjmp and longjmp.
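    As a brief illustration of the setjmp/longjmp point: a local variable that is modified between setjmp and longjmp must be volatile, otherwise ISO C says its value after the longjmp is indeterminate (it may live in a register whose setjmp-time contents longjmp restores). A minimal sketch, with retry_point, fail and attempts_until as hypothetical names:

```c
#include <setjmp.h>

static jmp_buf retry_point;

/* Hypothetical failure path that unwinds straight back to the setjmp. */
static void fail(void)
{
    longjmp(retry_point, 1);
}

/* Counts the attempts made before reaching max.  attempts is changed after
   setjmp and read after longjmp returns control here, so it must be
   volatile for its value to be well defined. */
int attempts_until(int max)
{
    volatile int attempts = 0;

    setjmp(retry_point);      /* control comes back here on every longjmp */
    if (++attempts < max)
        fail();
    return attempts;
}
```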

  • Steve

    Many thanks for the tutorial, professor.
