How Fast Does Your Arduino Code Run?
After one of Uri Shaked's recent tutorials, lots of you asked how to measure the speed of Arduino code. It’s a good question, and it’s an interesting way to dig into how the Arduino compiler works as well! In this post, I’ll show you the tricks I use to measure the speed of any Arduino code.
Measuring With micros(): The Naïve Way
You don’t need any special hardware, just a few extra lines of code. We’ll start with this code snippet:
Let’s measure how long the digitalWrite call takes. Ignore the loop for now, and focus on the digitalWrite line. There are three steps to taking a time measurement:
1. initialize the measurement;
2. take the measurement; and
3. print the result.
One way to initialize the measurement is using the micros function:
unsigned long start = micros();
If we know what time something finishes and what time it started, we can work out how long it took. In more technical terms, if we subtract the start time from the finish time, the result is the duration. We can use this to initialize a variable. Let’s call it duration:
unsigned long duration = micros() - start;
Then print the result, for example with Serial.println(duration).
That will show us a rough duration in microseconds. If we run the code, we can see that the result is 8 microseconds.
Clock Cycles - When Accuracy Matters!
This is the method I used in the past, but as you’ll see in a moment, it’s not very accurate. Arduino clock cycles are a much more precise way to measure the speed of a program.
The Arduino clock, an integral part of the Arduino microcontroller, “ticks” sixteen million times a second. You can think of it as the metronome that orchestrates all the parts of the microcontroller and makes sure everything works in sync. We’ll use one of the Arduino timers to count the number of clock ticks for the measurement.
If you look at Uri Shaked's blog post about 5 ways to blink an LED with Arduino, there’s a whole section about timers. Briefly, our Arduino has three different timers numbered 0, 1 and 2. Timers 0 and 2 count from 0 to 255, but timer 1 goes all the way to 65535!
These numbers represent the clock cycles that these timers use to keep track of time. We can change how fast the timers count by setting some configuration bits in one of the control registers. Specifically, we can choose to count every clock cycle, or every 8, 64, 256, or 1024 cycles. Since we want precise measurements, we’ll configure our timer to count every clock cycle.
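To make the trade-off concrete, here is a small sketch in plain C++ (it runs on a PC, not on the board) that works out how long one timer tick lasts, and how long a 16-bit timer such as timer 1 can count before wrapping, for each prescaler setting. The 16 MHz clock comes from the text above; the function names are made up for illustration.

```cpp
#include <cstdint>

// Clock frequency assumed throughout this post: 16 MHz.
constexpr double kClockHz = 16000000.0;

// Duration of one timer tick, in microseconds, for a given prescaler
// (1, 8, 64, 256 or 1024).
double tickMicros(uint32_t prescaler) {
  return prescaler * 1000000.0 / kClockHz;
}

// Longest interval a 16-bit timer can count before it wraps around to zero,
// in microseconds: 65536 ticks of the duration computed above.
double maxSpanMicros(uint32_t prescaler) {
  return 65536.0 * tickMicros(prescaler);
}
```

With a prescaler of 1, tickMicros(1) is 0.0625 microseconds and maxSpanMicros(1) is 4096 microseconds. With a prescaler of 1024, each tick lasts 64 microseconds and the timer wraps after roughly 4.19 seconds. Counting every cycle gives the finest resolution at the cost of the shortest range.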
Taking Advantage of Timer 1
To do this, we’ll get rid of the micros function and set the CS10 bit in the TCCR1B register (that's a special variable that lets us configure the timer).
Since we want to start counting from zero, we set the counter to zero at the beginning, like this:
TCNT1 = 0;
That’s all we need to start counting clock cycles. Now we’ll run the code we want to measure, store the value of this counter in a variable (we'll call it cycles), then print the value of that variable and see how long it took!
unsigned int cycles = TCNT1;
Serial.println(cycles);
In code, this would look like:
One more thing: if we comment out the test code and run the program, you might expect it to take zero cycles to run (as there is no code). In reality, it takes one cycle because it takes time to copy the counter value into the variable. To allow for this in our measurement, we need to subtract one from the number of cycles.
If we run this again, we can see that it now prints 0, which is correct. Now we’re ready to measure our code!
If we uncomment and run the code, we can see it takes 38 cycles to run. That’s great, but what does it mean?
Remember, the clock “ticks” sixteen million times a second. That means there are sixteen million cycles a second. Since a microsecond is a millionth of a second, if we divide the number of cycles by sixteen, we’ll get the number of microseconds it took the code to run. The problem is, if we divide by sixteen we probably won’t get an integer. We'll change the variable type to float before dividing by sixteen, so we get a precise result:
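The conversion itself is simple arithmetic. As a sketch in plain C++ (runnable on a PC; the function name is hypothetical, and the sixteen cycles per microsecond assumes the 16 MHz clock), it might look like this:

```cpp
// Cycles per microsecond on a 16 MHz clock (assumption taken from the text).
constexpr float kCyclesPerMicro = 16.0f;

// Convert a cycle count read from the timer into microseconds.
// Dividing by a float keeps the fractional part that integer division would drop.
float cyclesToMicros(unsigned int cycles) {
  return cycles / kCyclesPerMicro;
}
```

For the 38 cycles measured above, cyclesToMicros(38) gives 2.375, whereas the integer expression 38 / 16 would give 2. On the board, passing a float to Serial.println prints it with two decimal places by default.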
When we run it again, we can see the actual time it took the function to run: 2.38 microseconds.
Let’s initialize the other timer configuration register to zero, just in case. This ensures there are no configuration bits left over from the Arduino library or bootloader code. When we run it again, do we get the same result?
Yes, it still works! Remember, when we used the micros() function for this measurement, we got about eight microseconds. The difference shows how coarse micros() is compared to this method: on 16 MHz boards it only has a resolution of four microseconds, and the calls to micros() add overhead of their own. Now that we’ve finally got this set up, we can start playing around with the code.
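Putting the steps of this section together, the complete measurement might look like the Arduino sketch below. Treat it as a reconstruction under assumptions rather than the exact code from the tutorial: it assumes an ATmega328P board (such as the Uno) running at 16 MHz, a baud rate of 115200, and that the code being measured is a digitalWrite call on pin 13.

```cpp
// Minimal sketch, assuming an ATmega328P board (e.g. Uno) at 16 MHz.
void setup() {
  Serial.begin(115200);          // assumed baud rate; match your serial monitor
  pinMode(13, OUTPUT);

  TCCR1A = 0;                    // clear any leftover configuration bits
  TCCR1B = bit(CS10);            // no prescaler: count every clock cycle
  TCNT1 = 0;                     // start counting from zero

  digitalWrite(13, HIGH);        // the code being measured

  unsigned int cycles = TCNT1;   // read the counter right after the measured code
  cycles -= 1;                   // allow for the cycle spent copying the counter

  Serial.print(cycles);
  Serial.print(" cycles, ");
  Serial.print(cycles / 16.0);   // sixteen cycles per microsecond
  Serial.println(" microseconds");
}

void loop() {
  // Nothing to do here; the measurement runs once in setup().
}
```

To measure something else, replace the digitalWrite line with the code under test and keep everything around it unchanged.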
PWM: Not all Pins are Born Equal
Let’s see what happens if we use pin 11 instead of the built-in LED (pin 13). Will it still take 2.38 microseconds?
Hmm, it’s slower this time; it takes 3.38 microseconds. Why?
Well, if you look at the Arduino board, you’ll see a little wave next to pin 11. This tells us it’s a PWM pin. digitalWrite() has to run more code for PWM pins, so it takes longer.
digitalWrite vs. Direct Port Access
We can also compare digitalWrite() with direct port access. Instead of using digitalWrite, we can write directly to the port registers. For pin number 13, that would be:
If you want to know more about how these port registers work, this video from talofer99 has a great explanation. For now, just remember that this is more-or-less equivalent to calling digitalWrite on pin 13. Let’s run it and see what happens.
It only took 0.12 microseconds, or two cycles! Remember, the first one using digitalWrite took 38 cycles, so this one is much faster. You’ve probably heard that direct register access is faster, but now you know exactly how much faster!
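As a hedged illustration of what a direct port write does: on an Uno-class board (ATmega328P), pin 13 is bit 5 of port B, so setting the pin high is usually spelled PORTB |= (1 << 5);. The plain C++ sketch below applies the same bit operations to an ordinary byte so the logic can be tried on a PC. The helper names are made up for illustration, and the pin mapping is an assumption that differs on other boards.

```cpp
#include <cstdint>

// Assumption: on an ATmega328P board such as the Uno, pin 13 is bit 5 of PORTB.
constexpr uint8_t kPin13Mask = 1u << 5;

// Return the port value with the pin 13 bit set, leaving other bits untouched.
// On the board this corresponds to: PORTB |= (1 << 5);
uint8_t withPinHigh(uint8_t port) {
  return static_cast<uint8_t>(port | kPin13Mask);
}

// Return the port value with the pin 13 bit cleared.
// On the board this corresponds to: PORTB &= ~(1 << 5);
uint8_t withPinLow(uint8_t port) {
  return static_cast<uint8_t>(port & ~kPin13Mask);
}
```

Because the write boils down to a single bit operation on a register, there is no pin lookup and no PWM check, which is consistent with it taking so few cycles.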
Surprised by The Arduino Compiler
One interesting use of this method is looking under the hood of the Arduino compiler. Let’s test a loop that runs a thousand times and does nothing:
When you run the code, you will notice that it finishes running almost instantly. Blame the compiler: it can see that it’s an empty loop, so it ignores this code. If we want the compiler to keep the code, we can use the volatile keyword. It tells the compiler that it must still generate the code that updates the value of the variable i, despite the fact it is not being used.
Now that the compiler keeps the loop in the program, it takes more than 1000 microseconds to run!
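In code, the only difference between the two versions is the volatile qualifier on the loop counter. Here is a minimal sketch in plain C++ (the function names are hypothetical; on the board you would place the loop between resetting TCNT1 and reading it):

```cpp
// The counter is an ordinary int and the loop body is empty, so an optimizing
// compiler is free to skip the loop and compute the result directly.
int plainLoop(int iterations) {
  int i;
  for (i = 0; i < iterations; i = i + 1) {
  }
  return i;
}

// The counter is volatile, so every read and write of i must really happen
// and the compiler has to generate code for each iteration.
int volatileLoop(int iterations) {
  volatile int i;
  for (i = 0; i < iterations; i = i + 1) {
  }
  return i;
}
```

Both functions return the same value. What changes is the machine code the compiler has to emit, and that is exactly what the cycle counter makes visible.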
delayMicroseconds(): Accurate or Not?
How about the built-in delayMicroseconds() function? Let’s ask it to delay for 10 microseconds and see what happens.
We can see that the delay isn’t particularly accurate. It’s only delaying for 8.81 microseconds, rather than 10. When we try other numbers, it has the same degree of inaccuracy. If we set the delay to 1, the compiler decides the delay is too short and just skips the code altogether (just as it did with the for loop above).
Let’s tweak the code. We’ll add a variable microsecs and set it to zero, then increment it in the setup function. Instead of calling delayMicroseconds(1), we’ll call it with the variable microsecs instead. microsecs has the same value, but the compiler can't figure out that the value is actually 1, so it won’t optimize the code.
As you can see, this time it actually generates a delay of 0.5 microseconds. I find it fascinating to dig into the compiler’s optimization tricks that make our code run faster. This method lets us look under the hood and see what actually happens when we run our code on Arduino.
Now let’s try to measure the accuracy of delay(). This function pauses the program for a given number of milliseconds (1 millisecond = 1000 microseconds).
Not bad. It should have been 1000 microseconds, but we have four extra microseconds. If we go up to 2, we can see it’s about 2000 microseconds, so it’s pretty accurate. If we keep going up, we’ll soon go over the limit of the 16-bit counter and of our unsigned variable, which is 65535 cycles. At sixteen cycles per microsecond, that means we can measure up to about 4096 microseconds, or roughly 4 milliseconds. There are ways to overcome these limits, such as using the timer overflow bits, but that's a topic for another post.
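The arithmetic behind that limit, and one possible way around it, can be sketched in plain C++ (runnable on a PC). The helper names are hypothetical, and how the overflows get counted on the board is deliberately left out:

```cpp
#include <cstdint>

// Number of counts a 16-bit counter goes through before wrapping to zero.
constexpr uint32_t kCountsPerOverflow = 65536;

// Combine a number of completed overflows with the current counter value
// into one 32-bit cycle count. How the overflows are counted on real
// hardware (for example in an overflow interrupt) is outside this sketch.
uint32_t totalCycles(uint32_t overflows, uint16_t counter) {
  return overflows * kCountsPerOverflow + counter;
}

// Convert cycles to microseconds, assuming a 16 MHz clock.
double cyclesToMicroseconds(uint32_t cycles) {
  return cycles / 16.0;
}
```

A full counter, totalCycles(0, 65535), converts to just under 4096 microseconds, which is where the ceiling of a single 16-bit count comes from. Each additional overflow adds another 4096 microseconds of range.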
Takeaway: Your Assumptions Are Wrong (Usually)
We’ve seen two different ways to measure how fast your code runs. Usually the speed of the code doesn't matter that much, but when it does it really comes in handy to know how to measure it. It’s also fun to tinker with the code, try different things, and see how much they affect the execution speed and how the compiler optimizes it.
Hopefully, this inspires you to look into your code, explore and challenge your assumptions about how your code runs. Get started, and see what you can find!