Fast Division and Square Root Algorithms for AVR 8 Bit Flight Control Applications

#1 21657643 04 Dec 2010 22:38

Edward Henderson

I have a digital board intended for a flight control system on RC aircraft, with grand plans for a home-built UAV. I have been studying the math and algorithms that go into such systems. Some of the discussions cover the efficiency of the algorithms, in terms of how many multiply and add operations are required per update. The AVR 8-bit devices have a built-in 8-bit hardware multiply instruction; 16-bit operations require a couple of cycles. What they do NOT have is a divide instruction. A divide is the same as a multiplication by the reciprocal, but how do you get the reciprocal? The other operation that is not represented is the square root.

The avr-libc library includes divide and sqrt functions, but they are fairly costly according to the documentation. The __divsf3 function, which I assume is the divide (the documentation is limited here), takes 465 clock cycles. That is about 23 µs at 20 MHz, assuming one cycle per clock (I think that is correct). That seems like an awfully long time to perform a division, especially when a hardware mul takes only a couple of cycles! sqrt takes 492 cycles, which is also pretty high. For some reason, I would have expected sqrt to take more.

I'm sure I can dig into the details of these routines, but if anyone has some insight, that would be great. Perhaps if I knew a bit more about how these particular routines work, I might be able to optimize a version specifically for my task. I'm sure that the avr-libc versions are somewhat generic, so, making one that is specific to a single purpose might save some of those cycles. Anyone have any experience in this area?
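
A partial answer to the reciprocal question, as a sketch only: when the divisor is a constant known at compile time, the reciprocal can be precomputed as a scaled integer, so the division becomes one multiply and a shift. The example below divides an 8-bit value by 10. The function name `div10_u8` and the choice of divisor are illustrative assumptions, not something from avr-libc, and the trick does not help when the divisor varies at run time.

```c
#include <stdint.h>

/* Divide an 8-bit value by the constant 10 without a divide routine.
 * 205 is 2048/10 rounded up, so (x * 205) >> 11 approximates x / 10.
 * The largest product, 255 * 205 = 52275, still fits in 16 bits. */
uint8_t div10_u8(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 205u) >> 11);
}
```

For 8-bit inputs the rounding error in the constant never accumulates enough to reach the next integer, so the result matches `x / 10` over the whole 0..255 range. Wider inputs need a wider constant and a wider product.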
#2 21657644 07 Dec 2010 19:32

Joe Wolin

The folks on the forum at http://www.avrfreaks.net might have a good answer.

Let us know what you find out...
#3 21657645 09 Dec 2010 12:09

Edward Henderson

AVR Freaks might help, but I was thinking of this as more of a general microprocessor question: what types of shortcuts can one make on a small micro?

I was discussing transcendental functions (sin, cos, tan) with a friend recently. The processor I am using takes 1600+ clock cycles for a sin or cos calculation. The calculations are done in floating point.

For small angles, a Taylor series expansion can be used instead. This expansion requires only multiplication and addition, which have direct hardware support. While this isn't a full sin/cos function, for some applications it might work quite well (small angles), and the performance will be significantly better.

Those are the types of optimization I am looking for.
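
As a rough sketch of the small-angle Taylor idea above, the truncated series sin x ≈ x - x^3/6 + x^5/120 and cos x ≈ 1 - x^2/2 + x^4/24 can be written in Horner form so that each needs only a few multiplies and adds. The function names `sin_small` and `cos_small` are made up for illustration, and x is assumed to be in radians. Keep in mind that float multiplies on an 8-bit AVR are still software routines, so a fixed-point version of the same polynomial would likely be faster again.

```c
/* Small-angle approximations from truncated Taylor series.
 * x is in radians; the error grows quickly as |x| increases. */
float sin_small(float x)
{
    float x2 = x * x;
    /* x - x^3/6 + x^5/120, factored so x2 is reused */
    return x * (1.0f - x2 * (1.0f / 6.0f - x2 * (1.0f / 120.0f)));
}

float cos_small(float x)
{
    float x2 = x * x;
    /* 1 - x^2/2 + x^4/24 */
    return 1.0f - x2 * (0.5f - x2 * (1.0f / 24.0f));
}
```

At x = 0.5 rad (about 29 degrees) the sine comes out within roughly 2e-6 of the true value and the cosine within roughly 3e-5; beyond that the error climbs fast, so the usable range has to be checked against the accuracy the application needs.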
#4 21657646 09 Dec 2010 12:47

Joe Wolin

Thanks for the update. A Taylor Series does sound like the way to go.
#5 21657647 17 Dec 2010 02:34

Randy Dawson

Ed, I'm going to suggest HAKMEM as a note you might enjoy.

It's a somewhat dated MIT paper (1972), but there are lots of mathematical gems in there.

Just plain interesting reading, too!
#6 21657648 17 Dec 2010 20:07

Joe Wolin

Here's a link to the paper Randy is talking about. Thanks Randy.

http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html
#7 21657649 19 Dec 2010 04:21

Bill Westfield

The internal multiply instruction you talk about is for 8-bit integers, while the __divsf3 and sqrt functions are for 32-bit FLOATING POINT numbers, which is a lot different.

For high speed math as you might need in a flight control system, you probably want to look for a library of high-speed FIXED POINT FRACTIONAL math functions. This is nicely searchable. While I don't have any direct experience, there is this: https://sourceforge.net/projects/avrfix/
#8 21657650 04 Jan 2011 12:34

Gary Crowell

Hi Ed, I have a book titled "Math Toolkit for Real-Time Programming" by Jack Crenshaw. It may not be everything you're looking for, but it's interesting to read anyway. The author did the 'Programmer's Toolbox' column for Embedded Systems Programming magazine for many years, and his columns are the main reason I've kept most of the back issues.

I'm sure we'll be seeing each other this week, so I'll bring it along.
#9 21657651 17 Feb 2011 10:51

Bruce Land

I wrote some 16 bit fixed point routines for AVRGCC. The format is 8:8. Multiply speed is about 40 cycles, divide speed is about 360 cycles worst case. Code is at
http://people.ece.cornell.edu/land/courses/ece4760/Math/index.html
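
For readers unfamiliar with the 8:8 format, here is a minimal sketch of what signed 8.8 fixed-point multiply and divide look like in plain C. It is not the code at the link above, and it makes no attempt to reach the cycle counts quoted; the names `fix8_8`, `fix_mul`, and `fix_div` are made up for illustration.

```c
#include <stdint.h>

/* Signed 8.8 fixed point: the stored integer is the real value times 256. */
typedef int16_t fix8_8;

fix8_8 fix_mul(fix8_8 a, fix8_8 b)
{
    /* The 32-bit product carries 16 fraction bits; shift back down to 8.
     * Assumes >> on a negative value is an arithmetic shift, as with GCC. */
    return (fix8_8)(((int32_t)a * b) >> 8);
}

fix8_8 fix_div(fix8_8 a, fix8_8 b)
{
    /* Pre-scale the numerator by 256 so the quotient keeps 8 fraction bits.
     * The caller must ensure b != 0 and that the result fits in 16 bits. */
    return (fix8_8)(((int32_t)a * 256) / b);
}
```

In this format 0x0180 is 1.5 and 0x0200 is 2.0, so `fix_mul(0x0180, 0x0200)` gives 0x0300 (3.0) and `fix_div(0x0300, 0x0200)` gives 0x0180 again.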
#10 21657652 11 Oct 2011 11:41

Bruce Land

I wrote a simple 16-bit reciprocal and sqrt for Mega644 in GCC.
http://people.ece.cornell.edu/land/courses/ece4760/Math/index.html
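
One common way to take an integer square root using only shifts, adds, and compares is the bit-by-bit method sketched below. This is not necessarily how the routines at the link above work; `isqrt16` is a hypothetical name, and it returns the floor of the square root of a 16-bit unsigned input.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(x)). */
uint8_t isqrt16(uint16_t x)
{
    uint16_t rem = x;
    uint16_t root = 0;
    uint16_t bit = (uint16_t)1 << 14;   /* highest power of four in 16 bits */

    /* Start at the largest power of four that does not exceed x. */
    while (bit > rem) {
        bit >>= 2;
    }

    /* Decide one result bit per pass, from the top down. */
    while (bit != 0) {
        if (rem >= root + bit) {
            rem -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint8_t)root;
}
```

The loop runs at most eight passes for a 16-bit input, and no pass needs a multiply or a divide, which is why this style suits a small micro.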
#11 21657653 16 Oct 2011 05:25

Per Zackrisson

As a rule in real-time applications:
Find out how precise you have to be.
Eight bits gives you about a 1.4% error in sine or cosine.
If that is not enough, try 16 bits, and so on.
You can also do the calculations beforehand and put the results in a table, trading memory for speed.
#12 21657654 18 Oct 2011 09:41

Ralph Pruitt

Hi Edward,

Let me add one more item to this discussion. In microcontrollers there is always a tradeoff of speed versus use of memory. At the most basic level you can write the routine and have the low-speed microcontroller do the calculations as fast as the algorithm can be coded. The other solution is to code all or part of the routine using lookup tables, where the calculations have been performed in advance and placed into the table. The only negative will be your resolution. If this is a simple 8-bit result looked up from an 8-bit value, the table uses 256 bytes of EEPROM; a 12-bit result from a 12-bit parameter needs 4096 entries, which is 6144 bytes with the entries packed at 12 bits each. The key is to use only the resolution that you need. Further, tables can be combined with the algorithm for key sections, so the solution can be a hybrid.

Just some ideas to consider when using low processing power micros.
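
To make the table idea concrete, here is a small sketch of a sine lookup. It assumes the angle is expressed as 64 steps per full circle and stores only a quarter wave (17 bytes), using symmetry for the other three quadrants. The names `sin_quarter` and `sin_lut` are hypothetical, and the table is a plain const array so the example builds anywhere; on an AVR it would normally be placed in flash rather than RAM.

```c
#include <stdint.h>

/* round(255 * sin(i * 90deg / 16)) for i = 0..16, computed offline.
 * Assumption: on an AVR this would usually be stored in program memory. */
static const uint8_t sin_quarter[17] = {
    0, 25, 50, 74, 98, 120, 142, 162, 180,
    197, 212, 225, 236, 244, 250, 254, 255
};

/* angle: 64 steps per full circle (wraps). Returns sine scaled to -255..255. */
int16_t sin_lut(uint8_t angle)
{
    uint8_t quadrant = (angle >> 4) & 3;
    uint8_t i = angle & 15;

    switch (quadrant) {
    case 0:  return  (int16_t)sin_quarter[i];        /*   0..90 degrees  */
    case 1:  return  (int16_t)sin_quarter[16 - i];   /*  90..180 degrees */
    case 2:  return -(int16_t)sin_quarter[i];        /* 180..270 degrees */
    default: return -(int16_t)sin_quarter[16 - i];   /* 270..360 degrees */
    }
}
```

A cosine comes for free as `sin_lut(angle + 16)`, since 16 steps is a quarter turn. Finer angular resolution means a longer table, which is exactly the memory-for-speed trade described above.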
#13 21657655 18 Oct 2011 10:14

Todd Hayden

Good points.

Another thing I have done to avoid consuming memory space with a 1:1 table is to use piecewise linear approximations to the function. Select the table entries so that interpolation between two entries still gives the resolution needed. The table size can sometimes be reduced dramatically for a small increase in processing of the value.
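
A minimal sketch of the piecewise linear approach, assuming a quarter-wave sine with the input angle given as 0..255 across 0 to just under 90 degrees: the table holds only 9 points, and values in between are found by linear interpolation. The names `sin_knots` and `sin_q15_interp` are hypothetical, and the output is scaled so that 32767 represents 1.0.

```c
#include <stdint.h>

/* round(32767 * sin(k * 11.25deg)) for k = 0..8, computed offline. */
static const uint16_t sin_knots[9] = {
    0, 6393, 12539, 18204, 23170, 27245, 30273, 32137, 32767
};

/* a: 0..255 maps to 0..(just under) 90 degrees. */
uint16_t sin_q15_interp(uint8_t a)
{
    uint8_t seg  = a >> 5;      /* which of the 8 segments */
    uint8_t frac = a & 31;      /* position inside it, 0..31 */
    uint16_t y0 = sin_knots[seg];
    uint16_t y1 = sin_knots[seg + 1];

    /* The table is increasing, so y1 - y0 is never negative. */
    return (uint16_t)(y0 + (((uint32_t)(y1 - y0) * frac) >> 5));
}
```

Here 18 bytes of table stand in for a 256-entry one, at the cost of one multiply and a shift per lookup. Halfway along the first segment, for example, the sketch returns 3196 against an exact value of about 3212; whether that is close enough decides how many table points are needed.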

Topic summary

The discussion addresses efficient implementation of division and square root operations on AVR 8-bit microcontrollers for flight control systems in RC aircraft and UAVs. AVR devices feature an 8-bit hardware multiply instruction but lack hardware divide and square root operations, and the floating-point divide (__divsf3) and sqrt functions in avr-libc are computationally expensive (465 and 492 clock cycles respectively; 465 cycles is about 23 µs at 20 MHz). To optimize performance, fixed-point arithmetic libraries such as AVRfix and custom 16-bit fixed-point routines (e.g., 8:8 format) are suggested, with reported costs of about 40 cycles for a multiply and about 360 cycles worst case for a divide. Approaches to reduce computational load include using Taylor series expansions for transcendental functions (sin, cos) at small angles, lookup tables for precomputed values, and piecewise linear approximations to balance memory usage and speed. The tradeoff between precision, speed, and memory footprint is emphasized, with suggestions to tailor resolution to application needs. References include the HAKMEM paper for mathematical algorithms and resources with example code for fixed-point reciprocal and square root implementations on AVR Mega644.
Summary generated by the language model.
