FAQ
TL;DR: On AVR, MUL is 2 cycles, while 16‑bit fixed‑point divide is ~360 cycles worst‑case [AVR Instruction Set Manual; Elektroda, Anonymous, #21621291]. "Find out how precise you will have to be." [Elektroda, Anonymous, post #21621293] This FAQ shows faster division, sqrt, and trig for UAV control loops on 8‑bit AVR using fixed‑point and approximations.
Why it matters: Faster math keeps your control loop stable at higher update rates without adding a costly FPU MCU.
- megaAVR and the newer 8‑bit AVR families have a hardware 8×8 MUL (2 cycles); classic tinyAVR parts lack it, and no 8‑bit AVR has a hardware DIV or FPU [AVR Instruction Set Manual].
- Reported float __divsf3 ≈ 465 cycles; float sqrt ≈ 492 cycles (software) [Elektroda, Anonymous, post #21621283].
- Measured Q8.8 fixed‑point: mul ≈ 40 cycles; div ≈ 360 cycles worst‑case [Elektroda, Anonymous, post #21621291].
- Float sin/cos measured at 1600+ cycles on the target MCU [Elektroda, Anonymous, post #21621285].
- Lookup tables: 8‑bit→8‑bit costs 256 bytes; 12‑bit→12‑bit ≈ 1536 bytes [Elektroda, Anonymous, post #21621294].
How can I do fast division on AVR 8‑bit without hardware divide?
Avoid float. Use fixed‑point and compute a reciprocal via Newton–Raphson (y ≈ 1/x), then multiply the dividend by y. This needs only adds, shifts, and multiplies, and AVR parts with MUL do the multiplies in hardware [AVR Instruction Set Manual]. A Q8.8 library measured ~360 cycles worst‑case for a 16‑bit divide [Elektroda, Anonymous, post #21621291]. Expert tip: "Use fixed‑point and approximations" for speed [Elektroda, Anonymous, post #21621289]. Libraries such as avrfix and Cornell’s routines provide working code [Elektroda, Anonymous, #21621289; Elektroda, Anonymous, #21621291].
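For reference, here is a minimal sketch of plain Q8.8 multiply and divide in portable C. The names q88_t, q88_mul and q88_div are hypothetical, not the avrfix or Cornell API. Note that q88_div still relies on the compiler's 32‑bit integer division, which on AVR is itself a software routine; the reciprocal approach above exists to replace exactly that step.

```c
#include <stdint.h>

/* Q8.8: a signed 16-bit integer whose value is raw / 256. */
typedef int16_t q88_t;

#define Q88_ONE 256

/* Multiply two Q8.8 numbers. The 32-bit product is in Q16.16, so shifting
   right by 8 returns it to Q8.8. Assumes arithmetic right shift for negative
   values (true for avr-gcc and mainstream host compilers). No overflow check. */
static q88_t q88_mul(q88_t a, q88_t b)
{
    return (q88_t)(((int32_t)a * b) >> 8);
}

/* Divide two Q8.8 numbers by pre-scaling the dividend by 256 so the integer
   quotient lands in Q8.8. Returning 0 for b == 0 is a placeholder policy;
   a controller would normally saturate or flag the error instead. */
static q88_t q88_div(q88_t a, q88_t b)
{
    if (b == 0) return 0;
    return (q88_t)(((int32_t)a * Q88_ONE) / b);
}
```

For example, q88_mul(384, 512) multiplies 1.5 by 2.0 and returns 768 (3.0).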
Should I avoid float in UAV control loops on AVR?
Yes. Float division and sqrt run in software and cost hundreds of cycles each, reducing loop bandwidth [Elektroda, Anonymous, post #21621283]. Fixed‑point keeps operations in integer math, with measured 16‑bit mul ≈ 40 cycles and div ≈ 360 cycles worst‑case [Elektroda, Anonymous, post #21621291]. Many control systems fit within 16‑bit fractional ranges if you scale signals appropriately [Math Toolkit for Real-Time Programming].
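A sketch of what scaling signals appropriately can look like inside a control loop, assuming Q8.8 signals and a proportional gain: the gain is converted to fixed point at compile time, and the product is saturated instead of being allowed to wrap. Q88, sat16 and p_term are illustrative names, not from the thread.

```c
#include <stdint.h>

typedef int16_t q88_t;

/* Convert a constant to Q8.8 with round-to-nearest. With a literal argument
   this is a constant expression, so the compiler folds it and no float code
   needs to run on the MCU. */
#define Q88(x) ((q88_t)((x) >= 0 ? (x) * 256.0 + 0.5 : (x) * 256.0 - 0.5))

/* Clamp a 32-bit intermediate into the int16_t range instead of wrapping. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Proportional term u = kp * error, all in Q8.8, saturated to the Q8.8 range
   (about -128.0 .. +127.996). */
static q88_t p_term(q88_t kp, q88_t error)
{
    return sat16(((int32_t)kp * error) >> 8);
}
```

Saturation matters here because a large error times a large gain would otherwise wrap to the opposite sign and drive the actuator the wrong way.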
Is __divsf3 really that slow on AVR?
It is software‑emulated and was reported at ≈465 cycles in the referenced setup [Elektroda, Anonymous, post #21621283]. The exact cost varies with compiler version and optimization level, but with no hardware divide or FPU there is no fast path to fall back on [avr-libc User Manual; AVR Instruction Set Manual]. For time‑critical code, switch to fixed‑point division via reciprocal iterations [Elektroda, Anonymous, post #21621291].
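When several quantities share a divisor (for example the three components of a vector being normalized), the reciprocal idea pays off even without iteration: divide once, then multiply. This is a minimal sketch assuming Q8.8 values and a divisor of at least 1.0; q88_div_many is a hypothetical helper, and its single integer divide stands in for whatever reciprocal routine you actually use.

```c
#include <stdint.h>

typedef int16_t q88_t;

/* Divide n Q8.8 values in place by the same divisor d using one reciprocal
   and n multiplies. Assumed precondition: d >= 1.0 (raw >= 256), which keeps
   the reciprocal <= 65536 and every 32-bit product in range. */
static void q88_div_many(q88_t *v, uint8_t n, q88_t d)
{
    /* r = 1/d in Q16.16, i.e. 2^24 / d_raw. This is the only divide. */
    int32_t r = (int32_t)(16777216L / d);

    for (uint8_t i = 0; i < n; i++) {
        /* Q8.8 * Q16.16 has 24 fraction bits; >> 16 returns to Q8.8. */
        v[i] = (q88_t)(((int32_t)v[i] * r) >> 16);
    }
}
```

Because the reciprocal is truncated, individual results can differ from a direct divide by one LSB.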
What's a fast way to compute sqrt(x) on AVR?
Use fixed‑point Newton–Raphson or a bit‑by‑bit (binary restoring) integer square root. A forum implementation provides 16‑bit reciprocal and sqrt routines for the ATmega644 in GCC [Elektroda, Anonymous, post #21621292]. Float sqrt was reported at around 492 cycles in the thread’s context [Elektroda, Anonymous, post #21621283]. Fixed‑point variants avoid float overhead and let you control scaling and saturation [Math Toolkit for Real-Time Programming].
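A minimal sketch of the bit‑by‑bit integer square root, plus a Q8.8 wrapper. isqrt32 and q88_sqrt are illustrative names, not the forum code. The loop needs only shifts, adds, subtracts and compares, and runs at most 16 iterations.

```c
#include <stdint.h>

/* Digit-by-digit (restoring) integer square root: returns floor(sqrt(x)).
   No multiply and no divide are needed. */
static uint16_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1UL << 30;   /* highest power of four that fits in 32 bits */

    while (bit > x)             /* skip leading zero digit pairs */
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)res;
}

/* Square root of a non-negative Q8.8 value, result in Q8.8:
   sqrt(a / 256) * 256 = sqrt(a * 256), so shift up by 8 before the root. */
static uint16_t q88_sqrt(uint16_t a)
{
    return isqrt32((uint32_t)a << 8);
}
```

For example, q88_sqrt(1024) (4.0) returns 512 (2.0), and q88_sqrt(512) (2.0) returns 362 (≈1.414).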
How do I implement a fast 1/x using Newton–Raphson in fixed‑point?
1. Get an initial guess y0 from a small LUT or bit trick.
2. Iterate y_{k+1} = y_k(2 − x·y_k), rescaling each product back into your Q‑format.
3. Clamp/saturate to handle near‑zero inputs.
With a good initial guess this converges quadratically, roughly doubling the number of correct bits per iteration [Newton's method]. On AVR parts with hardware MUL, the integer multiplies are cheap [AVR Instruction Set Manual].
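A minimal sketch of the three steps for positive unsigned Q8.8 inputs. Instead of a LUT, it normalizes x into [0.5, 1) and uses the linear initial guess 48/17 − (32/17)·f, a standard starting point for Newton–Raphson division with a worst‑case relative error of about 1/17, so two iterations get close to Q2.14 resolution. The name q88_recip, the saturation to 0xFFFF and the x == 0 policy are illustrative assumptions, not the Cornell or avrfix code.

```c
#include <stdint.h>

/* Reciprocal of a positive unsigned Q8.8 value via Newton-Raphson.
   Returns unsigned Q8.8, saturated to 0xFFFF (about 255.996).
   x == 0 also returns 0xFFFF (assumed policy). */
static uint16_t q88_recip(uint16_t x)
{
    if (x == 0) return 0xFFFF;

    /* Normalize: shift until bit 15 is set, so f = m / 65536 is in [0.5, 1). */
    uint8_t s = 0;
    uint16_t m = x;
    while (!(m & 0x8000u)) { m <<= 1; s++; }

    /* Step 1: initial guess y0 = 48/17 - (32/17) f, in Q2.14.
       46262 = round(48/17 * 16384), 30841 = round(32/17 * 16384). */
    uint16_t y = (uint16_t)(46262u - (uint16_t)(((uint32_t)30841u * m) >> 16));

    /* Step 2: two iterations of y <- y (2 - f y); each roughly squares the
       relative error. All products fit in 32 bits. */
    for (uint8_t i = 0; i < 2; i++) {
        uint32_t fy = (uint32_t)m * y;                    /* Q2.30, near 1.0 */
        uint32_t e  = 0x80000000UL - fy;                  /* 2 - f y, Q2.30 */
        y = (uint16_t)(((uint32_t)y * (e >> 16)) >> 14);  /* back to Q2.14 */
    }

    /* Undo normalization: 1/x = (y / 2^14) * 2^(s - 8), so in Q8.8 the
       result is y * 2^(s - 14). Round to nearest on right shifts. */
    uint32_t r;
    if (s >= 14)
        r = (uint32_t)y << (s - 14);
    else
        r = ((uint32_t)y + (1UL << (13 - s))) >> (14 - s);

    /* Step 3: saturate tiny inputs whose reciprocal does not fit. */
    return (r > 0xFFFFu) ? 0xFFFFu : (uint16_t)r;
}
```

For example, q88_recip(256) returns 256 (1.0) and q88_recip(768), i.e. 1/3.0, returns 85 (≈0.332). The normalization loop could also be replaced with a leading‑zero count.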
Can I replace sin/cos with faster approximations on AVR?
Yes. For small angles, use a Taylor series truncated to a few terms; it needs only adds and multiplies [Elektroda, Anonymous, post #21621285]. For broader ranges, use LUTs with piecewise linear interpolation sized to your error budget [Elektroda, Anonymous, post #21621295]. The thread puts eight‑bit precision at roughly 1.4% error in sine/cosine [Elektroda, Anonymous, post #21621293].
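A minimal sketch of the small‑angle Taylor approach in Q1.15, assuming the angle has already been reduced to |x| ≤ π/4. q15_sin_small is an illustrative name; the constants are 1/6 and 1/120 scaled by 32768.

```c
#include <stdint.h>

/* sin(x) for small angles from the first three Taylor terms
   x - x^3/6 + x^5/120, in Q1.15 (value = raw / 32768, angle in radians).
   Assumed domain: |x| <= pi/4 (raw <= 25736), where the dropped x^7/5040
   term is about one Q1.15 LSB. Larger angles must be range-reduced first. */
static int16_t q15_sin_small(int16_t x)
{
    int32_t x2 = ((int32_t)x * x) >> 15;  /* x^2 */
    int32_t x3 = (x2 * x) >> 15;          /* x^3 */
    int32_t x5 = (x3 * x2) >> 15;         /* x^5 */
    int32_t t3 = (x3 * 5461) >> 15;       /* x^3 / 6,   5461 = 32768 / 6   */
    int32_t t5 = (x5 * 273) >> 15;        /* x^5 / 120, 273  = 32768 / 120 */
    return (int16_t)(x - t3 + t5);
}
```

For x = π/6 (raw 17157) this returns 16384, i.e. 0.5, using five multiplies and no divides.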
Is CORDIC worth it for trig on 8‑bit AVR?
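CORDIC uses only shifts, adds, and a small angle table, which makes it attractive when MUL is slow or absent. Most AVRs do have a fast MUL, and they have no barrel shifter (LSR/ASR/ROR move one bit per instruction), so CORDIC's variable shifts are relatively expensive there; its advantages are that it avoids float and that precision scales with the number of iterations, roughly one bit per iteration [CORDIC]. You must do range reduction and account for the CORDIC gain factor, or accuracy suffers [CORDIC]. Consider LUT plus interpolation if memory allows [Elektroda, Anonymous, post #21621295].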
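A minimal rotation‑mode CORDIC sketch in Q2.14, assuming range reduction to |θ| ≤ π/2 has already been done. cordic_sincos_q14 and the 14‑iteration count are illustrative choices. The table holds atan(2^−i), and starting x at the gain constant K ≈ 0.607253 removes the need for a final scaling multiply.

```c
#include <stdint.h>

#define CORDIC_ITERS 14

/* atan(2^-i) in radians, Q2.14 (value = raw / 16384), rounded. */
static const int16_t cordic_atan_q14[CORDIC_ITERS] = {
    12868, 7596, 4014, 2037, 1023, 512, 256, 128, 64, 32, 16, 8, 4, 2
};

/* Rotation-mode CORDIC: rotates the vector (K, 0) by theta using only shifts,
   adds, and the table above. theta is radians in Q2.14 and must already be
   reduced to |theta| <= pi/2 (raw <= 25736). Outputs are Q2.14. */
static void cordic_sincos_q14(int16_t theta, int16_t *s, int16_t *c)
{
    int16_t x = 9949;   /* K = 0.607253 in Q2.14, pre-compensates the gain */
    int16_t y = 0;
    int16_t z = theta;  /* residual angle still to rotate */

    for (uint8_t i = 0; i < CORDIC_ITERS; i++) {
        int16_t dx = y >> i;   /* arithmetic shift assumed for negatives */
        int16_t dy = x >> i;
        if (z >= 0) {          /* rotate counter-clockwise by atan(2^-i) */
            x -= dx; y += dy; z -= cordic_atan_q14[i];
        } else {               /* rotate clockwise */
            x += dx; y -= dy; z += cordic_atan_q14[i];
        }
    }
    *c = x;
    *s = y;
}
```

For θ = π/6 (raw 8579) the outputs land within a few LSB of sin = 8192 and cos = 14189. Each iteration costs two variable shifts plus three adds or subtracts.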
How big should lookup tables be for trig or sqrt, and what's the tradeoff?
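Match table resolution to the precision you need. A full table with an n‑bit index and m‑bit entries takes 2^n × m / 8 bytes: an 8‑bit in/out table costs 256 bytes, and the thread quotes ~1536 bytes for a 12‑bit mapping [Elektroda, Anonymous, post #21621294]. (For reference, 1536 bytes is 1024 packed 12‑bit entries, i.e. a 10‑bit index; a full 4096‑entry table of packed 12‑bit values needs 6144 bytes.) You can reduce size by storing fewer breakpoints, optionally nonuniformly spaced, and interpolating linearly between entries, which saves memory with minimal extra math [Elektroda, Anonymous, post #21621295].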
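A minimal sketch of a uniform‑grid LUT with linear interpolation: a 17‑entry quarter‑wave sine table (34 bytes) indexed by a hypothetical 10‑bit phase in which 1024 means π/2. With 16 segments of width π/32, the linear‑interpolation error bound h²/8 works out to about 1.2 × 10⁻³ of full scale. On an AVR the table would normally live in flash (PROGMEM, read with pgm_read_word from <avr/pgmspace.h>); plain const is used here so the sketch also builds on a PC. Nonuniform breakpoints would replace the shift‑based index with a search over stored breakpoints.

```c
#include <stdint.h>

/* sin(k * pi/32) for k = 0..16, scaled by 32767 (Q1.15). 17 entries = 34 bytes. */
static const int16_t sin_q15_tab[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* Quarter-wave sine by table lookup plus linear interpolation.
   phase: 0..1024 maps to 0..pi/2 (hypothetical angle unit).
   The upper 4 bits pick the segment; the lower 6 bits interpolate in it. */
static int16_t sin_q15_lut(uint16_t phase)
{
    if (phase >= 1024) return sin_q15_tab[16];

    uint8_t idx  = (uint8_t)(phase >> 6);   /* segment 0..15 */
    uint8_t frac = (uint8_t)(phase & 63);   /* position inside the segment */
    int16_t a = sin_q15_tab[idx];
    int16_t b = sin_q15_tab[idx + 1];

    return (int16_t)(a + (((int32_t)(b - a) * frac) >> 6));
}
```

Doubling the entry count roughly quarters the interpolation error, which is the memory/precision tradeoff in concrete form.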
Which fixed‑point format should I use (Q8.8 vs Q1.15)?
Q8.8 is simple, maps directly onto 16‑bit integer math on AVR, and has tested routines available [Elektroda, Anonymous, post #21621291]. Q1.15 gives higher fractional precision (a step of 2^−15 instead of 2^−8) for values in −1..1, which suits normalized vectors and filters [Math Toolkit for Real-Time Programming]. Choose the format that fits your signal ranges and avoids overflow [Math Toolkit for Real-Time Programming].
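A sketch of the Q1.15 side of that choice: a rounded multiply, which has a single overflow case, and a conversion from Q8.8 that saturates anything outside [−1, 1). The names q15_t, q15_mul and q88_to_q15 are illustrative.

```c
#include <stdint.h>

/* Q1.15: value = raw / 32768, range [-1, 1 - 2^-15]. */
typedef int16_t q15_t;

/* Q1.15 multiply with round-to-nearest. The only overflow case is
   (-1) * (-1) = +1, which is not representable, so it saturates. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * b;      /* Q2.30 product */
    p = (p + (1L << 14)) >> 15;      /* round, then back to Q1.15 */
    if (p > INT16_MAX) p = INT16_MAX;
    return (q15_t)p;
}

/* Convert Q8.8 to Q1.15 (multiply the raw value by 128), saturating values
   outside [-1, 1). The low 7 bits of the result are always zero, since Q8.8
   carries only 8 fraction bits. */
static q15_t q88_to_q15(int16_t v)
{
    int32_t p = (int32_t)v * 128;
    if (p > INT16_MAX) return INT16_MAX;
    if (p < INT16_MIN) return INT16_MIN;
    return (q15_t)p;
}
```

For example, q15_mul(16384, 16384) computes 0.5 × 0.5 and returns 8192 (0.25).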
Any ready‑made fixed‑point math libraries for AVR?
Yes. See avrfix on SourceForge for fixed‑point fractional functions [Elektroda, Anonymous, post #21621289]. Cornell’s ECE4760 page publishes Q8.8 multiply, divide, reciprocal, and sqrt code for AVR‑GCC [Elektroda, Anonymous, #21621291; Elektroda, Anonymous, #21621292]. These spare you from writing assembly and come with known cycle counts [Elektroda, Anonymous, post #21621291].
How many cycles does 16‑bit fixed‑point multiply/divide take on AVR?
One published Q8.8 implementation measured multiply ≈ 40 cycles and divide ≈ 360 cycles worst‑case [Elektroda, Anonymous, post #21621291]. These avoid software floating‑point overhead while using the hardware MUL [AVR Instruction Set Manual]. Use them to size control‑loop budgets alongside sensor and I/O costs [Elektroda, Anonymous, post #21621291].
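A worked budget example using the cycle counts quoted above, assuming a 16 MHz clock and a 500 Hz control loop (both assumptions, not figures from the thread):

```latex
\begin{aligned}
N_{\text{loop}} &= \frac{f_{\text{CPU}}}{f_{\text{loop}}} = \frac{16 \times 10^{6}}{500} = 32\,000\ \text{cycles} \\
t_{\text{div}} &= \frac{360}{16 \times 10^{6}\ \text{Hz}} = 22.5\ \mu\text{s} \\
\frac{10 \times 360}{32\,000} &= 11.25\,\% \qquad \frac{4 \times 1600}{32\,000} = 20\,\%
\end{aligned}
```

In this hypothetical loop, ten worst‑case fixed‑point divides use about 11% of the cycle budget, while four float sin/cos calls alone would use at least 20%.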
Common pitfalls with Newton–Raphson for reciprocal or sqrt in fixed‑point?
Poor initial guesses slow convergence or cause divergence, so clamp domains and precondition (normalize) inputs [Newton's method]. Scale intermediates to avoid overflow in multiplies, especially for inputs near zero, whose reciprocals are large [Math Toolkit for Real-Time Programming]. Check for division by zero before iterating and saturate outputs [Math Toolkit for Real-Time Programming].
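The initial‑guess pitfall can be made precise for the reciprocal iteration y_{k+1} = y_k(2 − x·y_k): the residual e_k = 1 − x·y_k satisfies e_{k+1} = e_k², so the iteration converges only if 0 < y_0 < 2/x and diverges otherwise. The host‑side sketch below (double precision, not meant for the MCU; recip_nr_demo is a hypothetical name) shows both cases.

```c
/* Host-side illustration only (double precision, not for the MCU).
   Runs y <- y (2 - x y) for a fixed number of steps. The residual
   e = 1 - x*y squares every step, so the result converges to 1/x when
   0 < y0 < 2/x and diverges when y0 is outside that interval. */
static double recip_nr_demo(double x, double y0, int iters)
{
    double y = y0;
    for (int i = 0; i < iters; i++)
        y = y * (2.0 - x * y);
    return y;
}
```

With x = 4, a start of y_0 = 0.1 converges to 0.25 within six steps, while y_0 = 0.6 (above 2/x = 0.5) blows up to a large negative number.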
Do AVR 8‑bit MCUs have any hardware help for division?
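No. 8‑bit AVR cores have no divide instruction and no FPU; megaAVR and the newer families add an 8×8 hardware multiplier, but classic tinyAVR parts lack even that [AVR Instruction Set Manual]. C float division therefore compiles into calls to software routines such as __divsf3 (with __mulsf3 for multiplication), and sqrt() is a software function in the math library [avr-libc User Manual].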
Where can I learn more math tricks for small micros?
See HAKMEM for classic bit‑level and numeric hacks [HAKMEM]. Jack Crenshaw’s “Math Toolkit for Real‑Time Programming” covers fixed‑point arithmetic and numerics for embedded systems [Elektroda, Anonymous, post #21621290]. The Cornell ECE4760 math page includes AVR‑GCC examples with cycle counts [Elektroda, Anonymous, #21621291; Elektroda, Anonymous, #21621292].