logo elektroda

Control of WS2812 diodes via SPI with DMA - use of MOSI for timing generation

p.kaczmarek2

TL;DR

  • A BK7231-based WS2812 LED strip driver uses hardware SPI and DMA to generate LED timing through the MOSI pin.
  • Each WS2812 bit is encoded into four SPI bits, mapping 0 to 0b1000 and 1 to 0b1110, then expanded with a lookup table.
  • The SPI clock runs at 3 MHz, giving about 0.333 microseconds per SPI bit and turning one WS2812 byte into four SPI bytes.
  • The method sends large LED data blocks without blocking execution, so animations can play while DMA handles transmission in the background.
  • The waveforms are not perfectly precise, so the transfer needed extra zero bytes at the beginning and end for stability.
Generated by the language model.
Screenshot of an oscilloscope with a signal waveform.
A few years ago I presented a simple way to control an individually addressable LED strip based on bit-banging, i.e. the simplest operations on the IO pins of a microcontroller. Here I will show a completely different approach, based on using the MOSI pin of the hardware SPI port in combination with direct memory access (DMA). The WS2812 itself does not support the SPI protocol, so this may seem strange, but I will explain everything in a moment. The result is a WS2812 driver capable of sending large amounts of data to the LEDs without blocking code execution on the microcontroller, which also makes it suitable for animation playback.

Basics of the WS2812 and the bit-bang method on the PIC18F45K50
I described the WS2812 communication protocol in an earlier topic, where I also showed its simplest implementation:
PIC18F45K50 as WS2812 LED strip driver (theory+library)
From here on I assume familiarity with that material.

This time I will work on the BK7231.

Motivation to change control
It would seem that nothing more is needed and bit-banging is enough. But is it really? With that method we have to disable interrupts for the duration of the data transfer, and code execution is blocked until the transfer is complete. Nothing else can run in the meantime, otherwise the timings get corrupted. With long LED strips and animations this can take a really long time.
One would like to use interrupts, but the required times are really short:
Table showing data transfer times for the WS2812 protocol.
That alone might make you look for another way. However, there is a second reason that practically forces the search.
I attempted a bit-banging implementation on the BK7231. I experimentally determined how long one instruction cycle (one NOP) takes on this platform and chose the number of NOPs so that I would get delays of, say, 0.4us and 0.8us.
[Code: C / C++ - listing not included in this copy]
I then wrote preprocessor macros that generate a one pulse or a zero pulse:
[Code: C / C++ - listing not included in this copy]
I also optimised the pin-setting code and accounted for it in the calculation, but I omit that here.
Finally, I tried sending pulses with them:
[Code: C / C++ - listing not included in this copy]
I fired it up, and:
Screenshot of an oscilloscope with a signal waveform.
Everything looked fine, with 8 pulses visible, but when I changed the number of "nop" instructions:
Code snippet in a text editor with asm macros for SLEEP functions.
An unexpected, apparently even non-deterministic delay appeared, and only in some of the pulses. But why? After all, it is just WS2812_SEND_T0 eight times.
Oscilloscope graph showing signal pulses for WS2812 LED control.
I was a bit puzzled by this, but eventually I began to suspect that the problem was the cache and the way instructions are fetched from flash memory. Executing the code from RAM (a so-called "ramfunc") would probably also help, but there were difficulties with that too, and we eventually settled on the SPI method.


Basics of SPI DMA
The SPI protocol, to simplify, is based on two pins: clock (CLK) and data (MOSI - master out, slave in). Depending on the SPI mode, the receiver latches the data on a particular clock edge. Implemented with 'pin waving' (bit-banging), it looks roughly like this:
[Code: C / C++ - listing not included in this copy]
There is also the SS (Slave Select) pin, but I am skipping it here.
This way data can be sent from any digital outputs, but it is not efficient and, again, it blocks code execution.
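As a rough illustration of the 'pin waving' described above, the sketch below shifts one byte out MSB first in SPI mode 0 (MOSI is set while the clock is low, and the receiver samples it on the rising clock edge). The names `pin_write`, `PIN_CLK`, `PIN_MOSI` and `soft_spi_send_byte` are hypothetical and do not come from any SDK. Here `pin_write` only records pin states so that the example runs on a PC; on a real target it would write a GPIO register instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin identifiers, not taken from any real SDK. */
enum { PIN_CLK = 0, PIN_MOSI = 1 };

/* Stand-in for a GPIO driver: keeps the current pin levels and logs
   the MOSI level seen at every rising clock edge. */
static uint8_t pin_level[2];
static uint8_t sampled_bits[64];
static size_t sampled_count;

static void pin_write(int pin, uint8_t level)
{
    level = level ? 1 : 0;
    /* A rising CLK edge is when an SPI mode 0 receiver samples MOSI. */
    if (pin == PIN_CLK && level == 1 && pin_level[PIN_CLK] == 0) {
        assert(sampled_count < sizeof sampled_bits);
        sampled_bits[sampled_count++] = pin_level[PIN_MOSI];
    }
    pin_level[pin] = level;
}

/* Shift one byte out, most significant bit first (SPI mode 0). */
static void soft_spi_send_byte(uint8_t value)
{
    for (int bit = 7; bit >= 0; bit--) {
        pin_write(PIN_MOSI, (uint8_t)((value >> bit) & 1u));
        pin_write(PIN_CLK, 1);
        pin_write(PIN_CLK, 0);
    }
}
```

Every bit costs three pin writes executed by the CPU, which is exactly the blocking behaviour that hardware SPI removes.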
Many platforms offer more efficient solutions: hardware SPI, sometimes even with DMA (Direct Memory Access).
The platform used here, the BK7231, has one hardware SPI port:
Details of the BK7231T schematic with marked SPI pins.
It can be used, for example, to drive a display efficiently. But that leaves the DMA - what does it give us?
DMA, as the name suggests, gives the SPI hardware direct access to memory. So essentially we have:
[Code: C / C++ - listing not included in this copy]
This piece of code initiates the transfer of an array of bytes through the DMA at a given rate. The transfer itself does not block the execution of the remaining instructions; it takes place in the background, fully automatically.
The whole thing is no longer affected by interrupts or other instructions, so the timing is determined entirely by the hardware.
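The start-and-continue pattern can be sketched with a small software model. Everything here is hypothetical: `spi_dma_model_t` and its functions are not the BK7231 API, they only imitate a transfer that is armed in one call and then progresses without the CPU. The "hardware" step is called by hand so that the example runs on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an SPI+DMA channel. This is NOT the BK7231 API;
   it only imitates the "start and return immediately" behaviour. */
typedef struct {
    const uint8_t *next;  /* next byte the hardware would shift out */
    size_t remaining;     /* bytes still waiting to be sent */
    uint32_t spi_hz;      /* requested SPI clock in Hz */
} spi_dma_model_t;

/* Arm the transfer. Returns at once; no data is copied, so the buffer
   must stay valid and unchanged until the transfer has finished. */
static void spi_dma_model_start(spi_dma_model_t *ch, const uint8_t *buf,
                                size_t len, uint32_t spi_hz)
{
    assert(ch != NULL && buf != NULL && spi_hz > 0);
    ch->next = buf;
    ch->remaining = len;
    ch->spi_hz = spi_hz;
}

/* True while bytes are still being sent in the background. */
static bool spi_dma_model_busy(const spi_dma_model_t *ch)
{
    return ch->remaining > 0;
}

/* Simulates the hardware sending one byte. On a real chip this happens
   on its own, in parallel with the CPU. */
static void spi_dma_model_hw_step(spi_dma_model_t *ch)
{
    if (ch->remaining > 0) {
        ch->next++;
        ch->remaining--;
    }
}
```

One practical consequence of this pattern: since the hardware reads the buffer while other code runs, the buffer should not be modified or freed until the transfer is reported as finished.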
But what does this give us with respect to the WS2812, which expects control of a single pin?


Using SPI DMA to control the WS2812
The whole trick is to use the independence of the SPI timing to simulate the signal required by the WS2812. But how do you do this when they are two different protocols?
With SPI we have two signals:
- CLK/SCK, the clock, which cannot really carry data
- MOSI, the data line, whose state (high or low) depends on what we want to send
We can make a preliminary bet here that manipulating the MOSI signal will allow us to generate the waveforms expected by the WS2812.
Now we just need to figure out how to manipulate this signal.
Let's recall the requirements:
Table showing data transfer times for the WS2812 protocol.
We want to be able to generate each of these signals. So the finest unit of time we need is about 0.4us; to generate 0.8us we would simply use it twice. Alternatively, you could try 0.3us or similar.
What SPI transfer rate do we need for one bit (on MOSI) to last 0.3us?
1s/0.3us = 3,333,333
So let's take a round 3,000,000, i.e. an SPI clock of 3MHz.
Then one bit takes about 1s/3,000,000 = 0.333us
Two bits, in turn, take about 0.666us. That still appears to be within the margin of error, although I think I will also consider the option with three bits - 0.999us.
At this point we already know that we can generate T0H by sending one high bit via SPI and, similarly, T0L by sending two low bits. So a total of 3 bits.
That is to say:
- time T0H is one high bit on SPI
- time T0L is two low bits on SPI
- time T1H is two high bits on SPI
- time T1L is one low bit on SPI
That is:
- to send bit 0 to the WS2812, we send T0H and T0L via SPI, i.e. 100
- to send bit 1 to the WS2812, we send T1H and T1L via SPI, i.e. 110
Almost perfect, except that TH+TL should equal 1.25us, and 3*0.333us = 0.999us. It is better to use 4*0.333us = 1.332us, because the error relative to 1.25us is smaller. For this reason we make a correction: we stretch one WS2812 bit to four SPI bits.
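The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and the 1.25us period is the value quoted in the text.

```c
#include <assert.h>

/* Duration of n SPI bits in microseconds at the given SPI clock (Hz). */
static double spi_bits_us(unsigned n_bits, double spi_hz)
{
    assert(spi_hz > 0.0);
    return (double)n_bits * 1e6 / spi_hz;
}

/* Signed difference from the 1.25 us WS2812 bit period quoted in the text.
   Negative means the generated bit is too short, positive means too long. */
static double period_error_us(unsigned n_bits, double spi_hz)
{
    return spi_bits_us(n_bits, spi_hz) - 1.25;
}
```

At 3MHz, `period_error_us(3, 3e6)` is -0.25us, while `period_error_us(4, 3e6)` is about +0.083us, which is why four SPI bits per WS2812 bit is the closer fit.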
Final:
- 0 for WS2812B is 0b1000 via SPI
- 1 for WS2812B is 0b1110 via SPI
A byte is 8 bits long, though, so it is easier to translate two WS2812 bits at a time, which gives exactly one SPI byte:
- 0 0 for WS2812B is 0b1000 0b1000 via SPI
- 0 1 for WS2812B is 0b1000 0b1110 via SPI
- 1 0 for WS2812B is 0b1110 0b1000 via SPI
- 1 1 for WS2812B is 0b1110 0b1110 via SPI
A lookup table can then be used:
[Code: C / C++ - listing not included in this copy]
The bitwise AND with 0b11 clears all bits other than the two least significant ones. This ensures that the array index is in the range 0 to 3 inclusive.
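A minimal sketch of such a table, built directly from the four bit pairs listed above. The helper name `translate_pair` is hypothetical. The values are written in hex so that the code also compiles on compilers without binary literals.

```c
#include <assert.h>
#include <stdint.h>

/* Two WS2812 bits -> one SPI byte. Each WS2812 bit becomes four SPI bits:
   0 -> 1000, 1 -> 1110. */
static const uint8_t data_translate[4] = {
    0x88, /* 0 0 -> 1000 1000 */
    0x8E, /* 0 1 -> 1000 1110 */
    0xE8, /* 1 0 -> 1110 1000 */
    0xEE, /* 1 1 -> 1110 1110 */
};

/* Translate the two least significant bits of 'pair' into an SPI byte. */
static uint8_t translate_pair(uint8_t pair)
{
    uint8_t index = pair & 0x03u; /* keep only bits 0 and 1: index 0..3 */
    assert(index < 4);            /* defensive check; always true after masking */
    return data_translate[index];
}
```

Because of the mask, any higher bits left over from a shift are ignored, so `translate_pair(0xFE)` gives the same result as `translate_pair(2)`.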
What remains is to split one WS2812 byte into four pairs of bits and write the resulting bytes into the data array for the SPI:
[Code: C / C++ - listing not included in this copy]
And basically, that's it.
Now, if we have a colour array to send to the WS2812, we simply "translate" it as shown into an appropriately sized array of bytes for SPI (one WS2812 byte becomes 4 SPI bytes) and then send the resulting data via SPI DMA.
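The translation of a whole colour array could look roughly like this. It is a sketch under two assumptions: the SPI peripheral shifts each byte out MSB first, and the function names `translate_byte` and `translate_buffer` are hypothetical. The pairs are taken with shifts of 6, 4, 2 and 0 bits, most significant pair first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two WS2812 bits -> one SPI byte (0 -> 1000, 1 -> 1110). */
static const uint8_t data_translate[4] = { 0x88, 0x8E, 0xE8, 0xEE };

/* Expand one WS2812 byte into four SPI bytes, most significant pair first. */
static void translate_byte(uint8_t input, uint8_t out[4])
{
    out[0] = data_translate[(input >> 6) & 0x03u];
    out[1] = data_translate[(input >> 4) & 0x03u];
    out[2] = data_translate[(input >> 2) & 0x03u];
    out[3] = data_translate[input & 0x03u];
}

/* Expand a whole colour array. 'spi' must hold at least 4 * n_bytes bytes.
   Returns the number of SPI bytes written. */
static size_t translate_buffer(const uint8_t *colors, size_t n_bytes,
                               uint8_t *spi, size_t spi_capacity)
{
    assert(colors != NULL && spi != NULL);
    assert(spi_capacity / 4 >= n_bytes);
    for (size_t i = 0; i < n_bytes; i++) {
        translate_byte(colors[i], &spi[4 * i]);
    }
    return 4 * n_bytes;
}
```

For example, the byte 0x1B (binary 00 01 10 11) becomes the four SPI bytes 0x88, 0x8E, 0xE8, 0xEE.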
The rest of the code depends on the platform we are using.
The RESET condition for the WS2812 should also be mentioned here, but with the logic shown, sending a low state alone should not be a problem.
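Since an SPI byte of 0x00 keeps MOSI low for eight bit times, the low period can be produced with plain zero bytes. The sketch below computes how many are needed and lays out a frame with zeros before and after the translated data. The function names are hypothetical, and the 50us figure used in the example is an assumption based on the commonly quoted WS2812 reset time; some newer WS2812B revisions specify a longer one, so check the datasheet of the actual part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of all-zero SPI bytes needed to keep MOSI low for at least
   'low_us' microseconds at the given SPI clock. */
static size_t zero_bytes_for_low_time(uint32_t spi_hz, uint32_t low_us)
{
    assert(spi_hz > 0);
    uint64_t bits = ((uint64_t)spi_hz * low_us + 999999u) / 1000000u; /* round up */
    return (size_t)((bits + 7u) / 8u);                               /* round up */
}

/* Lay out [leading zeros][payload][trailing zeros] in 'frame'.
   Returns the total frame length, or 0 if 'frame' is too small. */
static size_t build_frame(uint8_t *frame, size_t capacity,
                          const uint8_t *payload, size_t payload_len,
                          size_t lead_zeros, size_t trail_zeros)
{
    assert(frame != NULL && payload != NULL);
    size_t total = lead_zeros + payload_len + trail_zeros;
    if (total > capacity) {
        return 0;
    }
    memset(frame, 0x00, lead_zeros);
    memcpy(frame + lead_zeros, payload, payload_len);
    memset(frame + lead_zeros + payload_len, 0x00, trail_zeros);
    return total;
}
```

At 3MHz, a 50us low period works out to 150 SPI bits, i.e. 19 zero bytes. The total DMA buffer is then the padding plus four SPI bytes for every WS2812 byte.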
Close-up of the inside of a circular LED lamp with colorful lighting.

Summary
It would seem that waveforms generated in this way are not precise enough, but the WS2812 tolerates them after all - I checked this in the topic about the lamp from Action. You can safely play animations, and not only that: the smart drawers mentioned some time ago also used this method. Such a strip can also be controlled via the DDP protocol.
Finally, if anyone wants to take a look at the source code, the project is open source. So far I am happy with the implementation, so I am unlikely to improve it, although there were a few more complications along the way: for instance, to increase the stability of the transfer we had to add some zero bytes at the beginning and end, but that is more a matter of the platform used.
In any case, the animations work. What are your experiences? Have any of you tried writing your own WS2812 driver?

About Author
p.kaczmarek2
p.kaczmarek2 wrote 14408 posts with rating 12345, helped 650 times. Been with us since 2014.

Comments

krzbor 13 Mar 2025 23:40

Here is a simple example for the Arduino Nano: Link. There is no DMA, but the author decided to allocate as much as one SPI byte per WS2812 bit. This may come in handy when more precise timings...

p.kaczmarek2 14 Mar 2025 08:01

Interesting example, although on the ATmega this problem with the cache and the bit-banging method probably does not occur? Additionally, it sends each byte separately. I'm still tempted to run these code executions...

rafalekrav40 15 Mar 2025 14:53

A nice coincidence, as I am currently working on a clock based on WS2812 LEDs :). I can make the project available if anyone would like. I cut the aperture from milky plexiglass obtained from an old TV, the 7th...

p.kaczmarek2 15 Mar 2025 14:58

A familiar sight, I've been through this too, cool project. Self-built 7-segment colour display based on WS2812B. What are you going to drive it with? If you do, consider posting the whole thing...

pch 17 Mar 2025 17:36

I too have started to build a clock but still lack the time. PC

FAQ

TL;DR: At 3 MHz, one SPI bit lasts about 0.333 µs, and as the author puts it, "the animations work." This FAQ shows embedded developers how to drive WS2812 LEDs from SPI MOSI + DMA on BK7231, avoiding blocking bit-banging and making long strips and animations practical. [#21478156]

Why it matters: If WS2812 updates no longer block the CPU, the microcontroller can keep running application code while LED data streams out in the background.

Method | Timing generation | CPU load | Best use case
Bit-bang GPIO | Manual high/low delays with NOPs | High; code blocks during transfer | Short strips, simple effects
SPI on MOSI | Encodes WS2812 pulses as SPI bit patterns | Lower, but still CPU-driven | Simpler hardware offload
SPI + DMA | SPI sends translated buffer from memory automatically | Lowest during transfer | Long strips and animations
1 SPI byte per LED bit at 8 MHz | Finer timing granularity | Higher RAM overhead | When tighter timing is needed

Key insight: The core trick is not to "speak SPI" to a WS2812. You use MOSI waveform timing as a signal generator, then map each WS2812 bit to a fixed SPI pattern such as 1000 or 1110.

Quick Facts

  • BK7231 uses one hardware SPI port, and the thread’s DMA example sends a memory buffer at 3,000,000 Hz without blocking the rest of the code path. [#21478156]
  • In the BK7231 encoding scheme, one WS2812 bit expands to 4 SPI bits: logical 0 = 0b1000, logical 1 = 0b1110. [#21478156]
  • With a 3 MHz SPI clock, one SPI bit is about 0.333 µs; 4 bits ≈ 1.332 µs, which is closer to the WS2812 target than 3 bits ≈ 0.999 µs. [#21478156]
  • Buffer sizing matters: the author states that one byte for the WS2812 is 4 bytes on SPI, so DMA buffers grow by 4× before any padding. [#21478156]
  • An Arduino Nano example in the discussion uses 8 MHz SPI and allocates one SPI byte per WS2812 bit for tighter timing control, but without DMA. [#21478887]

How do you control WS2812 LEDs over SPI using the MOSI pin and DMA instead of bit-banging?

You translate each WS2812 data bit into a timed SPI bit pattern on MOSI, then let DMA stream that buffer through the SPI peripheral. 1. Expand LED color bytes into SPI bytes. 2. Send the translated buffer at 3 MHz. 3. Let DMA transmit it in the background while the CPU keeps running. The thread’s BK7231 method uses MOSI only for waveform generation; WS2812 does not actually understand the SPI protocol itself. [#21478156]

Why does bit-banging WS2812 on the BK7231 show unexpected or seemingly non-deterministic delays when the NOP count changes?

Because tight GPIO timing loops can be disturbed by instruction fetch behavior from flash and cache effects. The author observed that changing only the number of nop instructions altered just part of an 8-pulse sequence, which looked non-deterministic on the scope. He suspected cache and flash instruction reads, not the WS2812 logic itself. That makes BK7231 bit-banging less trustworthy for sub-microsecond timing than hardware-timed SPI output. [#21478156]

What is DMA in the context of SPI LED control, and how does it help when driving WS2812 strips?

"DMA" is a hardware data-transfer engine that moves bytes from memory to a peripheral without constant CPU involvement, giving repeatable timing during transmission. In this thread, DMA feeds the SPI hardware directly from a RAM buffer, so LED data continues to transmit while other code runs. That removes the need to hold exact GPIO timing in software and makes long animations practical on BK7231. [#21478156]

What does bit-banging mean for WS2812 control, and why can it block the microcontroller?

Bit-banging means the firmware manually drives a GPIO pin high and low with carefully timed delays for every WS2812 bit. That blocks the microcontroller because code must stay inside the timing loop until the whole transfer finishes. The author states that interrupts must be disabled during the transfer, or the LED timings will be corrupted. On long LED bars, that blocking time becomes significant for animations and multitasking. [#21478156]

How do you translate one WS2812 data byte into SPI bytes like 0b1000 and 0b1110 for timing generation?

You split one WS2812 byte into four 2-bit groups and convert each pair with a lookup table. The thread uses data_translate[4] = {0b10001000, 0b10001110, 0b11101000, 0b11101110}. Each lookup output packs two WS2812 bits into one SPI byte, so one original LED byte becomes 4 SPI bytes total. The example function shifts by 6, 4, 2, and 0 bits to process all four pairs. [#21478156]

Why was a 3 MHz SPI clock chosen for WS2812 timing emulation on the BK7231, and how close is it to the required 1.25 µs bit time?

A 3 MHz SPI clock makes one SPI bit about 0.333 µs, which is a convenient timing unit for WS2812 pulse shaping. The author first considered 3 bits = 0.999 µs, but that was too short versus the 1.25 µs target. He therefore stretched each WS2812 bit to 4 SPI bits, giving about 1.332 µs total. That error is smaller and worked in practice on his setup. [#21478156]

SPI with DMA vs classic bit-bang for WS2812 on BK7231 — which approach is better for long LED strips and animations?

SPI with DMA is better for long strips and animations because it keeps timing stable and avoids blocking the CPU during transfer. Classic bit-bang on BK7231 needs precise delays, can require disabled interrupts, and may suffer from flash or cache timing issues. The author explicitly chose SPI after running into unstable pulse timing with NOP-based code. For animation playback, hardware-timed transfer is the stronger option. [#21478156]

What timing pattern should represent WS2812 bit 0 and bit 1 when encoding them through SPI MOSI?

The final BK7231 mapping is WS2812 0 = 0b1000 and WS2812 1 = 0b1110 on SPI MOSI. This encoding creates a shorter high time for logical 0 and a longer high time for logical 1. Earlier in the design, the author also described a 3-bit idea, where 0 = 100 (one high bit, two low bits) and 1 = 110, but he replaced it with the 4-bit version because it better matched WS2812 timing. [#21478156]

How do you handle the WS2812 reset/latch condition when the LEDs are being driven through SPI DMA?

You handle reset by holding the line low, which the author says is easy with the same SPI-based logic. He notes that sending the low state alone should no longer be a problem once data is encoded into SPI output. In practice, the project also added extra zero bytes at the beginning and end of the transfer to improve stability. That padding helps the strip see a cleaner idle-low region. [#21478156]

What problems can cache or flash execution cause in tight WS2812 timing loops, and how could running code from RAM help?

Cache and flash execution can insert timing variation into very short GPIO loops, even when the source code repeats the same pulse macro. The author suspected exactly that after seeing inconsistent pulse lengths on BK7231. Running the hot path from RAM as a ramfunc can help because it reduces dependence on flash fetch behavior. He considered that route, but switched to SPI because RAM execution brought setup difficulties on this platform. [#21478156]

Which approach gives more precise WS2812 timings on Arduino Nano or ATmega: one SPI byte per LED bit at 8 MHz, or packed 2-bit translation like on BK7231?

The thread suggests that one SPI byte per WS2812 bit at 8 MHz gives finer timing control than the more packed BK7231-style translation. A reply points to an Arduino Nano example that uses 8 MHz SPI and allocates one full SPI byte per LED bit specifically when more precise timings are required. The trade-off is efficiency: finer granularity costs more buffer space and still lacks DMA in that example. [#21478887]

How much RAM do you need when one WS2812 byte expands to four SPI bytes for DMA transmission?

You need about 4× the original LED data size, before any extra padding bytes. The author states this directly: one byte for the WS2812 becomes 4 bytes on SPI. So if your color buffer is N bytes, the translated DMA buffer is roughly 4N bytes, plus any leading or trailing zero bytes added for stability. That RAM overhead is the main cost of the method. [#21478156]

What is a ramfunc, and when is it useful for time-critical GPIO code such as WS2812 drivers?

"ramfunc" is code placed in RAM instead of flash so the CPU executes it from a faster, more predictable memory region, reducing timing disturbances in critical loops. It is useful when GPIO edges must stay stable at sub-microsecond scales, as in WS2812 bit-banging. In the thread, the author says running from RAM would likely help on BK7231, but memory-layout and configuration issues made SPI the easier solution. [#21479009]

How can you improve transfer stability for WS2812 over SPI DMA, including the use of extra zero bytes at the beginning and end?

You improve stability by padding the translated SPI stream with extra zero bytes before and after the LED payload. The author says he had to add such zeros to increase transfer stability on the used platform. That gives the line a cleaner low level before data starts and after it ends. It is a practical edge-case fix when the raw waveform works, but strip behavior remains inconsistent. [#21478156]

What microcontroller is a good choice to drive a WS2812-based 7-segment clock or status LED project, and what trade-offs matter most?

BK7231 is a good choice when you want background LED updates, while ATmega-class boards suit simpler projects and tighter manual control. The thread discusses BK7231 for DMA-driven strips and mentions an ATmega8 clock project plus an Arduino Nano 8 MHz SPI example. For a single status LED on any pin, flexible GPIO control matters most. For long animated displays, stable timing and CPU availability matter more than minimal RAM use. [#21479009]
Generated by the language model.