Hello. Does anyone know whether it's possible to build an AI voice assistant (e.g. Buddy or ChatGPT, but ideally one that works offline) on an ESP32 with an INMP441 microphone, an OLED display and a speaker driven by a GF1002 amplifier, without a DAC such as the MAX98357?
How are you going to reconcile such limited hardware with the demands of AI algorithms that are supposed to run locally???
Forget about running ChatGPT, Buddy or anything like that directly on the ESP32. The ESP32 has a few hundred kilobytes of RAM. You need several gigabytes of RAM and a much more powerful processor.
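To put rough numbers on that, here's a back-of-the-envelope calculation. It assumes a hypothetical 7-billion-parameter model (a common size for locally run LLMs) and counts only the weights, ignoring activations, cache and code. The 520 KB figure is the internal SRAM of the classic ESP32; the helper name is made up for illustration.

```python
# Rough memory arithmetic: model weights vs. ESP32 on-chip SRAM.
# The 7B-parameter model is a hypothetical example, not a specific product.

ESP32_SRAM_BYTES = 520 * 1024  # classic ESP32: 520 KB of internal SRAM


def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Bytes needed just to store the weights (no activations, cache or code)."""
    return n_params * bits_per_weight // 8


params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (16, 8, 4):
    need = weight_bytes(params, bits)
    ratio = need // ESP32_SRAM_BYTES
    print(f"{bits:>2}-bit weights: {need / 2**30:5.2f} GiB "
          f"(about {ratio:,}x the ESP32's SRAM)")
```

Even squeezed to 4 bits per weight, that's roughly 3.3 GiB, several thousand times what the chip has on board.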
As for audio, the ESP32 has a built-in 8-bit DAC, but apart from producing crackling it's useless. You want to connect a PAM8403-based amplifier module, e.g. the GF1002, without an external DAC… The synthesised speech will be unintelligible. The S3, on the other hand, has no DAC at all, so it has no analogue outputs.
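Roughly why 8 bits sounds so bad: the textbook ideal signal-to-quantisation-noise ratio of an N-bit converter for a full-scale sine is about 6.02·N + 1.76 dB, i.e. about 50 dB at 8 bits versus about 98 dB at 16 bits, and real hardware does worse than the ideal. The sketch below (Python, just to show the arithmetic; the helper names are made up) also maps a signed 16-bit PCM sample onto the 0..255 range an 8-bit DAC accepts, which means simply throwing away the low byte.

```python
def ideal_sqnr_db(bits: int) -> float:
    """Ideal signal-to-quantisation-noise ratio (dB) for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def pcm16_to_dac8(sample: int) -> int:
    """Map a signed 16-bit PCM sample (-32768..32767) to an 8-bit DAC value (0..255).

    The low 8 bits are discarded; that lost detail is the extra noise you hear.
    """
    return (sample >> 8) + 128  # arithmetic shift keeps the sign, +128 re-centres it


for bits in (8, 16):
    print(f"{bits}-bit converter: ~{ideal_sqnr_db(bits):.1f} dB")

# All 65,536 possible 16-bit sample values collapse onto just 256 DAC levels.
levels = {pcm16_to_dac8(s) for s in range(-32768, 32768)}
print(len(levels), "distinct DAC levels")
```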
Your microphone is excellent. It's a digital I2S device, so the speaker output should also go over the I2S bus. Mixing a digital I2S input with an analogue, modulated output and no dedicated DAC is like mixing an elbow with…
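For reference, this is roughly what handling the INMP441's I2S data looks like once the samples are in a buffer. It's a Python sketch of the conversion only, assuming the usual setup of 32-bit I2S slots with the 24-bit sample left-justified and little-endian byte order in memory; check this against your own driver configuration. The helper names are hypothetical. The resulting 16-bit PCM is also a format a MAX98357A can play back over I2S.

```python
import struct
from typing import List


def i2s32_to_pcm16(raw: bytes) -> List[int]:
    """Convert a buffer of 32-bit I2S slots into signed 16-bit PCM samples.

    Assumes each slot is a little-endian int32 holding the 24-bit
    two's-complement sample left-justified (bits 31..8), low byte unused.
    """
    count = len(raw) // 4  # ignore any trailing partial slot
    words = struct.unpack(f"<{count}i", raw[:count * 4])
    return [w >> 16 for w in words]  # keep the top 16 of the 24 data bits


def pcm16_to_bytes(samples: List[int]) -> bytes:
    """Pack 16-bit samples little-endian, e.g. for sending over Wi-Fi."""
    return struct.pack(f"<{len(samples)}h", *samples)


# Three left-justified 24-bit samples: 0x123456, -1 and the most negative value.
raw = struct.pack("<3i", 0x12345600, -256, -2**31)
print(i2s32_to_pcm16(raw))  # [4660, -1, -32768]
```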
Alternatively, you could use the ESP32 purely as a terminal that streams the microphone audio over Wi-Fi to a server, e.g. a free server project or your own Python server.
The server then passes it to an API that generates a spoken response and sends the finished audio back to the ESP32. On the hardware side, all you'd need to buy is a MAX98357A.
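A minimal sketch of that server side, assuming a simple protocol that isn't from this thread: the ESP32 sends one utterance as 16 kHz, 16-bit mono PCM prefixed with a 4-byte big-endian length, and gets the reply back in the same framing. `process_audio` is a hypothetical placeholder for whatever speech-to-text, language-model and text-to-speech chain or API you pick, and port 5000 is arbitrary.

```python
import io
import socket
import struct
import wave
from typing import Callable

SAMPLE_RATE = 16_000  # assumption: 16 kHz, 16-bit mono PCM from the ESP32


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver them in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def read_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed message (4-byte big-endian length, then payload)."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def write_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def pcm_to_wav(pcm: bytes, rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container, which most speech APIs accept."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm)
    return out.getvalue()


def handle_client(sock: socket.socket,
                  process_audio: Callable[[bytes], bytes]) -> None:
    """One round trip: PCM from the ESP32 in, reply audio (raw PCM) back out."""
    wav = pcm_to_wav(read_frame(sock))
    write_frame(sock, process_audio(wav))  # hypothetical STT -> LLM -> TTS step


def serve(process_audio: Callable[[bytes], bytes],
          host: str = "0.0.0.0", port: int = 5000) -> None:
    """Accept ESP32 connections forever, one utterance per connection."""
    with socket.create_server((host, port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                handle_client(conn, process_audio)
```

Prefixing every message with its length tells the ESP32 exactly how many bytes of reply audio to expect before it starts feeding them to the MAX98357A, without needing a more elaborate protocol.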
Forget about running ChatGPT, Buddy or anything like that directly on the ESP32. The ESP32 has a few hundred kilobytes of RAM. You'll need several gigabytes of RAM and a much more powerful processor.
Ah, actually, I've checked, and for that you need an ESP32-S3 (https://www.youtube.com/watch?v=aDaSp6zaqWM).
JuniorS wrote:
As for audio, the ESP32 has a built-in 8-bit DAC, but apart from some crackling it doesn't do anything. You want to connect a PAM8403-based amplifier module, e.g. the GF1002, without an external DAC... The synthesised speech will be unintelligible. The S3, on the other hand, has no DAC at all, so it doesn't have any analogue outputs.
I was actually expecting that, but it’s always worth asking.
In that case, I have another question. Is it possible to implement voice commands on the ESP32 using Edge Impulse Studio (https://studio.edgeimpulse.com)?