Hi,

I have a (well, almost) working project running on an ESP32-S3 with 8 MB PSRAM and 320 KB internal RAM.

On Core 0 I’m doing WiFi, HTTP client, OTA, LCD Display management, Microphone, Websockets, ESP-SR (wake word detection), basically all management.

On Core 1 I have two tasks, one to fill up a buffer for audio output and one to actually play the audio.

I figured out that in order to play the buffered audio without lag/interruptions, I need to process the audio and the received stream buffer in internal RAM (not PSRAM).

The ESP32-S3's resources are not enough. I can't move most of the data to PSRAM because the processing needs internal RAM, and there isn't enough heap left.

So everything works, even the audio playback, but with lag. The PSRAM is too slow for these operations.

In this situation, would you upgrade to the ESP32-P4-WIFI by Waveshare or do you see another option?

EDIT: I know that I could write the full stream to PSRAM and only start playing after it finishes, but that wouldn't be the real deal. I want responsiveness.

  • just_another_person · 3 points · 18 hours ago

    It’s hard to say without more specifics, but you’d only be getting up to 64 MB max on the P4.

    What audio are you streaming to the device exactly? Are you using any offloading? What’s the memory utilization on your current project?

    I’m not sure what kind of lagging you’re talking about, but these devices are anything but responsive or real-time. Thinking of some audio applications I’ve run on a few of them, they all have that ~250 ms sort of lag when dealing with audio operations. I’ve never dug deeper into it myself.

    • q1p_@lemmy.zipOP · 1 point · 14 hours ago

      EDIT: I fixed the issue I had with dynamic buffer allocation (I’m dynamically calculating the buffer length based on stream metrics). But I’ll order the P4 anyway, because I want to get my hands on it.

    • q1p_@lemmy.zipOP · 1 point · edited · 15 hours ago

      At most I just need a few MB for reading a websocket response into internal RAM and feeding the audio loop that is running on Core 1. The issue is that the network delivers around 25 KB/s while the audio playback consumes 48 KB/s (buffer underrun). I can’t lower the sample rate (I tried). I’d switch to another codec like Opus, but the Deepgram API only supports linear PCM at 24 kHz. I tried setting other output formats but it isn’t working. Technically I could decode Opus.

      The flow is this: TTS -> Websocket -> PSRAM (slow) -> I2S (DMA 8x1024) -> DAC -> Speaker

      DRAM free: about 50 KB; PSRAM: plenty (6–7 MB)

      • just_another_person · 2 points · 13 hours ago

        You probably only want to use the websocket as a control channel, and have another socket open to receive the audio as a passthrough? Pretty sure that’s how Sonos et al. do it. More lightweight, and perhaps you wouldn’t have to worry about underruns like you’re dealing with now.