Introducing Waveform Analysis for Audio Scanning

20 Apr 2026 · 4 min read · Bordair

Audio is one of the fastest-growing attack surfaces in LLM applications. Voice assistants, meeting summarisers, voice-note transcribers, and audio-in agentic workflows all feed user-controlled audio straight into a model. Attackers have noticed.

Today we are rolling out waveform analysis as a first-class stage of Bordair's audio scanning pipeline, and it is live for every customer on every plan that includes audio.

What changes for you

Nothing changes in your integration. The same /scan/audio and /scan/multi endpoints you already use now include an additional detection layer. It runs transparently on every audio scan, at no extra cost and with negligible added latency.

You will see it surfaced in two places:

  • The method field in scan responses - look for values beginning with audio+waveform_ when a waveform-level anomaly triggers the verdict.
  • The dashboard method breakdown, where waveform-attributed detections appear alongside ultrasonic, spectral, and transcription-based ones.
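As a rough illustration of the first point, a client could check the method field like this. This is a minimal sketch: only the field name method and the audio+waveform_ prefix come from this post. The response bodies and the helper name is_waveform_detection are hypothetical, and the real response will carry other fields.

```python
WAVEFORM_PREFIX = "audio+waveform_"  # prefix described in this post


def is_waveform_detection(scan_response: dict) -> bool:
    """Return True when the verdict is attributed to a waveform-level anomaly."""
    method = scan_response.get("method")
    # Guard against a missing or non-string field rather than assuming the shape.
    return isinstance(method, str) and method.startswith(WAVEFORM_PREFIX)


# Hypothetical response bodies, for illustration only.
flagged = {"method": "audio+waveform_example"}
other = {"method": "audio+example"}
print(is_waveform_detection(flagged))  # True
print(is_waveform_detection(other))    # False
```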

Why we added it

Until now, Bordair's audio pipeline defended against three main threat classes: ultrasonic carrier attacks (DolphinAttack and descendants), spectral anomalies typical of adversarial perturbations, and spoken prompt injection surfaced through Whisper transcription. That covers the frequency domain and the linguistic layer well.

Waveform analysis fills the gap in between. It looks at the raw amplitude sequence of an audio file - the shape of the sound over time - and checks for signatures of tampering that do not always show up cleanly in a spectrogram: overdriven attack carriers, sustained noise layers designed to mask adversarial perturbations, impulse injections, concatenation seams from stitched-together clips, and pathological encoding.

These are the kinds of artefacts that survive format conversion, re-encoding, and lossy compression - exactly the situations where purely frequency-based detection can get noisier.
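To make "the shape of the sound over time" concrete, here is a toy sketch of two amplitude-domain measurements: the fraction of samples pinned near full scale (a crude proxy for an overdriven signal) and the largest sample-to-sample jump (a crude proxy for an impulse or a splice seam). These are not Bordair's detectors. The function names, the 0.99 ceiling, and the assumption that samples are normalised to the range -1 to 1 are all illustrative.

```python
from typing import Sequence


def clipping_ratio(samples: Sequence[float], ceiling: float = 0.99) -> float:
    """Fraction of samples at or above the (assumed) near-full-scale ceiling."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= ceiling)
    return clipped / len(samples)


def largest_jump(samples: Sequence[float]) -> float:
    """Largest absolute change between consecutive samples."""
    return max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0.0)


smooth = [0.0, 0.25, 0.5, 0.25, 0.0]       # gentle ramp up and down
overdriven = [1.0, 1.0, -1.0, 1.0, 0.5]    # mostly pinned at full scale
spike = [0.0, 0.25, -1.0, 0.25, 0.0]       # one abrupt impulse

print(clipping_ratio(smooth), clipping_ratio(overdriven))  # 0.0 0.8
print(largest_jump(smooth), largest_jump(spike))           # 0.25 1.25
```

Neither number needs a spectrogram to compute, which is the point of working in the time domain. A production detector would use tuned thresholds and many more signals than these two.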

Cross-modal protection comes along for free

Bordair's multimodal endpoint routes each modality through its own pipeline and combines the verdicts. Because waveform analysis runs inside the audio pipeline, it automatically strengthens every cross-modal combination that includes audio: text+audio, document+audio, image+audio, and full four-modality scans. You do not need to opt in, change anything, or even re-read the docs.
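One way to picture why this comes for free is the sketch below. This post does not specify how verdicts are combined, so the rule here (the combined scan is flagged if any modality is flagged) is an assumption, and the Verdict type and the method strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    modality: str
    flagged: bool
    method: Optional[str] = None


def combine(verdicts: List[Verdict]) -> Verdict:
    """Assumed rule: flag the combined scan if any single modality is flagged."""
    for v in verdicts:
        if v.flagged:
            # Carry through the method of the first modality that flagged.
            return Verdict("multi", True, v.method)
    return Verdict("multi", False)


# Hypothetical per-modality results for a text+audio scan.
result = combine([
    Verdict("text", False),
    Verdict("audio", True, "audio+waveform_example"),
])
print(result.flagged, result.method)  # True audio+waveform_example
```

Under a rule like this, any new detector inside the audio pipeline improves every combination that includes audio without the combining step changing.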

Latency and pricing

Waveform analysis runs in the low single-digit milliseconds on a typical 5-to-30-second clip. It short-circuits more expensive downstream work (transcription in particular) when a clear attack signature is found, so for obvious attacks your overall scan latency actually goes down.
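The short-circuit can be sketched as a cheap stage that runs first and, on a hit, returns before the expensive stage is called. This is an illustration under assumptions, not the pipeline's actual code. The function scan_audio, the stage callables, and the result fields are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# A stage returns a method string on detection, or None when nothing is found.
Stage = Callable[[Sequence[float]], Optional[str]]


def scan_audio(samples: Sequence[float], waveform_stage: Stage,
               transcription_stage: Stage) -> dict:
    """Run the cheap waveform stage first; skip transcription on a clear hit."""
    hit = waveform_stage(samples)
    if hit is not None:
        return {"flagged": True, "method": hit, "transcribed": False}
    hit = transcription_stage(samples)
    return {"flagged": hit is not None, "method": hit, "transcribed": True}


# Stub stages for illustration: the waveform stub flags any full-scale sample.
def waveform_stub(samples):
    return "audio+waveform_example" if any(abs(s) >= 1.0 for s in samples) else None


def transcription_stub(samples):
    return None  # stands in for the slower transcription-based check


print(scan_audio([0.1, 1.0, -1.0], waveform_stub, transcription_stub)["transcribed"])  # False
print(scan_audio([0.1, 0.2, 0.1], waveform_stub, transcription_stub)["transcribed"])   # True
```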

Pricing is unchanged. Audio scans remain 15 credits, and waveform analysis is included.

What we are working on next

  • Per-stage confidence surfacing in the dashboard so you can see which layer flagged a given scan.
  • Extending waveform analysis to live streaming audio for real-time voice agents.
  • More signatures - we track new audio-side attack research and add detection for any technique that proves out in our evaluation harness.

Questions? Reply to any Bordair email or reach out via the dashboard. We want to hear what you are scanning.
