Bring GPT-4o-Like Multimodal AI to Microcontrollers.
Vision + sound, fully offline — private, battery-friendly, OEM-ready.
VWW (Visual Wake Word) — IP block
- Detects human or object presence locally on-device.
- Ideal for event-driven activation of cameras and devices.
SoundWW (Sound Wake Word) — IP block
- Offline always-on detection of keywords or sound events.
- Enables low-power audio triggers.
Fusion logic — custom integration
- Combine VWW + SoundWW with AND/OR/priority logic.
- Reduce false alerts, improve safety, save energy.
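As an illustration of the AND/OR/priority options above, here is a minimal Python sketch. The names `Detection`, `FusionMode`, and `fuse`, and the `margin` parameter, are hypothetical and are not the shipped API. "Priority" is interpreted here as "vision decides unless it is uncertain", which is only one possible reading.

```python
from dataclasses import dataclass
from enum import Enum


class FusionMode(Enum):
    AND = "and"            # both modalities must fire
    OR = "or"              # either modality is enough
    PRIORITY = "priority"  # vision decides; sound breaks ties (assumed meaning)


@dataclass
class Detection:
    score: float      # model confidence in [0, 1]
    threshold: float  # per-modality decision threshold

    @property
    def fired(self) -> bool:
        return self.score >= self.threshold


def fuse(vision: Detection, sound: Detection, mode: FusionMode,
         margin: float = 0.1) -> bool:
    """Return True when the combined trigger should wake the device."""
    if mode is FusionMode.AND:
        return vision.fired and sound.fired
    if mode is FusionMode.OR:
        return vision.fired or sound.fired
    # PRIORITY: vision decides on its own unless its score lies within
    # `margin` of its threshold, in which case sound breaks the tie.
    if abs(vision.score - vision.threshold) > margin:
        return vision.fired
    return sound.fired
```

AND suits cases where false alerts are costly (both triggers must agree), while OR favors sensitivity at the price of more wake-ups.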
Customization Options
- Add new visual and audio classes.
- Configure fusion logic (vision, sound, multimodal combinations).
- Adjust thresholds for accuracy vs. power trade-offs.
- Train custom datasets for OEM-specific domains.
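The threshold trade-off can be explored offline before touching the device. The sketch below assumes a small labelled set of model scores and treats every false trigger as a wasted wake-up of the main pipeline. `sweep` and `pick_threshold` are hypothetical helper names, and the budget value is up to the integrator.

```python
def sweep(scores, labels, thresholds):
    """For each threshold, return (threshold, detection_rate, false_wake_rate)."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        rows.append((t, tp / positives, fp / negatives))
    return rows


def pick_threshold(rows, max_false_wake_rate):
    """Pick the threshold with the best detection rate within the false-wake budget."""
    ok = [r for r in rows if r[2] <= max_false_wake_rate]
    if not ok:
        return None  # no threshold meets the budget
    return max(ok, key=lambda r: r[1])[0]
```

A tighter false-wake budget pushes the threshold up, which saves energy but misses more real events.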
Platforms We Support
- ESP32 (incl. ESP32-S3) → fast prototyping.
- Himax HX6538 (Arm® Cortex™-M55 + Arm® Ethos™-U55) → battery-powered reference module.
- Arm® Cortex™ + Arm® Ethos™-U55 / other AI SoCs → production deployment.
- Migration path: prototype on ESP32 → scale on Arm® Ethos™-U55.
Applications
- Smart cameras & doorbells: record and notify only when both vision and sound triggers agree.
- Robots (AMRs, service, consumer): act on commands only when operator presence is confirmed visually.
- Access systems: verify presence with VWW and validate offline credentials (VoicePIN or ALPR).
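The robot scenario above (act on a command only when presence is confirmed) can be sketched as a time-window gate: a sound command is accepted only if a VWW presence event arrived shortly before it. `PresenceGate` and the 2-second window are illustrative assumptions. Timestamps are passed in explicitly here; on a device they would come from a monotonic clock.

```python
class PresenceGate:
    """Accept commands only within `window_s` seconds of the last presence event."""

    def __init__(self, window_s: float = 2.0):
        self.window_s = window_s
        self._last_seen = None  # timestamp of the most recent presence event

    def on_presence(self, t: float) -> None:
        # Called whenever the visual wake word fires.
        self._last_seen = t

    def allow_command(self, t: float) -> bool:
        # Called whenever the sound wake word recognizes a command.
        if self._last_seen is None:
            return False
        return 0.0 <= t - self._last_seen <= self.window_s
```

A short window keeps the check strict, while a longer one tolerates an operator who briefly steps out of frame.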
OEM Benefits
- Lower OPEX — fewer false alerts, fewer support calls, less cloud usage.
- Minimal BOM impact — add multimodal features with just a microphone and IP.
- Pro SKU differentiation — “Pro” cameras, robots, and access systems with premium features.
- Energy savings — event-driven logic reduces power drain and extends battery life.
- Privacy & compliance — all processing offline, easier legal approvals.
DIY
- When Your Camera Thinks Before It Shoots (Hackster): event-driven offline camera demo.
- Gate, Offline License Plate Recognition Gate Opener (Hackster): multimodal security + vision.
- Battery-Powered Edge AI Module (Himax HX6538): hardware platform for multimodal PoCs.
For OEMs, these demos show how multimodal triggers reduce false alerts and enable premium offline devices.
How We Work
- Evaluate — define target sounds, access scenarios, hardware.
- PoC — 2–3 weeks, fixed scope, measurable results.
- License & Integrate — adapt IP for production; PoC fee credited.
Deliverables
- VWW and SoundWW IP binaries.
- Fusion logic tailored to your use case.
- APIs and integration guides.
- Metrics pack (latency, accuracy, false alarms, energy).
- Demo video on your device.
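To make the metrics pack concrete, here is one way the four quantities could be computed from logged trials. This is a sketch under assumptions: the `Trial` record, its field names, and the units are hypothetical and do not describe the actual deliverable format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    expected: bool     # ground truth: should the trigger have fired?
    fired: bool        # did the trigger fire?
    latency_ms: float  # event onset to trigger; only meaningful for true hits
    energy_mj: float   # energy spent handling this trial


def summarize(trials, hours_observed):
    """Aggregate accuracy, false alarms per hour, latency, and energy."""
    correct = sum(t.expected == t.fired for t in trials)
    false_alarms = sum(t.fired and not t.expected for t in trials)
    hit_latencies = [t.latency_ms for t in trials if t.fired and t.expected]
    return {
        "accuracy": correct / len(trials),
        "false_alarms_per_hour": false_alarms / hours_observed,
        "mean_latency_ms": mean(hit_latencies) if hit_latencies else None,
        "mean_energy_mj": mean(t.energy_mj for t in trials),
    }
```

Reporting false alarms per hour rather than as a raw count keeps results comparable across test runs of different lengths.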
Licensing
- PoC license — quick validation, low risk.
- Project license — tuned models and integration support.
- Volume license — mass deployment.
No lock-in: portable IP, stable APIs, and integration code that stays with the OEM.
Ready to make your devices both see and hear?
Launch multimodal PoCs in weeks and unlock Pro features offline.