AI that learns to speak, not from data — from trying.
A neural controller learns to produce human vowels by controlling a real-time vocal instrument. No training data. No phoneme models. No text. Just an AI, a voice box, and feedback.
Research in progress. First vowels achieved March 2026.
Human infants don't learn to speak from datasets. They babble, hear the result, and adjust. They discover how their vocal cords work through trial and feedback. No one programs them.
We asked: can an AI do the same?
No text input. No phoneme pipeline. The AI controls a physics-based vocal instrument directly.
No recorded speech dataset. The AI discovers how to produce sound through its own exploration.
A teacher scores the output as closer to or further from the target. It never tells the AI what to do; scoring is the only control lever.
All learning must be emergent. No programmatic sound shaping. If the AI can't discover it, it doesn't happen.
From random noise to recognisable vowels — a timeline of discovery.
March 2026 — Week 1
An evolutionary search (MAP-Elites) explored a DSP vocal instrument. Given only a speech detector as feedback, 12 independent seeds all converged on speech-like acoustic structure. Peak speech-detector score: 0.97.
Random noise evolving toward speech-like structure over 2,500 generations
A 50ms acoustic gesture discovered independently by every seed — a universal speech initiator
Discovered primitives composed into the highest-scoring output
Evolutionary search found speech — but it had no understanding, no memory, no ability to generalise. It stumbled onto speech without knowing what it was doing. Time for a different approach.
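For readers curious what that search looks like, here is a minimal MAP-Elites loop in Python. A toy one-dimensional behaviour space and a toy fitness stand in for the DSP instrument and the speech detector; all names and values are illustrative, not the real system.

```python
import random

# Minimal MAP-Elites sketch. The real system evaluates candidates by
# synthesising audio and scoring it with a speech detector; here a toy
# fitness (peaked at mean gene value 0.5) stands in for that.

GRID = 10  # one behaviour dimension, discretised into 10 niches

def evaluate(genome):
    """Return (behaviour niche, fitness) for a candidate genome."""
    x = sum(genome) / len(genome)         # behaviour: mean gene value in [0, 1]
    fitness = 1.0 - abs(x - 0.5) * 2      # toy score, peaked at x = 0.5
    return min(int(x * GRID), GRID - 1), fitness

def map_elites(generations=2000, genome_len=8):
    archive = {}  # niche -> (fitness, genome): the best elite found per niche
    for _ in range(generations):
        if archive and random.random() < 0.9:
            # usually: mutate a random existing elite
            parent = random.choice(list(archive.values()))[1]
            child = [min(1.0, max(0.0, g + random.gauss(0, 0.1))) for g in parent]
        else:
            # occasionally: inject a fresh random candidate
            child = [random.random() for _ in range(genome_len)]
        niche, fit = evaluate(child)
        if niche not in archive or fit > archive[niche][0]:
            archive[niche] = (fit, child)  # keep only the best per niche
    return archive

archive = map_elites()
```

The key property, and the reason it suits open-ended exploration of an instrument, is that the archive keeps one elite per behaviour niche rather than a single global best.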
March 2026 — Week 2 (current)
A GRU neural controller learns to produce sound by sending motor commands to a real-time Rust vocal instrument. It hears the result through an inner ear and adjusts. Like an infant learning to speak.
8 motor parameters per 10ms frame
3 distinct vowels learned
1 controller brain for all vowels
One neural controller producing three distinct vowels — and transitioning between them.
The same controller receives a different target signal and produces a different vowel. All learned through feedback, not programmed.
"ah"
"ee"
"oo"
The controller smoothly transitions between vowel states — learned, not interpolated.
"ah" to "ee"
"ah" to "oo"
Spectrogram comparison — Richard's voice (top) vs AI vowels. Each vowel has a distinct formant signature.
The AI sends random motor commands to a vocal instrument and hears what comes out
An inner ear extracts compact acoustic features from each 10ms frame of output
The controller learns which motor commands produce which sounds through closed-loop feedback
Given a target sound, the controller figures out how to make the instrument match it
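The loop described above can be sketched in a few lines. The instrument, inner ear, and controller here are toy stand-ins (a sine oscillator, a single energy feature, a random babbling policy), not the real system; only the shape of the closed loop is real.

```python
import numpy as np

FRAME = 220  # audio samples produced per 10ms control frame (22 kHz rate)

def instrument(motor):
    """Toy stand-in for the vocal instrument: 8 motor values in, one frame out."""
    t = np.arange(FRAME)
    f0 = 80 + 160 * motor[0]  # pitch driven by the first motor channel
    return motor[1] * np.sin(2 * np.pi * f0 * t / 22000)

def inner_ear(audio):
    """Toy stand-in for the inner ear: here, just the frame's RMS energy."""
    return np.array([np.sqrt(np.mean(audio ** 2))])

def babble_step(controller, target, feedback):
    """One closed-loop step: decide, act, hear the result."""
    motor = controller(target, feedback)  # controller maps target + feedback -> motors
    audio = instrument(motor)             # instrument turns motors into sound
    return inner_ear(audio)               # ear turns sound into the next feedback

# Random babbling: a controller that ignores its inputs and explores.
rng = np.random.default_rng(0)
random_controller = lambda target, feedback: rng.uniform(0, 1, 8)
fb = np.zeros(1)
for _ in range(5):
    fb = babble_step(random_controller, target=None, feedback=fb)
```

Learning replaces `random_controller` with a policy that uses the feedback; the loop itself stays unchanged.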
Real-time source-filter synthesis. Glottal pulse generator + cascade all-pole formant filter. 8 motor inputs, 220 audio samples out, every 10ms.
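A minimal source-filter frame, assuming a 22 kHz rate (220 samples per 10ms) and illustrative "ah"-like formant values. The real Rust instrument's glottal pulse shape and motor-to-parameter mapping are not shown here; an impulse train stands in for the pulse generator.

```python
import numpy as np

SR = 22000  # 220 samples per 10ms frame implies a 22 kHz rate

def glottal_source(f0, n):
    """Crude glottal excitation: one impulse per pitch period (the real
    instrument uses a shaped glottal pulse, not bare impulses)."""
    phase = (np.arange(n) * f0 / SR) % 1.0
    return (np.diff(phase, prepend=1.0) < 0).astype(float)

def formant_filter(x, freq, bw):
    """One two-pole resonator; cascading several gives an all-pole vowel filter."""
    r = np.exp(-np.pi * bw / SR)
    a1 = -2 * r * np.cos(2 * np.pi * freq / SR)
    a2 = r * r
    y = np.zeros_like(x)
    for i in range(len(x)):
        y[i] = x[i] - a1 * (y[i - 1] if i >= 1 else 0.0) \
                    - a2 * (y[i - 2] if i >= 2 else 0.0)
    return y

def vowel_frame(f0=120, formants=((700, 80), (1100, 90), (2600, 120)), n=220):
    """Source-filter synthesis of one 10ms frame (illustrative formants)."""
    x = glottal_source(f0, n)
    for freq, bw in formants:
        x = formant_filter(x, freq, bw)
    return x / (np.max(np.abs(x)) + 1e-9)

frame = vowel_frame()
```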
2-layer GRU neural network. Takes target + acoustic feedback, outputs 8 motor commands per frame. One brain controls all vowels.
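Illustrative shapes only: a single GRU step in NumPy with random weights, mapping a target plus a feedback vector to 8 motor commands. The real controller is a trained 2-layer GRU; the 16-float target here is an assumption made for symmetry with the feedback vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUController:
    """One GRU layer + linear readout: (target, feedback) in, 8 motors out.
    Weights are random here; the real controller is trained, and 2-layer."""
    def __init__(self, in_dim, hidden, out_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wz = rng.normal(0, s, (hidden, in_dim + hidden))  # update gate
        self.Wr = rng.normal(0, s, (hidden, in_dim + hidden))  # reset gate
        self.Wh = rng.normal(0, s, (hidden, in_dim + hidden))  # candidate state
        self.Wo = rng.normal(0, s, (out_dim, hidden))          # motor readout
        self.h = np.zeros(hidden)

    def step(self, target, feedback):
        x = np.concatenate([target, feedback])
        xh = np.concatenate([x, self.h])
        z = sigmoid(self.Wz @ xh)                       # how much to update
        r = sigmoid(self.Wr @ xh)                       # how much history to keep
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * self.h]))
        self.h = (1 - z) * self.h + z * h_cand          # new hidden state
        return sigmoid(self.Wo @ self.h)                # motor commands in (0, 1)

ctrl = GRUController(in_dim=16 + 16, hidden=64)  # 16-float target + 16-float feedback
motor = ctrl.step(np.zeros(16), np.zeros(16))
```

The recurrent state is what lets one brain hold a vowel steady and carry transitions between vowels across frames.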
Compact 16-float feedback vector per frame. F0, voicing, energy, spectral features. Causal — only hears what's already been produced.
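One way such a vector could be assembled (a sketch, not the actual feature set): F0 and voicing from an autocorrelation peak, frame energy, spectral centroid, and coarse band energies, 16 floats in all.

```python
import numpy as np

SR = 22000  # 220 samples per 10ms frame implies a 22 kHz rate

def ear_features(frame, n_bands=12):
    """Illustrative inner-ear sketch: a 16-float vector per frame covering F0,
    voicing, energy, and coarse spectral shape. The real feature set is not
    specified beyond those categories."""
    energy = float(np.sqrt(np.mean(frame ** 2)))
    # F0 estimate and voicing strength from the normalised autocorrelation peak
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-9)
    lo, hi = SR // 400, SR // 60          # search a 60-400 Hz pitch range
    lag = lo + int(np.argmax(ac[lo:hi]))
    f0 = SR / lag
    voicing = float(ac[lag])
    # spectral centroid plus coarse log band energies
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / SR)
    centroid = float((freqs * spec).sum() / (spec.sum() + 1e-9)) / (SR / 2)
    bands = np.array_split(spec, n_bands)
    band_energy = np.log1p(np.array([b.sum() for b in bands]))
    return np.concatenate([[f0 / 400.0, voicing, energy, centroid], band_energy])

t = np.arange(220)
feat = ear_features(np.sin(2 * np.pi * 120 * t / SR))
```

Everything here is computable from the frame just produced, which is what makes the feedback causal.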
Closed-loop behaviour cloning with the instrument in the training loop. No reinforcement learning needed for basic vowel control. Deterministic. Reproducible.
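The recipe can be caricatured in a few lines: roll reference actions through a (toy) instrument to collect target-plus-feedback to motor pairs, then fit by plain supervised regression. A linear least-squares fit stands in for GRU training here; the instrument, features, and dimensions are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.normal(0, 1, (16, 8))  # fixed random "acoustics" of a toy instrument

def toy_instrument_features(motor):
    """Toy instrument + ear in one: motors in, a 16-float feedback vector out."""
    return PROJ @ motor

# 1. Rollouts with the instrument in the loop: reference motor trajectories
#    define the behaviour to clone, and the instrument supplies the feedback.
X, Y = [], []
for _ in range(500):
    target = rng.uniform(0, 1, 16)   # desired sound (toy)
    motor = rng.uniform(0, 1, 8)     # reference action for this frame
    feedback = toy_instrument_features(motor)
    X.append(np.concatenate([target, feedback]))
    Y.append(motor)
X, Y = np.array(X), np.array(Y)

# 2. Supervised regression onto the reference actions: deterministic and
#    reproducible, no reinforcement learning involved.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
mse = float(np.mean((X @ W - Y) ** 2))
```

The point of the sketch is the structure, not the model class: because the instrument sits inside the data-collection loop, the cloned policy is trained on exactly the feedback it will see at run time.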
Phase 1 evolutionary search ran on AWS EC2: parameter sweeps on 192-core Graviton instances and multi-seed validation campaigns. Phase 2 neural training was developed locally on Apple Silicon, with cloud compute reserved for scaling experiments. Supported by the AWS Activate startup program.
Acoustic Intelligence is an Australian deep-tech AI research startup. We're building AI systems that learn the skill of speaking — not from data, but from physics and feedback.
Founded by Richard James. Based in Brisbane, Australia.
This work demonstrates that an AI can learn to produce distinct human vowels by controlling a physics-based vocal instrument through feedback alone: no speech data, no linguistic knowledge, no programmatic shaping.
Research collaborations, partnerships, and investment inquiries welcome.
richard@acousticintelligence.ai