SaySynth: A Brief History of Speaking Machines

These are expanded notes from a talk I gave at composition.codes on December 21, 2025. Slides here. Video here.

SaySynth is a synthesizer I built on top of macOS’s text-to-speech framework — more popularly known as the say command. But to explain why I built it and why I think it matters, I want to take a detour through the history of speaking machines more broadly.

A Typology of Speaking Machines

There are roughly four kinds of speaking machines that have existed over time:

Mechanical — Literally physical: bellows forcing air through a reed, with different knobs, valves, and whistles shaping different formants and phonemes. The human operator is part of the instrument.

Formant/Rule-Based — More like a synthesizer: an oscillator and a comb filter simulating the resonant shape of the vocal tract. The system models the acoustics of speech without recording any actual speech.

Sample-Based (Concatenative) — From something as crude as a toy with a phonograph inside, all the way to sophisticated “diphone” synthesizers that splice together recordings of every possible phoneme transition. GPS voices and automated customer service phone lines of the ’90s and 2000s were built this way.

Generative (Neural/AI) — What most people think of today. These are basically sample-based systems taken to an extreme: instead of recordings of phoneme pairs, you’re dealing with individual digital samples predicted by a neural network, sample by sample.

A Brief History

Von Kempelen’s Speaking Machine (1773)

The first speaking machine most people point to. An operator pushes air through a reed and moves their hand around a piece of leather to simulate the shape of the vocal tract, while separate whistles handle noisier consonants like S and T. Crude, but the basic architecture — oscillator source, shaped by something simulating a vocal tract — is essentially what we still see in formant synthesizers today.

Joseph Faber’s Euphonia (1845)

Faber iterated on von Kempelen’s design into something far more sophisticated: sixteen keys, each generating a different phoneme. You can start to see the importance of the operator in these systems. To make it seem less threatening, Faber put a woman’s face on the front of it and, reportedly, sometimes hung a dress in front of the machinery. I suspect this had the opposite of its intended effect.

Edison Talking Dolls (1890s)

Not quite a speaking machine in the traditional sense, but the first concatenative one: a doll with a miniature phonograph inside playing back recordings of children’s rhymes. Edison thought embedding recorded voices in a toy would help people get comfortable with the technology. The preserved recordings suggest he was mistaken.

VODER (1939)

Demonstrated at the 1939 World’s Fair, the VODER was genuinely remarkable for its time — a monophonic synthesizer with an oscillator, a noise generator, and a set of controls for shaping phonemes in real time, with pitch controlled by a foot pedal. What I find most interesting about it is that its “impressiveness” was entirely dependent on its operators, women known as “Voderettes,” who trained for years to produce intelligible speech. The inventor got all the credit. The operators are largely nameless to history.

MUSA — Multichannel Speaking Automaton (1978)

Developed in Italy, MUSA was one of the first practical diphone synthesizers. They even pressed a vinyl record of the results. It uses recordings of every possible phoneme transition (around 2,000 combinations) and then applies DSP to smooth them together. This approach became dominant in commercial TTS through the ’90s and 2000s.

S.A.M. — Software Automatic Mouth (1982)

One of the first commercially available software speech synthesizers, released for the Commodore 64, Atari, and Apple II. What makes SAM notable is that it exposed controls for pitch, speed, and inflection to the user. The company that made it later provided the technology underlying Macintosh’s Macintalk, which is where this story gets personal.

Two Recurring Patterns

Before moving on, it’s worth noting two things that recur throughout this history.

Speaking machines are often demonstrated through singing. From HAL 9000 singing “Daisy Bell” in 2001: A Space Odyssey to Siri, singing has always been the ultimate proof-of-concept for TTS, because it forces the system to handle pitch variation, rhythm, and expressiveness. But there’s an implicit claim embedded in this: that singing is the pinnacle of human linguistic expression, and that a speaking machine isn’t truly “human” unless it can sing.

Speaking machines encode the biases of the culture that produces them. Faber put a female face on his Euphonia to make it seem less threatening. The Voderettes trained for years and are now forgotten. Most AI assistants today are female-coded by default. This isn’t incidental — it reflects a consistent, uncomfortable pattern in how we try to make machines seem approachable by feminizing them, while making the actual human labor behind them invisible.

Macintalk and the say Command

In 1984, Apple shipped Macintalk, a formant-based TTS system. At its launch, Steve Jobs had the Mac introduce itself — a demo that was received with the kind of collective rapture that, in retrospect, feels a little embarrassing.

If you had an Apple computer in the ’90s, you probably remember playing with voices like Bad News, Cellos, Bubbles, Whisper, or Princess. In 2001, with Mac OS X (Cheetah), Apple added a command-line interface to this capability:

say -v Fred "I sure like being inside this fancy computer"

What most people don’t know is that say (and the underlying speech framework) had a hidden, low-level DSL for controlling prosody at the phoneme level. Here’s what it looks like:

[[inpt TUNE]]
~
AA {D 120; P 176.9:0 171.4:22 161.7:61}
r {D 60; P 166.7:0}
~
y {D 210; P 161.0:0}
UW {D 70; P 178.5:0}
_
S {D 290; P 173.3:0 178.2:8 184.9:19 222.9:81}
...
[[inpt TEXT]]

Each phoneme can be assigned a duration (D, in milliseconds) and a pitch curve (P, as frequency-at-position pairs). That chunk above is roughly “are you brushing your teeth?” decomposed into its constituent sounds and then recomposed with explicit timing and pitch. You can get surprisingly expressive with it — not natural-sounding, but expressive in a different way.
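To make the format concrete, here is a minimal Python sketch that renders TUNE lines from plain data. It is not part of SaySynth; the function names `format_phoneme` and `build_tune` are hypothetical, and the sketch assumes only the syntax visible in the excerpt above (a phoneme symbol, an optional `D` duration, and an optional `P` list of frequency:position pairs).

```python
def format_phoneme(symbol, duration_ms=None, pitch_points=()):
    """Render one TUNE line, e.g. 'AA {D 120; P 176.9:0 171.4:22}'.

    pitch_points is a sequence of (frequency_hz, position) pairs,
    written out in the order given.
    """
    attrs = []
    if duration_ms is not None:
        attrs.append(f"D {duration_ms}")
    if pitch_points:
        curve = " ".join(f"{hz:.1f}:{pos}" for hz, pos in pitch_points)
        attrs.append(f"P {curve}")
    if not attrs:
        # Bare symbols such as '~' or '_' carry no attributes.
        return symbol
    return symbol + " {" + "; ".join(attrs) + "}"


def build_tune(phonemes):
    """Wrap rendered phoneme lines in the TUNE/TEXT input-mode markers."""
    lines = ["[[inpt TUNE]]"]
    lines += [format_phoneme(*p) for p in phonemes]
    lines.append("[[inpt TEXT]]")
    return "\n".join(lines)


if __name__ == "__main__":
    # The first few phonemes from the excerpt above.
    print(build_tune([
        ("~",),
        ("AA", 120, [(176.9, 0), (171.4, 22), (161.7, 61)]),
        ("r", 60, [(166.7, 0)]),
    ]))
```

Keeping phonemes as tuples means durations and pitch curves can be generated programmatically (stretched, transposed, randomized) before being turned back into text.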

I couldn’t find many examples of other people using this syntax. It is documented only on an archived Apple developer site, and the feature is now deprecated and has been removed from current macOS. (Which is why I needed to bring an old Mac mini to the demo.)

SaySynth

The insight behind SaySynth is simple: if you can specify pitch per-phoneme in the say DSL, you can use it as a synthesizer. Instead of trying to produce legible speech, you push the tool in a direction it was never designed for.

Rather than writing raw DSL by hand, I built a YAML-based sequencer on top of it. Here’s an excerpt from a piece called “fire”:

name: fire
globals:
    start_bpm: 65
    rate: 160
    stereo: true
tracks:
    water:
        type: chord
        options:
            root: F#2
            text: wawer
            voice: Victoria
            chord_notes: [-12, -5, 0, 4, 9, 14]
            segment_count: 1/32
            randomize_segments: octaves,velocity
            volume_range: [0.01, 0.19]
    fire:
        type: chord
        options:
            root: F#0
            chord_notes: [0, 12]
            text: fire!
            segment_count: 1/6
            randomize_segments: octaves,velocity
            volume_range: [0.05, 0.4]

Each “chord” is produced by spawning multiple parallel say subprocesses, one per note. Because there’s no way to synchronize them precisely, they slowly drift in and out of phase. The system failing to do the thing it’s supposed to do is what makes it sound interesting — more organic, more human-like than it has any right to be.
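The chord idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SaySynth's actual code: it assumes 12-tone equal temperament with A4 = 440 Hz, treats `chord_notes` as semitone offsets from `root`, and uses a single placeholder phoneme (`AA`) rather than a real phonemization of the track's text. All function names are hypothetical. `play_chord` only works on a macOS version whose `say` still accepts TUNE input.

```python
import re
import subprocess

NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'F#2' to a MIDI note number (A4 = 69)."""
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if not m:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = m.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    return 12 * (int(octave) + 1) + semitone


def midi_to_hz(midi, a4=440.0):
    """Equal-tempered frequency for a MIDI note number."""
    return a4 * 2 ** ((midi - 69) / 12)


def chord_frequencies(root, offsets):
    """Frequencies for a root note plus a list of semitone offsets."""
    base = note_to_midi(root)
    return [midi_to_hz(base + o) for o in offsets]


def tune_for_pitch(hz, phoneme="AA", duration_ms=500):
    """One held phoneme at a fixed pitch (placeholder phoneme, assumed)."""
    return f"[[inpt TUNE]]\n{phoneme} {{D {duration_ms}; P {hz:.1f}:0}}\n[[inpt TEXT]]"


def chord_commands(root, offsets, voice):
    """One `say` invocation per chord note."""
    return [["say", "-v", voice, tune_for_pitch(hz)]
            for hz in chord_frequencies(root, offsets)]


def play_chord(commands):
    """Launch every voice at once; nothing keeps them in sync afterwards."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for p in procs:
        p.wait()
```

For the "water" track above, `chord_commands("F#2", [-12, -5, 0, 4, 9, 14], "Victoria")` yields six commands. Because each one is an independent process with its own startup latency, the drift described above falls out of the design for free.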

I’ve also been working on support for alternative tunings via Ableton’s Scala (.ascl) format, which makes it possible to play in, say, Wendy Carlos’s tuning from Beauty in the Beast rather than standard 12-tone equal temperament.
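As a rough illustration of what tuning support involves, here is a Python sketch that reads basic Scala-style pitch lines and maps scale degrees to frequencies. It assumes only the core Scala conventions (lines starting with `!` are comments, a pitch containing a period is in cents, anything else is a ratio) and ignores anything Ableton-specific in `.ascl` files. The function names and the three-note demo scale are hypothetical and are not taken from any real tuning.

```python
from fractions import Fraction


def pitch_to_ratio(line):
    """Convert one Scala pitch line to a frequency ratio."""
    token = line.split()[0]
    if "." in token:
        # A period marks a value in cents.
        return 2 ** (float(token) / 1200)
    # Otherwise it is a ratio such as '5/4' or a bare integer.
    return float(Fraction(token))


def parse_scl(text):
    """Return the list of ratios in a Scala scale; the last is the period."""
    lines = [ln.strip() for ln in text.splitlines()
             if not ln.lstrip().startswith("!")]
    # lines[0] is the description, lines[1] the note count.
    count = int(lines[1].split()[0])
    return [pitch_to_ratio(ln) for ln in lines[2:2 + count]]


def degree_to_hz(degree, ratios, base_hz):
    """Frequency of a scale degree, where degree 0 is base_hz."""
    period = ratios[-1]
    octave, index = divmod(degree, len(ratios))
    ratio = 1.0 if index == 0 else ratios[index - 1]
    return base_hz * ratio * period ** octave


DEMO_SCALE = """! demo.scl
!
Hypothetical three-note demo scale
 3
!
 5/4
 701.955
 2/1
"""
```

With frequencies coming out of `degree_to_hz` instead of an equal-tempered formula, the rest of a pitch-per-phoneme pipeline would not need to change.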

Why Does This Matter?

Whatever you now find weird, ugly, uncomfortable and nasty about a new medium will surely become its signature… It’s the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart.

— Brian Eno

The version of the future that tech companies sell us is one in which AI improves exponentially until it reaches “humanness” — the singularity.

What this story leaves out is that humanness isn’t a fixed target. Capitalism slowly dehumanizes people, narrows what we do and how we’re valued, until it becomes easier for AI to approximate what we’ve become. If you’re training machine learning models in a warehouse or running the same script in a call center, you’re already functioning like a machine in the relevant sense.

The history of speaking machines is, in part, a history of compressing the expressive range of human voice until it becomes usable — legible, predictable, efficient. Each generation of TTS gets more natural-sounding and less weird. The say command’s low-level phoneme DSL, which let you do genuinely strange things with pitch and timing, is now deprecated. SSML (the standardized modern alternative) lets you specify relative pitch but not actual frequencies. As TTS has gotten better at sounding human, it’s gotten less interesting as a creative tool.

I think there’s real value in working with tools that are supposed to do one thing and fail, tools that preserve the texture of their own limitation. Not for nostalgia’s sake, but because that texture is the thing — because art’s job right now might be to make strange what capitalism is trying to make invisible and ordinary.


SaySynth is on GitLab. Music made with it is on Bandcamp.


Roasted Lemon Chicken Thighs and Potatoes

This is very much a delicious version of boomer mom dinner. (Stolen from cafehaillee.com).

Ingredients:

  • ½ cup fresh lemon juice, from about 4 lemons
  • ½ cup olive oil
  • 4 tsp (12g) diamond crystal kosher salt *
  • 6 cloves garlic, grated
  • 1 Tbsp oregano
  • ½ tsp black pepper
  • 2 pounds yukon gold potatoes, scrubbed clean and cut into large wedges
  • 6 bone-in, skin-on chicken thighs (about 2 lbs)
  • ¼ cup chicken stock
  • 2 Tbsp chopped fresh parsley
  • 2 Tbsp chopped fresh dill

Instructions:

  • Preheat oven to 425°F.
  • In a small bowl combine lemon juice, olive oil, salt, garlic, oregano and black pepper.
  • Add potatoes to a 9×13” baking dish and pour half of the lemon olive oil mixture over. Toss to coat.
  • Add chicken thighs on top of potatoes and pour the remainder of the mixture over the chicken. Rub mixture all over.
  • Pour chicken stock around the chicken and over the potatoes.
  • Bake until chicken is deep golden and cooked through, about 50-60 minutes.
  • Remove chicken thighs from pan and set aside to rest. Turn broiler on to high and broil potatoes until golden brown, about 6-8 minutes.
  • Top potatoes with parsley and dill.
  • Serve chicken with potatoes and pan juices. The pan juices are so good in this, so don’t skip them!

Notes:

  • If you don’t have diamond crystal kosher salt, use the weight measurement for the salt, as 4 tsp of Morton kosher salt will yield a much different (too salty) result.

fuzzyy @ light and sound design, part 1


Light and Sound Design is a treasure. One of the few remaining DIY spaces in the city, it features a vintage sound system sourced in part from Magick City (RIP) and custom light installations built by the volunteers that run the space.

I’ve been to many shows at L&SD, but only recently got the honor of DJ’ing there in support of Will Shore’s group, pedestrian. It’s a 10-piece improvisational group which he led via conduction, an interactive method of conducting developed by Butch Morris in which the players have an equal role in guiding the direction of their playing.

My mix before their set aimed to set a mellow vibe centered on percussion-forward tracks around 80bpm. Part two here.


Month in review: June 2025

Yuba Verde Sandwich From Superiority Burger

I love the yuba verde sandwich from Superiority Burger. It’s the closest vegetarian approximation of deli meat I’ve had, and it also references a Philly Italian pork sandwich, which is delicious and filled with broccoli rabe. I adapted this from memory and from the intro paragraph to the Bittman Project’s knock-off, but I haven’t read that full recipe because it’s paywalled. My version adds cheese instead of a bean spread, as well as mayo, so it’s def not vegan, but arguably more delicious. I’ve also used kale instead of broccoli rabe, which is easier to eat in a sandwich, though I did miss the rabe’s bitterness.

Ingredients

For the vinaigrette / yuba marinade

  • 1.5 cups olive oil
  • 1/3 cup red wine vinegar
  • 1/2 lemon juiced
  • 1 tbsp capers
  • 1/6-1/4 of a preserved lemon
  • 4 cloves garlic
  • 2-3 tsp dijon mustard
  • 1 cup packed parsley
  • salt and pepper to taste

For the broccoli rabe:

  • 1 bunch broccoli rabe
  • 1-2 tbsp olive oil
  • salt and pepper

For the kale (if using instead of broccoli rabe):

  • 1 bunch kale
  • 1-2 tbsp olive oil
  • 3 cloves garlic
  • 1/2 lemon
  • salt and pepper

For the yuba:

For assembling the sandwich:

  • kewpie mayo
  • cento hoagie spread
  • 1.5 cups grated sharp cheddar cheese.
  • 2 heros
  • other optional goodies (sliced tomatoes, pickled carrots, etc.)

Instructions

Make the vinaigrette and marinate the yuba:

  • Combine vinaigrette ingredients in a blender and pulse until it’s creamy and a nice light green color. Add salt and pepper, blend, and taste until it’s right.
  • Pour a bit of vinaigrette in the bottom of a medium-size tupperware container. Separate the sheets of yuba and layer one sheet at a time, pouring vinaigrette between each sheet like you were assembling a lasagna. Reserve the leftover vinaigrette.
  • Let the yuba marinate in the fridge for at least an hour.

Make the broccoli rabe:

  • pre-heat the oven to 400-425°F
  • toss the broccoli rabe with olive oil and salt and pepper
  • bake for ~20 minutes. you might want to take the leaves out sooner or else they’ll burn.
  • before assembling the sandwiches you might want to roughly chop the broccoli rabe so it’s easier to eat.
  • NOTE: you can save some time by baking the broccoli rabe with the yuba

Make the kale (if using instead of broccoli rabe):

  • cut out the kale stalks
  • roll the kale up and cut it into thin strips
  • dice the garlic and set aside
  • heat the oil (on high) in a pan until it’s smoking
  • add the kale and stir for about a minute until it changes color
  • add the garlic and stir for 15-20 seconds
  • squeeze the lemon juice over the kale and stir to incorporate
  • turn off the heat and add salt and pepper to taste
  • remove from the pan and set aside

Make the yuba:

  • pre-heat the oven to 400-425°F
  • lightly grease a baking sheet (you might need two) and lay the yuba out so no sheets are overlapping
  • bake for ~10 minutes until the yuba starts to bubble up off the pan and turn light brown
  • finish under a broiler on high until the yuba has some brown spots, but don’t let it burn! this step helps give the yuba the taste and texture of smoked meat

Toast the bread and melt the cheese:

  • lightly brush olive oil on each half of the heros
  • cover the top half in grated cheese
  • place in the oven for 3-4 minutes
  • NOTE: You can save a bit of time by doing this while the yuba is browning under the broiler, but just put the bread on a lower rack.

Assemble the sandwich

This is only my suggestion. Assembling a sandwich is a personal journey:

  • On the bottom half of the hero, slather mayo and hoagie spread.
  • Stack up the yuba on top.
  • Place the kale/broccoli rabe on top of the melted cheese
  • Optionally add other goodies on top of the yuba (sliced tomatoes, spicy pickled cucumbers, whatever you like!)
  • drizzle some of the vinaigrette marinade on top of the yuba and kale/broccoli rabe.
  • put the two halves together.

Eat!


Halal Guys Chicken Over Rice

My first real job out of college was on 53rd and 6th, right across the street from the original Halal Guys truck, and I would eat there all the time for lunch. Even back then there could be a line down the block and you had to show up early to beat the lunch rush.

I made this copycat recipe last night and it was convincingly accurate, maybe even better than the real thing. I was out of cumin so I substituted coriander, put less mayo in the sauce than it called for, grilled the chicken instead of sautéing it, and used brown rice (in a rice cooker) instead of white basmati. You could probably make this with tofu or tempeh, too.

Ingredients

For the chicken:

  • 2 lbs (900 g) chicken breasts, about 6
  • 1/4 cup olive oil
  • 2 tablespoons lemon juice
  • 1 tablespoon white vinegar
  • 3 garlic cloves, minced
  • 1 teaspoon salt
  • 1/2 teaspoon black pepper
  • 1 teaspoon oregano powder
  • 1 teaspoon allspice or seven spices
  • 1/2 teaspoon ginger powder
  • 1/2 teaspoon cumin powder
  • 1/2 teaspoon paprika powder
  • 1 tablespoon vegetable oil for cooking

For the white sauce:

  • 3/4 cup Greek yogurt 170g (6oz)
  • 1/2 cup mayonnaise 115 g
  • 1 tablespoon white vinegar
  • 1 tablespoon water
  • 1 teaspoon lemon juice
  • 1-2 teaspoons sugar
  • 1/2 teaspoon salt
  • 1/4 teaspoon black pepper

For the rice:

  • 2 tablespoons unsalted butter
  • 1/2 teaspoon turmeric powder
  • 1/4 teaspoon cumin powder
  • 1 1/2 cups basmati rice, soaked for 20 minutes then rinsed until water runs clear
  • 2 1/2 cups chicken stock or 1 stock cube dissolved in 2.5 cups hot water
  • salt and pepper to taste, a pinch of each

To assemble:

  • 1 chopped tomato
  • handful chopped parsley
  • sriracha sauce

Instructions

For the chicken:

  • In a bowl, combine all the chicken marinade ingredients: the olive oil, lemon juice, vinegar, garlic, spices, salt, and pepper. Mix well, then add the chicken breasts and toss to coat evenly. Cover the bowl with plastic wrap and marinate for 1-2 hours. (Meanwhile, you can make the white sauce.)
  • Heat the vegetable oil in a large skillet on medium high heat, then add the chicken. Cook for 5-6 minutes each side, or until golden brown and cooked through.
  • Remove chicken to a cutting board, and let it rest for 10 minutes, before chopping roughly into 2 inch cubes.

For the white sauce:

Mix together all the ingredients for the sauce, and store in the fridge until ready to serve.

For the rice:

  • Add the butter to a large saucepan over medium heat. Once melted, add the turmeric and cumin, and stir for 1 minute or until the spices are fragrant.
  • Add the basmati rice, and toast the rice by stirring around in the pan for 4 minutes. Add the chicken stock, plus a pinch of salt and pepper to season. Stir and bring to a boil, then reduce heat to a simmer, cover and cook for 15 minutes.
  • Once the cooking time is up and all the liquid has been absorbed, remove the pan from the heat and set aside for 15 minutes without disturbing. After 15 minutes, fluff gently with a fork.

To assemble:

Place the rice on your serving platter, top with the chicken, the white sauce, and chopped tomatoes and parsley. You can add some pita bread too. Optional, but recommended: drizzle with sriracha sauce!