How to build a synthesizer with Python: part 5

FX & Going Further

mike | July 22, 2023, 7:35 p.m.

In the previous posts we built a synthesizer one piece at a time. In this tutorial we'll build two additional FX components. First we'll build a low-pass filter, then we'll build a delay effect. We'll integrate both into our synthesizer. Finally, we'll discuss some of the shortcomings of our synthesizer and how you can take the project further.

Let's get coding!

Low-pass filter

Low-pass filters are useful for shaping the timbre, or texture, of the sound. They generally impart a smoothing effect on the waveforms they filter, leading to a less harsh sound overall.

We'll use the SciPy package to create the low-pass filter, so we'll need to add it with pip. Go ahead and run:

$ pip install scipy
$ pip freeze > requirements.txt

Next, go ahead and add a file to the signal submodule called low_pass_filter.py.

import logging
from typing import List, Optional
from copy import deepcopy

import numpy as np
from scipy.signal import butter, lfilter, lfilter_zi

from .component import Component

class LowPassFilter(Component):
    def __init__(self, sample_rate: int, frames_per_chunk: int, subcomponents: Optional[List['Component']] = None, name: str="LowPassFilter", control_tag: str="lpf"):
        # Default to None instead of a mutable default list, substituting a fresh list here
        super().__init__(sample_rate, frames_per_chunk, subcomponents=subcomponents or [], name=name, control_tag=control_tag)
        self.log = logging.getLogger(__name__)
        self.filter_order = 2
        self.cutoff_frequency = 20000.0
        self.b, self.a = self.compute_coefficients()
        self.zi = self.compute_initial_conditions()

    def __iter__(self):
        self.source_iter = iter(self.subcomponents[0])
        return self

    def __next__(self):
        input_signal = next(self.source_iter)
        output_signal, self.zi = lfilter(self.b, self.a, input_signal, zi=self.zi)
        return output_signal.astype(np.float32)

    def __deepcopy__(self, memo):
        return LowPassFilter(self.sample_rate, self.frames_per_chunk, [deepcopy(self.subcomponents[0], memo)], name=self.name, control_tag=self.control_tag)

    @property
    def cutoff_frequency(self):
        return self._cutoff_frequency
    
    @cutoff_frequency.setter
    def cutoff_frequency(self, value):
        try:
            float_val = float(value)
            if float_val <= 0.0:
                raise ValueError("Cutoff frequency must be positive.")
            self._cutoff_frequency = float_val
            self.b, self.a = self.compute_coefficients()
        except ValueError:
            self.log.error(f"Couldn't set cutoff frequency with value {value}")

    def compute_coefficients(self):
        nyquist = 0.5 * self.sample_rate
        normalized_cutoff = self.cutoff_frequency / nyquist
        b, a = butter(self.filter_order, normalized_cutoff, btype='low', analog=False)
        return b, a

    def compute_initial_conditions(self):
        zi = lfilter_zi(self.b, self.a)
        return zi

ChatGPT wrote a lot of this class for me, so I also asked it to explain how the code works:

This class represents a low-pass filter, which allows signals with a frequency lower than a certain cutoff frequency to pass through and attenuates signals with frequencies higher than the cutoff frequency. Let's break down how this class works:

1. Initialization:

  • sample_rate: The rate at which the audio signal is sampled.
  • frames_per_chunk: The number of frames in each chunk of the signal being processed.
  • cutoff_frequency: The frequency above which the signal is attenuated. It is set to 20,000 Hz by default, which is roughly the upper limit of human hearing, so this default value would result in a filter that passes almost all audible frequencies.
  • filter_order: The order of the filter, which affects its roll-off (how quickly frequencies above the cutoff are attenuated). It's set to 2, which makes this a second-order Butterworth filter.

2. Butterworth Filter: The Butterworth filter is designed to have as flat a frequency response as possible in the passband. The butter function in SciPy is used to compute the coefficients a and b of the filter using the cutoff frequency, the sample rate, and the order of the filter.

3. Processing the Signal:

  • The __iter__ method initializes an iterator for the input signal, which will be passed through the filter.
  • The __next__ method takes each chunk of the input signal and applies the filter using the lfilter function from SciPy. This function takes the input signal and the coefficients a and b, along with zi (the initial conditions of the filter), and returns the filtered signal. The zi value is updated at each step, so the filter's state is maintained between chunks.

4. Cutoff Frequency Property: The cutoff_frequency property allows you to get or set the cutoff frequency of the filter. When the cutoff frequency is set, the filter coefficients a and b are recomputed, so the filter's behavior changes accordingly.

5. Initial Conditions: The compute_initial_conditions method computes the initial conditions for the filter using the lfilter_zi function from SciPy, which calculates the initial filter delay values based on the filter coefficients.

In summary, this class implements a second-order Butterworth low-pass filter, which allows signals below a certain cutoff frequency to pass and attenuates signals above that frequency. The filter coefficients are computed based on the cutoff frequency, and the filter state is maintained between chunks of the input signal, so it can be applied to a continuous stream of audio.

Alright, got it. We allow frequencies below the cutoff frequency to pass through the filter, and attenuate higher frequencies. This has the effect of silencing some of the overtones that tend to build up from non-sine waves.
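To make the chunk-by-chunk filtering concrete, here's a small standalone sketch (with made-up signal values, separate from the synth code). As long as zi is carried from one call to the next, filtering in chunks gives the same result as filtering the whole signal at once, which is what lets the class process a continuous audio stream:

```python
import numpy as np
from scipy.signal import butter, lfilter, lfilter_zi

sample_rate = 44100
b, a = butter(2, 400.0 / (0.5 * sample_rate), btype='low', analog=False)

rng = np.random.default_rng(0)
signal = rng.standard_normal(2048).astype(np.float32)

# Filter the whole signal in one call...
whole, _ = lfilter(b, a, signal, zi=lfilter_zi(b, a))

# ...and in two chunks, carrying zi from one call to the next
zi = lfilter_zi(b, a)
first, zi = lfilter(b, a, signal[:1024], zi=zi)
second, zi = lfilter(b, a, signal[1024:], zi=zi)
chunked = np.concatenate((first, second))

print(np.allclose(whole, chunked))  # True
```

If you dropped the zi bookkeeping, each chunk would start from a zeroed filter state and you'd hear a click at every chunk boundary.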

But what does that look like? What does it sound like? Let's look at an unfiltered wave first.

This waveform was obtained by mixing square and sawtooth waves together. As you can see, the transitions at the inflection points are quite abrupt. Waves like this tend to produce a lot of overtones, which means that although our wave in this example has a fundamental frequency of 110 Hz, it also produces sound at higher frequencies. We experience this effect as timbre. It's what makes a square wave sound different from a sine wave, which only produces sound at one frequency. Let's look at the graph of a low-pass filtered version of the same wave.

Notice the more gradual, rounded transitions. In this example, the cutoff frequency of the low-pass filter is 400 Hz. That means the overtones above 400 Hz are greatly diminished or silenced completely. You can see that, even though the base frequency is 110 Hz, our wave was generating a lot of frequency content above 400 Hz. If we keep lowering the cutoff frequency towards the fundamental frequency, our wave will appear more and more like a sine wave as the overtones above it are attenuated.

The example above would sound quite similar to a sine wave, as most of its overtones have been rolled off.
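We can put a rough number on that intuition with a quick standalone experiment (the values here are illustrative, not part of the synth): generate a 110 Hz sawtooth, filter it with a 400 Hz second-order Butterworth, and measure how much spectral energy survives above the cutoff:

```python
import numpy as np
from scipy.signal import butter, lfilter, sawtooth

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate           # one second of samples
wave = sawtooth(2 * np.pi * 110.0 * t)             # 110 Hz sawtooth

b, a = butter(2, 400.0 / (0.5 * sample_rate), btype='low')
filtered = lfilter(b, a, wave)

# Compare spectral energy above the 400 Hz cutoff before and after filtering
freqs = np.fft.rfftfreq(len(wave), 1 / sample_rate)

def energy_above(x, hz):
    spectrum = np.abs(np.fft.rfft(x))
    return np.sum(spectrum[freqs > hz] ** 2)

ratio = energy_above(filtered, 400.0) / energy_above(wave, 400.0)
print(f"High-frequency energy remaining: {ratio:.3f}")
```

With a second-order filter the roll-off is gradual (12 dB per octave), so the overtones just above the cutoff are diminished rather than silenced, which is exactly what the rounded corners in the graph show.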

Looking at pictures of sound can provide a useful visual metaphor for how our low-pass filter affects the output, but let's hear an example! In the sound clip below you can hear a low-pass filter sweep. At first there is no filtering, and then the cutoff frequency is adjusted down and up repeatedly.

Can you hear the frequencies roll off as the cutoff frequency is lowered? That is the effect of the low-pass filter.

Delay

Let's build one more effect. Delay is an effect where the sound is recorded, then played back some duration of time later. That duration is called the delay time. Our delay component will let the audio signal pass through while simultaneously recording it. After the delay time elapses, the recorded audio is mixed back into the real-time signal. With small delay times this can add body to a sound, or make it sound more spacious. As you increase the delay time it starts to produce a distinct echo effect.

Add a file called delay.py to the signal submodule.

import logging
from copy import deepcopy

import numpy as np

from .component import Component

class Delay(Component):
    def __init__(self, sample_rate, frames_per_chunk, subcomponents, name="Delay", control_tag="delay") -> None:
        super().__init__(sample_rate, frames_per_chunk, subcomponents=subcomponents, name=name, control_tag=control_tag)
        self.log = logging.getLogger(__name__)
        self.delay_buffer_length = 4.0
        self._delay_time = 0.0
        self.delay_frames = int(self.delay_buffer_length * self.sample_rate)
        self.delay_buffer = np.zeros(self.delay_frames, np.float32)
        self.delay_time_start_index = self.delay_frames - int(self.delay_time * self.sample_rate)
        self.wet_gain = 0.5 

    def __iter__(self):
        self.signal_iter = iter(self.subcomponents[0])
        return self
    
    def __next__(self):
        mix = next(self.signal_iter)

        # Add the delayed signal to the mix
        if self.delay_time > 0:
            delayed_signal = self.delay_buffer[self.delay_time_start_index: self.delay_time_start_index + self.frames_per_chunk]
            while len(delayed_signal) < self.frames_per_chunk:
                delayed_signal = np.concatenate((delayed_signal, self.delay_buffer[:self.frames_per_chunk - len(delayed_signal)]))

            # Multiply into a new array; when the slice above is a view,
            # *= would scale the delay buffer's contents in place
            delayed_signal = delayed_signal * self.wet_gain
            # Likewise, build a new mix rather than mutating the upstream chunk
            mix = mix + delayed_signal

        # Add the current signal to the delay buffer
        self.delay_buffer = np.roll(self.delay_buffer, -self.frames_per_chunk)
        self.delay_buffer[self.delay_frames - self.frames_per_chunk: self.delay_frames] = mix

        return mix

    
    def __deepcopy__(self, memo):
        return Delay(self.sample_rate, self.frames_per_chunk, subcomponents=[deepcopy(sub, memo) for sub in self.subcomponents], name=self.name, control_tag=self.control_tag)
    
    @property
    def delay_time(self):
        return self._delay_time

    @delay_time.setter
    def delay_time(self, value):
        self._delay_time = float(value)
        self.delay_time_start_index = self.delay_frames - int(self.delay_time * self.sample_rate)

The constructor takes care of some setup for us, like initializing the delay_buffer. You can think of this buffer as our recording. It acts like a loop of tape, with the write head constantly overwriting the oldest sound with the new audio signal.

As usual, the most interesting code is in the __next__ method. First we get the real time output from the components below us in the tree and put it in the mix variable. Next, we check if the delay_time is greater than 0. This is used as a proxy for whether the effect is turned on or off.

If the delay effect is active, we fetch an audio chunk from the delay_buffer. Once we have a delayed chunk, we apply the wet_gain to it and add it to the mix. The wet_gain controls how loud the delayed signal is mixed back in; setting it to a value less than one makes each repeat of the echo quieter than the last.
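Because the recorded mix already contains the delayed signal, each echo passes through the wet_gain again on its next trip around the buffer, so with a wet_gain below one the repeats decay geometrically. A tiny illustration (the values are hypothetical):

```python
import numpy as np

wet_gain = 0.5
# Relative level of the first five repeats of an echo: each one
# has been scaled by wet_gain one more time than the last
echo_levels = wet_gain ** np.arange(1, 6)
print(echo_levels.tolist())  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```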

After the delayed signal is mixed in, we record the latest chunk into the delay_buffer. We use numpy's roll function to shift the array with wrap-around, then we replace the chunk that wrapped around with the latest chunk. This is the tape loop effect we described above.
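Here's a toy version of that roll-and-overwrite pattern, using an 8-frame buffer and 4-frame chunks so the behavior is easy to see:

```python
import numpy as np

# A tiny "tape loop": an 8-frame buffer holding frames 0-7,
# with 4-frame chunks
buffer = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
chunk = np.array([8, 9, 10, 11], dtype=np.float32)

# Shift everything toward the front; the oldest chunk wraps around
# to the tail...
buffer = np.roll(buffer, -4)
# ...where we immediately overwrite it with the newest chunk
buffer[-4:] = chunk

print(buffer.tolist())  # [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```

The oldest frames (0-3) are gone, the newest (8-11) sit at the end, and the buffer stays a fixed size, just like a loop of tape.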

Let's hear what that sounds like in practice.

Imagine the possibilities for a better keyboardist!

Integration

Okay, now that we've built our signal components, let's integrate them into the synthesizer. Open the implementation.py file and add to the enum:

    LPF_CUTOFF = 71
    DELAY_TIME = 72
    DELAY_WET_GAIN = 73

Make sure to change the numbers to the CC numbers that you want to use from your MIDI controller. You can use the method from part 4 to figure out which CC numbers to use.

Next, open up the synthesizer.py file. Import the new components with the rest of the project imports.

from .synthesis.signal.low_pass_filter import LowPassFilter
from .synthesis.signal.delay import Delay

Now, in the __init__ method we need to add a few lookup arrays. Below the self.osc_mix_vals array add:

        self.lpf_cutoff_vals = np.logspace(4, 14, 128, endpoint=True, base=2, dtype=np.float32) # 2^14=16384 : that is the highest possible cutoff value
        self.delay_times = 0.5 * np.logspace(0, 2, 128, endpoint=True, base=2, dtype=np.float32) - 0.5 # range is from 0 - 1.5s
        logspaced = np.logspace(0, 1, 128, endpoint=True, dtype=np.float32) # range is from 1-10
        self.delay_wet_gain_vals = (logspaced - 1) / (10 - 1) # range is from 0-1
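As a quick standalone sanity check of these lookup tables (re-deriving the same arrays outside the synth), here is what the extreme MIDI CC values 0 and 127 map to:

```python
import numpy as np

lpf_cutoff_vals = np.logspace(4, 14, 128, endpoint=True, base=2, dtype=np.float32)
delay_times = 0.5 * np.logspace(0, 2, 128, endpoint=True, base=2, dtype=np.float32) - 0.5
logspaced = np.logspace(0, 1, 128, endpoint=True, dtype=np.float32)
delay_wet_gain_vals = (logspaced - 1) / (10 - 1)

# CC value 0 indexes the first entry, CC value 127 the last
print(lpf_cutoff_vals[0], lpf_cutoff_vals[127])          # 16.0 16384.0
print(delay_times[0], delay_times[127])                  # 0.0 1.5
print(delay_wet_gain_vals[0], delay_wet_gain_vals[127])  # 0.0 1.0
```

The log spacing means a knob turn changes the cutoff and delay time proportionally rather than linearly, which matches how we perceive pitch and time.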

Now find the control_change_handler method and add the following elifs:

        elif cc_number == Implementation.LPF_CUTOFF.value:
            lpf_cutoff = self.lpf_cutoff_vals[val]
            self.set_lpf_cutoff(lpf_cutoff)
            self.log.info(f"LPF Cutoff: {lpf_cutoff}")
        elif cc_number == Implementation.DELAY_TIME.value:
            delay_time = self.delay_times[val]
            self.set_delay_time(delay_time)
            self.log.info(f"Delay Time: {delay_time}s")
        elif cc_number == Implementation.DELAY_WET_GAIN.value:
            delay_wet_gain = self.delay_wet_gain_vals[val]
            self.set_delay_wet_gain(delay_wet_gain)
            self.log.info(f"Delay Wet Gain: {delay_wet_gain}")

In setup_signal_chain, below the mixer, add:

        lpf = LowPassFilter(self.sample_rate, self.frames_per_chunk, [mixer], control_tag="lpf")

        delay = Delay(self.sample_rate, self.frames_per_chunk, [lpf], control_tag="delay")

And change signal_chain = Chain(mixer) to

        signal_chain = Chain(delay)

Just like before, we've referenced a few methods we haven't implemented yet. Let's add them at the bottom of the file.

    def set_lpf_cutoff(self, cutoff):
        for voice in self.voices:
            lpf_components = voice.signal_chain.get_components_by_control_tag("lpf")
            for lpf in lpf_components:
                lpf.cutoff_frequency = cutoff

    def set_delay_time(self, time):
        for voice in self.voices:
            delay_components = voice.signal_chain.get_components_by_control_tag("delay")
            for delay in delay_components:
                delay.delay_time = time

    def set_delay_wet_gain(self, gain):
        for voice in self.voices:
            delay_components = voice.signal_chain.get_components_by_control_tag("delay")
            for delay in delay_components:
                delay.wet_gain = gain

The synthesizer.py file now looks like:

import threading
import logging
from queue import Queue
from copy import deepcopy

import numpy as np

from . import midi
from .midi.implementation import Implementation
from .synthesis.voice import Voice
from .synthesis.signal.chain import Chain
from .synthesis.signal.sine_wave_oscillator import SineWaveOscillator
from .synthesis.signal.square_wave_oscillator import SquareWaveOscillator
from .synthesis.signal.gain import Gain
from .synthesis.signal.mixer import Mixer
from .synthesis.signal.low_pass_filter import LowPassFilter
from .synthesis.signal.delay import Delay
from .playback.stream_player import StreamPlayer

class Synthesizer(threading.Thread):
    def __init__(self, sample_rate: int, frames_per_chunk: int, mailbox: Queue, num_voices: int=4) -> None:
        super().__init__(name="Synthesizer Thread")
        self.log = logging.getLogger(__name__)
        self.sample_rate = sample_rate
        self.frames_per_chunk = frames_per_chunk
        self.mailbox = mailbox
        self.num_voices = num_voices
        self.should_run = True

        # Set up the voices
        signal_chain_prototype = self.setup_signal_chain()
        self.log.info(f"Signal Chain Prototype:\n{str(signal_chain_prototype)}")
        self.voices = [Voice(deepcopy(signal_chain_prototype)) for _ in range(self.num_voices)]

        # Set up the stream player
        self.stream_player = StreamPlayer(self.sample_rate, self.frames_per_chunk, self.generator())

        # Set up the lookup values
        self.osc_mix_vals = np.linspace(0, 1, 128, endpoint=True, dtype=np.float32)
        self.lpf_cutoff_vals = np.logspace(4, 14, 128, endpoint=True, base=2, dtype=np.float32) # 2^14=16384 : that is the highest possible cutoff value
        self.delay_times = 0.5 * np.logspace(0, 2, 128, endpoint=True, base=2, dtype=np.float32) - 0.5 # range is from 0 - 1.5s
        logspaced = np.logspace(0, 1, 128, endpoint=True, dtype=np.float32) # range is from 1-10
        self.delay_wet_gain_vals = (logspaced - 1) / (10 - 1) # range is from 0-1

    def run(self):
        self.stream_player.play()
        while self.should_run and self.stream_player.is_active():
            # get() is a blocking call
            if message := self.mailbox.get(): 
                self.message_handler(message)
        return
    
    def message_handler(self, message: str):
        """Handles messages from the mailbox."""
        match message.split():
            case ["exit"]:
                self.log.info("Got exit command.")
                self.stream_player.stop()
                self.should_run = False
            case ["note_on", "-n", note, "-c", channel]:
                int_note = int(note)
                chan = int(channel)
                note_name = midi.note_names[int_note]
                self.note_on(int_note, chan)
                self.log.info(f"Note on {note_name} ({int_note}), chan {chan}")
            case ["note_off", "-n", note, "-c", channel]:
                int_note = int(note)
                chan = int(channel)
                note_name = midi.note_names[int_note]
                self.note_off(int_note, chan)
                self.log.info(f"Note off {note_name} ({int_note}), chan {chan}")
            case ["control_change", "-c", channel, "-n", cc_num, "-v", control_val]:
                chan = int(channel)
                int_cc_num = int(cc_num)
                int_cc_val = int(control_val)
                self.control_change_handler(chan, int_cc_num, int_cc_val)
            case _:
                self.log.info(f"Matched unknown command: {message}")

    def control_change_handler(self, channel: int, cc_number: int, val: int):
        self.log.info(f"Control Change: channel {channel}, number {cc_number}, value {val}")
        if cc_number == Implementation.OSCILLATOR_MIX.value:
            gain_b_mix_val = self.osc_mix_vals[val]
            gain_a_mix_val = 1 - gain_b_mix_val
            self.set_gain_a(gain_a_mix_val)
            self.set_gain_b(gain_b_mix_val)
            self.log.info(f"Gain A: {gain_a_mix_val}")
            self.log.info(f"Gain B: {gain_b_mix_val}")
        elif cc_number == Implementation.LPF_CUTOFF.value:
            lpf_cutoff = self.lpf_cutoff_vals[val]
            self.set_lpf_cutoff(lpf_cutoff)
            self.log.info(f"LPF Cutoff: {lpf_cutoff}")
        elif cc_number == Implementation.DELAY_TIME.value:
            delay_time = self.delay_times[val]
            self.set_delay_time(delay_time)
            self.log.info(f"Delay Time: {delay_time}s")
        elif cc_number == Implementation.DELAY_WET_GAIN.value:
            delay_wet_gain = self.delay_wet_gain_vals[val]
            self.set_delay_wet_gain(delay_wet_gain)
            self.log.info(f"Delay Wet Gain: {delay_wet_gain}")
        

    def setup_signal_chain(self) -> Chain:
        """Build the signal chain prototype."""
        osc_a = SineWaveOscillator(self.sample_rate, self.frames_per_chunk)
        osc_b = SquareWaveOscillator(self.sample_rate, self.frames_per_chunk)

        gain_a = Gain(self.sample_rate, self.frames_per_chunk, [osc_a], control_tag="gain_a")
        gain_b = Gain(self.sample_rate, self.frames_per_chunk, [osc_b], control_tag="gain_b")

        mixer = Mixer(self.sample_rate, self.frames_per_chunk, [gain_a, gain_b])

        lpf = LowPassFilter(self.sample_rate, self.frames_per_chunk, [mixer], control_tag="lpf")

        delay = Delay(self.sample_rate, self.frames_per_chunk, [lpf], control_tag="delay")

        signal_chain = Chain(delay)
        return signal_chain
    
    def generator(self):
        """
        Generate the signal by mixing the voice outputs
        """
        mix = np.zeros(self.frames_per_chunk, np.float32)
        num_active_voices = 0
        while True:
            for i in range(self.num_voices):
                voice = self.voices[i]
                mix += next(voice.signal_chain)
                if voice.active:
                    num_active_voices += 1
            
            # Prevent the mix from going outside the range (-1, 1)
            mix = np.clip(mix, -1.0, 1.0)
            
            yield mix
            mix = np.zeros(self.frames_per_chunk, np.float32)
            num_active_voices = 0

    def note_on(self, note: int, chan: int):
        """
        Set a voice on with the given note.
        If there are no unused voices, drop the voice that has been on for the longest and use that voice
        """
        note_id = self.get_note_id(note, chan)
        freq = midi.frequencies[note]
        for i in range(len(self.voices)):
            voice = self.voices[i]
            if not voice.active:
                voice.note_on(freq, note_id)
                self.voices.append(self.voices.pop(i)) # Move this voice to the back of the list. It should be popped last
                break

            if i == len(self.voices) - 1:
                self.log.debug(f"Had no unused voices!")
                self.voices[0].note_off()
                self.voices[0].note_on(freq, note_id)
                self.voices.append(self.voices.pop(0))


    def note_off(self, note: int, chan: int):
        """
        Find the voice playing the given note and turn it off.
        """
        note_id = self.get_note_id(note, chan)
        for i in range(len(self.voices)):
            voice = self.voices[i]
            if voice.active and voice.note_id == note_id:
                voice.note_off()
    
    def get_note_id(self, note: int, chan: int):
        """
        Generate an id for a given note and channel
        By hashing the note and channel we can ensure that we are turning off the exact note
        that was turned on
        """
        note_id = hash((note, chan))  # hash a tuple so e.g. (11, 1) and (1, 11) can't collide
        return note_id

    def set_gain_a(self, gain):
        for voice in self.voices:
            gain_a_components = voice.signal_chain.get_components_by_control_tag("gain_a")
            for gain_a in gain_a_components:
                gain_a.amp = gain

    def set_gain_b(self, gain):
        for voice in self.voices:
            gain_b_components = voice.signal_chain.get_components_by_control_tag("gain_b")
            for gain_b in gain_b_components:
                gain_b.amp = gain

    def set_lpf_cutoff(self, cutoff):
        for voice in self.voices:
            lpf_components = voice.signal_chain.get_components_by_control_tag("lpf")
            for lpf in lpf_components:
                lpf.cutoff_frequency = cutoff

    def set_delay_time(self, time):
        for voice in self.voices:
            delay_components = voice.signal_chain.get_components_by_control_tag("delay")
            for delay in delay_components:
                delay.delay_time = time

    def set_delay_wet_gain(self, gain):
        for voice in self.voices:
            delay_components = voice.signal_chain.get_components_by_control_tag("delay")
            for delay in delay_components:
                delay.wet_gain = gain

Go ahead and try out the new components!

The beginning of the end

That officially wraps up the new code for this tutorial series. So where do we go from here? If you're interested in taking the project further, there are plenty of ways you can do that.

Before we get into specific suggestions, let's acknowledge some clear shortcomings of the project. We built a proof-of-concept synth, which means some aspects of the design and implementation are not fully realized. For example, the way we construct the tree of Components is quite fragile: changing the structure of the tree means editing code, a process that is error-prone and lacks validation. There are quirks like needing to trigger a MIDI message to quit the program. And if you followed closely, you may have noticed some magic numbers relating to gain.

These things are all fixable, and I encourage you to do so, especially if they bother you. There is also a ton of potential functionality that could be added to the synthesizer. You could add more oscillator types, more FX components, pitch bend, etc. That said, let's go over a few suggestions that come to mind right away.

  1. Add an ADSR envelope. ADSR stands for Attack, Decay, Sustain, Release. An ADSR envelope is a type of signal modulator that controls the volume of a signal over time. It's used to sculpt the sound by dynamically changing the volume. Think of the way a violin note can swell in volume over time until it reaches its peak intensity. Now compare that to the same note being plucked on a guitar. The note reaches full volume almost at once. The violin has a long attack and the guitar has a short attack. For an analog instrument, the musician can control these sound parameters in the way they strike, pluck, bow, or otherwise actuate their instrument. For an electronic instrument, we can adjust these parameters directly. Building an ADSR envelope is a great exercise and can add a lot of character to the synth sound.
  2. Improve the MIDI message parsing. The parser in the synthesizer thread could be improved. Right now, it expects the messages from the MIDI listener to be constructed with the parameters in an exact order, but there is no reason it needs to be this way.
  3. Add global FX. The delay and low-pass filter FX are both per voice currently. I think it would be fun to experiment with mixing the voice outputs first, then applying FX to them to get different sounds.
  4. Improve the shutdown process. Right now, you have to shut down the synth by sending ctrl-c, then hitting a key on the MIDI keyboard. It would be ideal to come up with a shutdown flow that doesn't require the MIDI key press.
  5. Make the signal tree more robust. We're essentially constructing the signal tree by hand and promising that it works. There could be validation to make sure, for example, that all of the leaf nodes are descendants of the Generator class. Additionally, we could come up with a better way of constructing the tree in the first place. We might specify the structure in a settings file and read it into a builder class, for example.

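As a starting point for suggestion 1, here's a minimal piecewise-linear ADSR sketch. The function name, parameters, and linear segment shapes are my own choices for illustration, not something from the series; a real implementation would generate the envelope chunk by chunk like the other components:

```python
import numpy as np

def adsr_envelope(sample_rate, attack, decay, sustain_level, release, note_length):
    """Build a piecewise-linear ADSR gain curve (times in seconds, levels 0-1)."""
    # Attack: ramp from silence up to full volume
    a = np.linspace(0.0, 1.0, int(attack * sample_rate), endpoint=False)
    # Decay: fall from full volume to the sustain level
    d = np.linspace(1.0, sustain_level, int(decay * sample_rate), endpoint=False)
    # Sustain: hold while the note is on
    s = np.full(int(note_length * sample_rate), sustain_level)
    # Release: fade back to silence after note off
    r = np.linspace(sustain_level, 0.0, int(release * sample_rate))
    return np.concatenate((a, d, s, r)).astype(np.float32)

env = adsr_envelope(44100, attack=0.01, decay=0.05, sustain_level=0.7,
                    release=0.2, note_length=0.5)
# Multiplying an audio signal by the matching slice of env shapes its volume:
# a short attack sounds plucked, a long one swells like a bowed string.
```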
If you've followed along this far, I'm sure you've noticed your fair share of quirks and flaws, so I won't make a comprehensive list. But you get the idea. There is plenty that can still be done.

Code up to this point.

The end

In this tutorial series we built a working proof-of-concept synthesizer with Python. Along the way we explored various synthesis and programming topics, and wrote quite a lot of code. We built a stream player, a MIDI listener, a synthesizer and more. If you made it this far, I hope you've learned something new or useful.

Goodbyes are tough, so I'll keep this short. I just want to say I hope you enjoyed following along, and that you got anything at all out of it. Thank you from the bottom of my heart.

Happy coding!