Python preamble

Packages to install from PyPI: numpy, scipy, matplotlib, soundfile.

import numpy as np
from scipy.signal import ShortTimeFFT, find_peaks
from scipy.signal.windows import hann
from scipy.optimize import fsolve
from scipy.stats import linregress
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import soundfile

There is a nice song by 影虎。called Time to beat the odds:

At 1:18, there is an interesting buildup kickroll. This effect is created by mixing a steady 32nd-note kickroll with a gradually accelerating buildup kickroll. The gradually accelerating part is not quantized to dyadic note values, so it is pretty hard to analyze by ear.

One can obtain a clearer sample of this kickroll by consulting the file Dr_KICKroll00.ogg from the BMS. The waveform of the sample looks like this:

Python code
samples, fs = soundfile.read("Kagetora_Time-to-beat-the-odds_bofet/Dr_KICKroll00.ogg")
fs /= 1000 # Use ms and kHz
# Mix multi-channel audio down to mono
if samples.ndim == 2:
	samples = samples.mean(axis=1)
N = len(samples)

plt.figure(figsize=(10, 6))
plt.plot(np.arange(N) / fs, samples, linewidth=0.5)
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')
plt.xlim(0, N/fs)
plt.ylim(-1, 1)
Waveform plot.

There is clearly a repeating pattern that occurs at a gradually increasing rate. Each occurrence of the pattern is a kick. The task is to find the regularity of the kicks.

We can see that, at the start of each kick, the waveform oscillates more rapidly than it does later in the kick. Therefore, a natural idea is to capture this high-frequency oscillation using the short-time Fourier transform (STFT): we would expect to see a peak in the high-frequency region of the spectrogram at each kick occurrence. SciPy provides the ShortTimeFFT class for computing the STFT, and we can use it to plot the spectrogram of the waveform:

Python code
hop = 32 # Hop size in samples, also used as the window length
SFT = ShortTimeFFT(hann(hop), hop, fs)
spectrogram = SFT.spectrogram(samples)

plt.figure(figsize=(10, 6))
im = plt.imshow(
	spectrogram,
	aspect='auto',
	origin='lower',
	extent=SFT.extent(N),
	norm=LogNorm(vmin=0.001, vmax=np.max(spectrogram)),
)
plt.xlabel('Time (ms)')
plt.ylabel('Frequency (kHz)')
plt.colorbar(im, label='Magnitude')
Spectrogram of the kick roll sample.

We can see clear peaks in the high-frequency region of the spectrogram. To pick them out, average the magnitude over the highest frequency bins and find the peaks of the resulting curve:

Python code
plt.figure(figsize=(10, 6))
data = spectrogram[14:, :].mean(axis=0) # Average over frequency bins 14 and up
peaks, peak_properties = find_peaks(data, height=0.004, distance=7)
peaks = peaks * hop / fs # Time slice index -> time in ms
plt.xlabel('Time (ms)')
plt.ylabel('Average magnitude for high frequencies')
plt.plot(SFT.t(N), data)
plt.plot(peaks, peak_properties['peak_heights'], "o")
Each peak is approximately at the start of a kick.
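
To see which frequencies the slice spectrogram[14:, :] covers, one can list the bin frequencies. The following is only a sketch: the sampling rate of 44.1 kHz is an assumption (the real value is whatever soundfile.read returned), and NumPy's rfftfreq is used here as a stand-in that should agree with SFT.f for the default one-sided FFT.

```python
import numpy as np

window_length = 32   # same value as `hop` above, which doubles as the window length
fs_khz = 44.1        # ASSUMPTION: sampling rate in kHz; use the value read from the file

# Frequencies (in kHz) of the one-sided FFT bins for a 32-sample window.
bin_freqs = np.fft.rfftfreq(window_length, d=1 / fs_khz)

print(len(bin_freqs))   # number of frequency bins
print(bin_freqs[14:])   # the bins that spectrogram[14:, :] averages over
```

Under that assumption, only the top three of the 17 bins are averaged, so the curve responds only to the very highest frequencies in the sample.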

To check that those peaks make sense, we can mark the positions of the peaks on the waveform to see if they are indeed approximately at the start of each kick:

Python code
plt.figure(figsize=(10, 6))
plt.plot(np.arange(N) / fs, samples)
for peak in peaks:
	plt.axvline(peak, color='red')
plt.xlabel('Time (ms)')
plt.ylabel('Amplitude')
plt.xlim(0, 200)
plt.ylim(-1, 1)
Waveform with high-frequency spectrogram peaks marked.

We can see that the peaks are not exactly at the start of each kick, but they are close. Now plot the time intervals between consecutive peaks. On a logarithmic scale, the intervals fall approximately on a straight line, which means that they decay geometrically. We can fit the logarithm of the intervals with a simple linear regression.

Python code
fig, ax = plt.subplots(figsize=(10, 6))
ax.set_yscale('log')
intervals = np.diff(peaks)
indices = np.arange(len(intervals))
lin_res = linregress(indices, np.log(intervals))
ax.plot(indices, intervals, 'o')
ax.plot(indices, np.exp(lin_res.intercept + lin_res.slope * indices))
ax.set_xlabel('Kick')
ax.set_ylabel('Interval (ms)')
Intervals between consecutive peaks.

With the linear regression, we can conclude that the decay ratio, which is the exponential of the fitted slope, is $0.9762$.
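
The step from the fitted line to the decay ratio can be checked on synthetic data. This is a minimal sketch, not the measurement itself: the intervals are generated from an assumed first interval of 60 ms (an arbitrary illustrative value), and np.polyfit stands in for linregress so that the block needs only NumPy.

```python
import numpy as np

true_ratio = 0.9762      # ratio used to generate the synthetic intervals
first_interval = 60.0    # ASSUMPTION: arbitrary first interval in ms, for illustration only
indices = np.arange(58)
intervals = first_interval * true_ratio ** indices

# Fit log(interval) = intercept + slope * index.
# np.polyfit returns the coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(indices, np.log(intervals), 1)

# A geometric sequence is a straight line in log scale, and its ratio is exp(slope).
ratio = np.exp(slope)
print(ratio)
```

With the real data, the same quantity is np.exp(lin_res.slope).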

If we had guessed that the time intervals decay geometrically and that the first kick has the length of a 32nd note, we could calculate the decay ratio $a$ directly. The kick lengths then form the geometric series $\sum_{k=0}^{58}\frac{1}{8}a^k$, which must add up to the total length $4$, so $a$ solves the equation $$\frac{1}{8}\left(1-a^{59}\right)=4\left(1-a\right),$$ where $1/8$ is the note value of a 32nd note (in quarter notes), $4$ is the total note value of this kickroll (a measure with $4$ quarter notes), and $59$ is the total number of kicks (obtained by counting the patterns in the waveform). Apart from the trivial root $a=1$, solving this equation numerically gives us $a=0.9764$, which is pretty close to the value we got before.
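
One way to solve the equation numerically is sketched below. It uses a hand-written bisection rather than scipy.optimize.fsolve from the preamble, so that it runs without any packages; the function names and the bracket [0.5, 0.99] are choices made for this illustration. The bracket deliberately excludes $a=1$, which satisfies the equation trivially.

```python
def residual(a, first=1 / 8, total=4, kicks=59):
    """Left side minus right side of first * (1 - a**kicks) = total * (1 - a)."""
    return first * (1 - a**kicks) - total * (1 - a)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            # The sign change is in the upper half.
            lo, f_lo = mid, f_mid
        else:
            # The sign change is in the lower half.
            hi = mid
    return (lo + hi) / 2


# The residual is negative at 0.5 and positive at 0.99, so a root lies in between.
a = bisect(residual, 0.5, 0.99)
print(a)
```

When using fsolve instead, the initial guess should likewise be kept away from 1, or the solver may converge to the trivial root.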