Dataset Generator for Musical Devices

The Dataset Generator for Musical Devices (DGMD) is a collection of Max patches and JavaScript files for the automatic generation of datasets from hardware and software musical devices. A musical device can either generate sound, such as a sound synthesizer, or process sound, like an audio effect. The tool was developed primarily to generate datasets for modeling musical devices with data-driven black-box techniques such as deep learning, or for using machine learning to implement control abstractions over a device's parameters.
The DGMD supports different types of devices:
Software plugins in VST or AU format
Hardware devices with a digital MIDI interface
Hardware devices with an analog CV-Gate interface
Hardware devices without an interface to control the synthesis/processing parameters
The DGMD triggers the device with a stimulus (an audio signal for effects; single notes or chords for synthesizers) while recording its output for different settings of the control parameters. The generated dataset consists of a CSV file with the parameter values and a collection of audio files containing the recordings of the device's output.
To generate the dataset, up to 10 synthesis/effect parameters and 5 note parameters (pitch, velocity, duration, pitch bend, and aftertouch) can be varied according to one of the following modalities:
Step - the user sets the minimum, maximum, and step size for each parameter, and the dataset is generated for all possible parameter combinations.
Random - the user sets the minimum, maximum, and quantization step for each parameter, and the dataset is generated by drawing a user-defined number of parameter combinations from a set of independent uniform distributions, one per parameter.
Sweep - parameters change continuously, driven by a set of independent triangular oscillators with user-defined frequencies and ranges (excluding the note's pitch, velocity, and duration), and the dataset is generated for a user-defined number of triggered notes or an overall duration (the varying parameters are saved into multichannel audio files instead of a single CSV file).
Manual - parameter combinations are set manually by the user, one at a time.
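The Step and Random modalities can be illustrated with a short sketch: Step enumerates the Cartesian product of each parameter's quantized range, while Random draws a fixed number of quantized samples from independent uniform distributions. The function names, parameter layout, and ranges below are illustrative assumptions, not the DGMD's actual code.

```javascript
// Step modality: Cartesian product of each parameter's quantized range.
// params is an array of {min, max, step} objects, one per parameter.
function stepCombinations(params) {
  let combos = [[]];
  for (const p of params) {
    const values = [];
    for (let v = p.min; v <= p.max + 1e-9; v += p.step) {
      values.push(+v.toFixed(6)); // round off floating-point drift
    }
    combos = combos.flatMap(c => values.map(v => [...c, v]));
  }
  return combos;
}

// Random modality: draw n combinations from independent uniform
// distributions, quantized to each parameter's step size.
function randomCombinations(params, n) {
  const quantize = (v, p) => p.min + Math.round((v - p.min) / p.step) * p.step;
  return Array.from({ length: n }, () =>
    params.map(p => quantize(p.min + Math.random() * (p.max - p.min), p))
  );
}

// Example: two hypothetical parameters, e.g. cutoff (3 steps) and
// resonance (2 steps) -> 3 * 2 = 6 combinations in Step mode.
const grid = stepCombinations([
  { min: 0, max: 1, step: 0.5 },
  { min: 0, max: 1, step: 1 },
]);
```

In Step mode the dataset size grows multiplicatively with the number of parameters, which is why Random mode is useful when many parameters vary at once.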
The DGMD provides a variety of utilities to facilitate dataset generation and its subsequent use for modeling, such as detailed timing control of note recording to capture the desired portions of the ADSR envelope, support for dataset generation with chords or repeated instances with polyphonic patterns, and automatic detection and compensation of the device's latency.
There are two versions of the DGMD: one for sound synthesizers and one for audio effects.
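Latency compensation matters because the recorded output lags the stimulus by the device's round-trip delay. One common way to estimate this delay, sketched below, is to cross-correlate the stimulus with the recording and take the lag with the highest correlation; this is an illustrative assumption, and the DGMD's actual detection method may differ.

```javascript
// Estimate device latency (in samples) by brute-force cross-correlation.
// stimulus and recording are plain arrays of samples at the same rate.
function estimateLatencySamples(stimulus, recording, maxLag) {
  let bestLag = 0;
  let bestScore = -Infinity;
  for (let lag = 0; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i < stimulus.length && i + lag < recording.length; i++) {
      score += stimulus[i] * recording[i + lag];
    }
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  return bestLag; // trim this many samples from the recording to compensate
}

// Example: an impulse whose echo appears 3 samples later.
const stim = [1, 0, 0, 0, 0, 0, 0, 0];
const rec  = [0, 0, 0, 1, 0, 0, 0, 0];
// estimateLatencySamples(stim, rec, 5) → 3
```

Once the lag is known, the recordings can be trimmed so that the stimulus and output are sample-aligned in the dataset.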
For more information, read the associated papers:
S. Fasciani, R. Simionato, A. Tidemann, "A Universal Tool for Generating Datasets from Audio Effects," in Proceedings of the 21st Sound and Music Computing Conference, Porto, Portugal, 2024.
S. Fasciani, "A Universal Tool for Generating Datasets from Sound Synthesizers," in Proceedings of the 22nd Sound and Music Computing Conference, Graz, Austria, 2025.
Year: 2025
Location: Oslo, Norway