Subjective evaluation of sound quality and control of drum synthesis with StyleWaveGAN
In this paper, we present a subjective evaluation of the sound quality of the proposed StyleWaveGAN, together with a subjective evaluation of the precision of its control using timbre descriptors from the Audio Commons toolbox. In the context of professional audio production, StyleWaveGAN is our contribution toward fast and simple yet extensive drum generation: it synthesizes waveforms faster than real time on a GPU, directly in CD quality and up to a duration of 1.5 s, while retaining a considerable amount of control over the generation. The simplicity of the control method comes from our differentiable implementation of high-level descriptors based on the AudioCommons models, which allows us to control the synthesis with ease in terms of interpolation and latent separation when used in conjunction with StyleWaveGAN. We evaluate our control method with statistical metrics as well as measurements of the psychophysical response to variations of the control. We also perform perceptual tests to evaluate the sound quality of the generation against DrumGAN.
Data and augmented samples
Drum Type | Original | Lowest augmentation parameters | Highest augmentation parameters |
---|---|---|---|
Kick | |||
Snare | |||
Tom | |||
Closed Hi-Hat | |||
Open Hi-Hat | |||
Synthesized samples
Drum Type | Samples |
---|---|
Kick | |
Snare | |
Tom | |
Closed Hi-Hat | |
Open Hi-Hat | |
Perceptual control samples
Descriptors | Samples |
---|---|
Brightness | Delta = 0.9; Delta = 6.7 |
Depth | Delta = 1.8; Delta = 8.9 |
Warmth | Delta = 7.5; Delta = 4.2 |
Psychophysical evaluation results
In this section, we show all the cumulative distribution function (CDF) graphs obtained from our perceptual evaluation of control quality.
Descriptors | Q20 | Q50 | Q80 |
---|---|---|---|
Brightness (Combined) | |||
Brightness (Positive) | |||
Brightness (Negative) | |||
Brightness (Whole range) | |||
Depth (Combined) | |||
Depth (Positive) | |||
Depth (Negative) | |||
Depth (Whole range) | |||
Warmth (Combined) | |||
Warmth (Positive) | |||
Warmth (Negative) | |||
Warmth (Whole range) | |||
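As a reading aid for the CDF graphs above, the following is a minimal sketch of how an empirical CDF can be computed from a set of listener responses. It is not the evaluation code used for this study: the function name `empirical_cdf` and the response values are hypothetical and serve only to illustrate the definition.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical listener responses (made-up values, for illustration only)
responses = [-2.0, -1.0, -1.0, 0.0, 1.0, 1.0, 2.0, 3.0]
F = empirical_cdf(responses)

for x in (-3.0, -1.0, 0.0, 3.0):
    print(f"F({x}) = {F(x):.3f}")
```

Each point of such a curve gives the proportion of responses at or below a given value, so a steeper curve indicates responses that are more tightly clustered.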