StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

StyleWaveGAN is a style-based drum sound generator derived from StyleGAN, a state-of-the-art image generator by Karras et al. By conditioning StyleWaveGAN on both the drum type and several audio descriptors, we synthesize waveforms directly in CD quality, faster than real time on a GPU and up to 1.5 s long, while retaining a great deal of control over the generation. We also introduce an alternative to the progressive growing of GANs and study the effect of dataset balancing on generative tasks. The experiments are carried out on an augmented subset of a publicly available dataset comprising different drums and cymbals.
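To make the conditioning idea concrete, here is a minimal sketch of how a drum type and a set of descriptor values could be packed into a single conditioning vector, and how many samples 1.5 s of CD-quality audio represents. The function names, the one-hot encoding, and the assumption that descriptors are normalized to [0, 1] are illustrative only and are not taken from the StyleWaveGAN implementation.

```python
# Illustrative sketch only: names and encoding are assumptions,
# not the authors' actual conditioning scheme.
DRUM_TYPES = ["kick", "snare", "tom", "closed_hihat", "open_hihat"]
SAMPLE_RATE = 44_100   # CD-quality sampling rate in Hz
DURATION_S = 1.5       # maximum duration quoted in the abstract


def num_output_samples(sample_rate=SAMPLE_RATE, duration_s=DURATION_S):
    """Number of audio samples needed for the given duration."""
    return int(round(sample_rate * duration_s))


def conditioning_vector(drum_type, descriptors):
    """Concatenate a one-hot drum type with descriptor values in [0, 1]."""
    if drum_type not in DRUM_TYPES:
        raise ValueError(f"unknown drum type: {drum_type!r}")
    for value in descriptors:
        if not 0.0 <= value <= 1.0:
            raise ValueError("descriptors are assumed normalized to [0, 1]")
    one_hot = [1.0 if name == drum_type else 0.0 for name in DRUM_TYPES]
    return one_hot + list(descriptors)


if __name__ == "__main__":
    print(num_output_samples())
    print(conditioning_vector("snare", [0.6]))
```

Under these assumptions, a snare with a single descriptor set to 0.6 yields a six-element vector: five entries for the drum type and one for the descriptor.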

TensorFlow-compatible AudioCommons descriptors

Our TensorFlow-compatible descriptors are available here

Data and augmented samples

Drum Type | Original | Lowest augmentation parameters | Highest augmentation parameters
Kick
Snare
Tom
Closed Hi-Hat
Open Hi-Hat
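The abstract mentions studying dataset balancing. One common way to balance a dataset across classes is to draw each example with a probability inversely proportional to the size of its class. The sketch below shows that generic technique; it is an assumption for illustration and not necessarily the balancing scheme used for StyleWaveGAN.

```python
from collections import Counter


def balanced_sampling_weights(labels):
    """Per-example sampling probabilities that give every class equal mass.

    Generic inverse-frequency weighting, shown for illustration only.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    # Each class receives total probability 1 / n_classes,
    # shared equally among its examples.
    return [1.0 / (n_classes * counts[label]) for label in labels]


if __name__ == "__main__":
    labels = ["kick", "kick", "kick", "snare"]
    print(balanced_sampling_weights(labels))
```

With three kicks and one snare, each kick gets a weight of 1/6 and the snare gets 1/2, so both drum types are drawn equally often on average.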

Synthesized samples

Drum Type | Samples
Kick
Snare
Tom
Closed Hi-Hat
Open Hi-Hat

Descriptor effect: brightness

Brightness (%) | Samples
40
50
60
70
80
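As a rough intuition for what a brightness control changes, the spectral centroid (the magnitude-weighted mean frequency of a signal) rises as more energy sits in high frequencies. The sketch below computes it with a naive DFT in plain Python. It is only a simple proxy chosen for illustration; it is not the AudioCommons brightness model and not the TensorFlow-compatible descriptor used here.

```python
import cmath
import math


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (Hz), computed with a naive DFT.

    A simple proxy for brightness, not the AudioCommons descriptor.
    """
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        magnitudes.append(abs(acc))
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m
               for k, m in enumerate(magnitudes)) / total


def sine(bin_index, n):
    """Sine wave with exactly bin_index cycles over n samples."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


if __name__ == "__main__":
    n, sample_rate = 64, 64.0
    print(spectral_centroid(sine(4, n), sample_rate))
    print(spectral_centroid(sine(12, n), sample_rate))
```

The second tone carries its energy at a higher frequency than the first, so its centroid is higher, which matches the intuition that a higher brightness setting should yield a sharper-sounding sample.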