
sample-diffusion

A Python library and CLI for generating audio samples using Harmonai Dance Diffusion models.

🚧 This project is early in development. Expect breaking changes! 🚧

Features

  • A CLI for generating audio samples from the command line using Dance Diffusion models. (cli.py)
  • A script for reducing the file size of Dance Diffusion models by removing data that is only needed for training and not inference. (scripts/trim_model.py)

Installation

Requirements

  • git (to clone the repo)
  • conda (to set up the python environment)

conda can be installed through Anaconda or Miniconda. To run on an Apple Silicon device, you will need to use a conda installation that includes Apple Silicon support, such as Miniforge.
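
To verify that conda is installed and available on your PATH before continuing, you can run:

conda --version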

Cloning the repo

Clone the repo and cd into it:

git clone https://github.com/sudosilico/sample-diffusion
cd sample-diffusion

Setting up the conda environment

Create the conda environment:

# If you're not running on an Apple Silicon machine:
conda env create -f environment.yml

# For Apple Silicon machines:
conda env create -f environment-mac.yml

This may take a few minutes, as it installs all of the Python dependencies needed by the CLI script.

Note: You must activate the dd conda environment after creating it, and again every time you open a new terminal window. Learn more about conda environments.

conda activate dd

Using the cli.py CLI

Generating samples

Make a models folder and place your model in models/DD/model.ckpt, then run the generator:

python cli.py

Alternatively, you can pass a custom model path instead of using the default models/DD/model.ckpt path:

python cli.py --model models/DD/some-other-model.ckpt
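
The flags documented below can be combined freely. For example, a longer run with a fixed seed and a larger batch might look like this (the values here are arbitrary):

python cli.py --steps 100 --seed 42 --batch_size 2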

Your audio samples will then be written to a folder matching the following pattern:

  • audio/Output/DD/{mode}/{seed}_{steps}

cli.py Command Line Arguments

Each argument is listed below with its type and default value.

  • --argsfile (str, default: None): Path to a JSON file containing CLI args. If used, any other CLI args passed are ignored.
  • --use_autocast (bool, default: True): Use autocast.
  • --crop_offset (int, default: 0): The starting sample offset to crop input audio to. Use -1 for random cropping.
  • --device_accelerator (str, default: None): Device of execution.
  • --device_offload (str, default: cpu): Device on which to store models when not in use.
  • --model (str, default: models/dd/model.ckpt): Path to the model checkpoint file to be used.
  • --sample_rate (int, default: 48000): The sample rate the model was trained on.
  • --chunk_size (int, default: 65536): The native chunk size of the model.
  • --mode (RequestType, default: Generation): The mode of operation (Generation, Variation, Interpolation, Inpainting, or Extension).
  • --seed (int, default: -1, random): The seed used for reproducible outputs. Leave empty for a random seed.
  • --batch_size (int, default: 1): The maximum number of samples to be produced per batch.
  • --audio_source (str, default: None): Path to the audio source.
  • --audio_target (str, default: None): Path to the audio target (used for interpolations).
  • --mask (str, default: None): Path to the mask tensor (used for inpainting).
  • --noise_level (float, default: 0.7): The noise level used for variations & interpolations.
  • --interpolations_linear (int, default: 1): The number of interpolations, evenly spaced.
  • --interpolations (float or float[], default: None): The interpolation positions.
  • --keep_start (bool, default: True): Keep the beginning of the provided audio (only applies to the Extension mode).
  • --tame (bool, default: True): Decrease output by 3 dB, then clip.
  • --steps (int, default: 50): The number of steps for the sampler.
  • --sampler (SamplerType, default: IPLMS): The sampler used for the diffusion model.
  • --sampler_args (JSON string, default: {}): Additional arguments for the DD sampler.
  • --schedule (SchedulerType, default: CrashSchedule): The schedule used for the diffusion model.
  • --schedule_args (JSON string, default: {}): Additional arguments for the DD schedule.
  • --inpainting_args (JSON string, default: {}): Additional arguments for inpainting (currently unsupported).
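
As a concrete example, a variation run over an existing recording might combine the flags above like this (the input path and values here are placeholders):

python cli.py --mode Variation --audio_source audio/input.wav --noise_level 0.7 --steps 50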

Using args.json

Instead of specifying all of the necessary arguments on the command line each time, we encourage you to try the args.json file provided with this library:

python cli.py --argsfile 'args.json'

To change any settings, edit the args.json file.
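
The actual args.json ships with the repository; its keys are expected to mirror the CLI flag names listed above. A minimal sketch under that assumption (all values here are placeholders) might look like:

{
    "model": "models/DD/model.ckpt",
    "mode": "Generation",
    "steps": 50,
    "seed": -1,
    "batch_size": 1
}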

Using the model trimming script

scripts/trim_model.py can be used to reduce the file size of Dance Diffusion models by removing data that is only needed for training and not inference. For our first models, this reduced the model size by about 75% (from 3.46 GB to 0.87 GB).

To use it, simply pass the path to the model you want to trim as an argument:

python scripts/trim_model.py models/model.ckpt

This will create a new model file at models/model_trim.ckpt.
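
Conceptually, the trimming amounts to loading the checkpoint, keeping only the weights needed for inference, and saving the result, dropping optimizer state and other training-only data. A minimal illustrative sketch in Python (not the actual script; the "state_dict" key and the dropped entries are assumptions):

# Illustrative sketch only, not scripts/trim_model.py itself.
# Assumes the checkpoint is a dict that stores the model weights
# under "state_dict" alongside training-only state (e.g. optimizer).
import torch

ckpt = torch.load("models/model.ckpt", map_location="cpu")
trimmed = {"state_dict": ckpt["state_dict"]}  # keep inference weights only
torch.save(trimmed, "models/model_trim.ckpt")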
