Sound Related Deep Learning Tasks boosting repository with pytorch

AppleHolic Last update: Apr 03, 2024

Pytorch Sound

Introduction

Pytorch Sound is a modeling toolkit that allows engineers to train custom models for sound-related tasks. It focuses on removing the repetitive patterns involved in building deep learning pipelines, in order to speed up related experiments.

import torch.nn as nn
from pytorch_sound.models import register_model, register_model_architecture


@register_model('my_model')
class Model(nn.Module):
    ...


@register_model_architecture('my_model', 'my_model_base')
def my_model_base():
    return {'hidden_dim': 256}

from pytorch_sound.models import build_model


# build model
model_name = 'my_model_base'
model = build_model(model_name)
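The registration pattern above can be illustrated with a small, dependency-free sketch. This is not pytorch_sound's actual implementation: the registry dictionaries, the error handling, and the `ToyModel` class are assumptions made for illustration, and a plain class stands in for an `nn.Module`. It only shows how a name-to-class registry and a name-to-preset registry could cooperate so that `build_model` needs nothing but an architecture name.

```python
from typing import Callable, Dict, Tuple, Type

# Hypothetical registries; the real pytorch_sound internals may differ.
MODEL_REGISTRY: Dict[str, Type] = {}
ARCH_REGISTRY: Dict[str, Tuple[str, Callable[[], dict]]] = {}


def register_model(name: str):
    """Class decorator that stores a model class under `name`."""
    def wrapper(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f'model {name!r} is already registered')
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper


def register_model_architecture(model_name: str, arch_name: str):
    """Function decorator that ties a hyperparameter preset to a registered model."""
    def wrapper(fn):
        if model_name not in MODEL_REGISTRY:
            raise ValueError(f'unknown model {model_name!r}')
        ARCH_REGISTRY[arch_name] = (model_name, fn)
        return fn
    return wrapper


def build_model(arch_name: str):
    """Look up the preset, then instantiate its model class with those kwargs."""
    model_name, preset_fn = ARCH_REGISTRY[arch_name]
    return MODEL_REGISTRY[model_name](**preset_fn())


@register_model('toy_model')
class ToyModel:
    # Plain class standing in for an nn.Module to keep the sketch dependency-free.
    def __init__(self, hidden_dim: int):
        self.hidden_dim = hidden_dim


@register_model_architecture('toy_model', 'toy_model_base')
def toy_model_base():
    return {'hidden_dim': 256}


model = build_model('toy_model_base')
```

With this layout, adding a variant of an existing model only requires another preset function; the model class itself stays untouched.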

Several dataset sources (preprocess, meta, general sound dataset)

LibriTTS, Maestro, VCTK and VoiceBank are currently supported.

Feel free to suggest a dataset; PRs are welcome!

Abstract Training Process
- Build a forward function (from data to loss and meta)
- Provides various logging types
  - Tensorboard, Console
  - scalar, plot, image, audio

import torch
from pytorch_sound.trainer import Trainer, LogType


class MyTrainer(Trainer):

    def forward(self, input: torch.Tensor, target: torch.Tensor, is_logging: bool):
        # forward model
        out = self.model(input)

        # calc your own loss (calc_loss is a placeholder for your loss function)
        loss = calc_loss(out, target)

        # build meta for logging
        meta = {
            'loss': (loss.item(), LogType.SCALAR),
            'out': (out[0], LogType.PLOT)
        }
        return loss, meta
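The meta dictionary returned by forward pairs each value with a log type. How the trainer consumes it is not shown here, so the following is only a sketch under assumptions: the `LogType` enum is a local stand-in (its `IMAGE` and `AUDIO` members are inferred from the logging list above), and `group_meta_by_type` and `format_console_line` are hypothetical helpers, not part of pytorch_sound. It shows one way such entries could be routed to different writers.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Dict, Tuple


class LogType(Enum):
    # Hypothetical stand-in for pytorch_sound.trainer.LogType.
    SCALAR = 'scalar'
    PLOT = 'plot'
    IMAGE = 'image'
    AUDIO = 'audio'


def group_meta_by_type(meta: Dict[str, Tuple[Any, LogType]]) -> Dict[LogType, Dict[str, Any]]:
    """Group meta entries by log type so each group can go to its own writer."""
    grouped = defaultdict(dict)
    for key, (value, log_type) in meta.items():
        grouped[log_type][key] = value
    return dict(grouped)


def format_console_line(step: int, grouped: Dict[LogType, Dict[str, Any]]) -> str:
    """Render only the scalar entries as a single console line."""
    scalars = grouped.get(LogType.SCALAR, {})
    body = ', '.join(f'{key}: {value:.4f}' for key, value in sorted(scalars.items()))
    return f'step {step} | {body}'


# Plain Python values stand in for tensors in this sketch.
meta = {
    'loss': (0.1234, LogType.SCALAR),
    'out': ([0.0, 0.5, 1.0], LogType.PLOT),
}
grouped = group_meta_by_type(meta)
line = format_console_line(10, grouped)
```

Scalars are cheap to print on every step, while plots, images and audio would more naturally go to a Tensorboard writer at a lower frequency.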

English handler sources are adapted from https://github.com/keithito/tacotron
- Type annotations added
General sound settings and sources

Usage

Install

ffmpeg v4

$ sudo add-apt-repository ppa:jonathonf/ffmpeg-4
$ sudo apt update
$ sudo apt install ffmpeg
$ ffmpeg -version
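The same check can be done from Python before starting a preprocessing run. This is a minimal sketch using only the standard library; the helper name `ffmpeg_version_line` is an assumption made for illustration and is not part of pytorch_sound.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is unavailable."""
    path = shutil.which('ffmpeg')
    if path is None:
        return None
    result = subprocess.run([path, '-version'], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version_line()
print(version or 'ffmpeg not found on PATH')
```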

Install package

$ pip install -e .

Preprocess / Handling Meta

Download data files

For LibriTTS, check out the README

Run commands (to change sound settings, edit settings.py)

$ python pytorch_sound/scripts/preprocess.py [libri_tts / vctk / voice_bank] in_dir out_dir
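When preprocessing several datasets from a script, the command above can be assembled programmatically. The sketch below is an illustration only: `build_preprocess_command` is a hypothetical helper, the directory names in the usage line are made up, and the dataset names are the three listed in the command.

```python
import sys
from pathlib import Path
from typing import List

# Dataset names taken from the command shown above.
SUPPORTED_DATASETS = ('libri_tts', 'vctk', 'voice_bank')


def build_preprocess_command(dataset: str, in_dir: str, out_dir: str) -> List[str]:
    """Hypothetical helper: validate the dataset name and assemble the argv list."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f'dataset must be one of {SUPPORTED_DATASETS}, got {dataset!r}')
    script = Path('pytorch_sound') / 'scripts' / 'preprocess.py'
    return [sys.executable, script.as_posix(), dataset, in_dir, out_dir]


# Example directories are placeholders, not paths the repository defines.
cmd = build_preprocess_command('vctk', 'data/raw/vctk', 'data/vctk')
# To execute it, pass `cmd` to subprocess.run(cmd, check=True) from the repository root.
```

Validating the dataset name up front gives a clear error before any files are touched.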

Check the preprocessed data and meta files.

The Maestro dataset does not currently require running the preprocess code.

Examples

Source (Speech) Separation with audioset : https://github.com/AppleHolic/source_separation

Environment

Python > 3.6
PyTorch 1.0
Ubuntu 16.04

Components

Data and its meta file
Data Preprocess
General functions and modules in sound tasks
Abstract training process

To be updated soon

Preprocess docs in README.md
Add tests and CI
Documentation website

LICENSE

This repository is licensed under the BSD 2-Clause license. See the LICENSE file.
