A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

modal-labs Last update: Nov 12, 2023

QuiLLMan: Voice Chat with LLMs

A complete chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

This repo is meant to serve as a starting point for your own language model-based apps, as well as a playground for experimentation. Contributions are welcome and encouraged!

The language model used is Vicuna, and we're planning on adding support for more models soon (requests and contributions welcome). OpenAI Whisper is used for transcription, and Metavoice Tortoise TTS is used for text-to-speech. The entire app, including the frontend, is made to be deployed serverlessly on Modal.

You can find the demo live here.

[Note: this code is provided for illustration only; please remember to check the license before using any model for commercial purposes.]

File structure

React frontend (src/frontend/)
FastAPI server (src/app.py)
Whisper transcription module (src/transcriber.py)
Tortoise text-to-speech module (src/tts.py)
Vicuna language model module (src/llm_vicuna.py)

Read the accompanying docs for a detailed look at each of these components.
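
To make the flow between these components concrete, here is a minimal sketch of how a voice request might move through them: transcribe, stream tokens from the model, and synthesize speech one sentence at a time so playback can start before the full reply is done. It is not the repo's code. The functions transcribe, generate, synthesize, sentences and respond are hypothetical placeholders that stand in for Whisper, Vicuna and Tortoise, and the sentence-splitting rule is an assumption made for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple


# All names below are hypothetical stand-ins, not the repo's actual API.
def transcribe(audio: bytes) -> str:
    """Placeholder for Whisper: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def generate(prompt: str) -> Iterator[str]:
    """Placeholder for Vicuna: yield a canned reply one token at a time."""
    reply = f"You said: {prompt}. That is interesting! Tell me more."
    for token in re.findall(r"\S+\s*", reply):
        yield token


def synthesize(sentence: str) -> bytes:
    """Placeholder for Tortoise: return fake audio bytes for one sentence."""
    return sentence.encode("utf-8")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so speech can start early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Assumed rule: a sentence ends at '.', '!' or '?'.
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the stream ends mid-sentence.
        yield buffer.strip()


def respond(audio: bytes) -> Iterator[Tuple[str, bytes]]:
    """Transcribe, stream the reply, and synthesize it sentence by sentence."""
    text = transcribe(audio)
    for sentence in sentences(generate(text)):
        yield sentence, synthesize(sentence)


if __name__ == "__main__":
    for sentence, clip in respond(b"hello there"):
        print(sentence, len(clip))
```

Chunking by sentence is one possible design choice: it trades a little latency per chunk for more natural-sounding speech than synthesizing token by token would give.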

Developing locally

Requirements

modal-client installed in your current Python virtual environment (pip install modal-client)
A Modal account
A Modal token set up in your environment (modal token new)

Develop on Modal

To serve the app on Modal, run this command from the root directory of this repo:

modal serve src.app

In the terminal output, you'll find a URL that you can visit to use your app. While the modal serve process is running, changes to any of the project files will be automatically applied. Ctrl+C will stop the app.

Deploy to Modal

Once you're happy with your changes, deploy your app:

modal deploy src.app

[Note that leaving the app deployed on Modal doesn't cost you anything! Modal apps are serverless and scale to 0 when not in use.]
