Speech Recognition

The speech recognition in TwistedWave makes it much easier to work with voice recordings, and is especially useful when working with long recordings such as audiobooks.

Recognize speech in the audio file

In just one click, TwistedWave will recognize the speech in your audio file. It can also recognize the speech in real time while you are recording with TwistedWave.

Words above the waveform

TwistedWave displays the recognized text above the waveform, so you can quickly see what is being said in the audio without having to play it back.

Several Speech Recognition Engines

TwistedWave includes several speech recognition engines, so you can choose the one that best matches your language, your computer, and the accuracy you need.

  • NVIDIA's Parakeet Speech Recognition (Mac and Windows)

Parakeet, developed by NVIDIA, is an extremely fast and accurate speech recognition engine. An English-only model and a multilingual model are available, and the multilingual model automatically detects and transcribes speech in many languages.

On Mac, Parakeet requires Apple Silicon and macOS 14 or later, and can also recognize speech in real time while you record.

  • OpenAI's Whisper Speech Recognition (Mac and Windows)

The Whisper speech recognition engine developed by OpenAI is also available in TwistedWave. Whisper is very accurate and can recognize speech in nearly 100 languages.

Whisper comes in a range of model sizes, from a small and fast model to a large, extremely accurate one, so you can balance speed and download size against accuracy. Dedicated English-only models are also available for the best results in English.

  • Apple's Speech Recognition (Mac only)

With the same engine that powers Siri, TwistedWave can recognize speech on-device in any language that your Mac supports for dictation — including English, French, German, Italian, Spanish, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Turkish, and many more. Because it is built into macOS, there is nothing extra to download.

  • Zipformer Real-Time Recognition (Windows only)

On Windows, TwistedWave offers a selection of Zipformer streaming models for real-time recognition in many languages, including English, Chinese, French, Spanish, German, Russian, Korean, and Japanese, as well as a multilingual model.

  • NVIDIA's Nemotron Speech Recognition (Windows only)

Nemotron is a streaming English speech recognition model from NVIDIA, available on Windows for real-time recognition. You can choose between three latency profiles, trading a little extra latency for higher accuracy.

On-device, downloadable models

Every engine runs locally on your computer, so speech recognition keeps working without an internet connection, and your recordings are never uploaded to a server.

Apart from Apple's built-in engine, each model is downloaded the first time you select it, and then kept on your computer for later use. Smaller models download quickly and run fast, while larger models are slower but more accurate, so you can pick the right balance for your recordings.


Real-time recognition while recording

TwistedWave can recognize speech in real time while you are recording. On Mac, Apple's engine and Parakeet transcribe your voice as you speak; on Windows, the Zipformer and Nemotron streaming models do the same. TwistedWave automatically scrolls the recognized text for you as you record.


Text script synchronization

Load a text script from an RTF or PDF file, or directly from the contents of your clipboard, and TwistedWave will synchronize it with the speech recognized in your audio file.

  • Select words, and TwistedWave selects the audio.
  • Select audio, and TwistedWave selects the words.
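
To illustrate the general idea of script synchronization, here is a minimal Python sketch. It is not TwistedWave's actual implementation, which is not described here. It assumes the recognizer returns each word with a start and end time in seconds, and it aligns the script with the recognized words using the standard library's difflib. The function names (normalize, align, words_to_audio, audio_to_words) and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so 'Fox,' matches 'fox'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(script_words, recognized):
    """Map script word indices to recognized word indices.

    recognized is a list of (word, start_seconds, end_seconds) tuples,
    an assumed format for the recognizer's output.
    """
    a = [normalize(w) for w in script_words]
    b = [normalize(w) for w, _, _ in recognized]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def words_to_audio(mapping, recognized, first, last):
    """Return the (start, end) time range covered by script words first..last."""
    hits = [recognized[mapping[i]] for i in range(first, last + 1) if i in mapping]
    if not hits:
        return None  # none of the selected words were matched in the audio
    return hits[0][1], hits[-1][2]


def audio_to_words(mapping, recognized, start, end):
    """Return the (first, last) script word indices overlapping a time range."""
    hits = [
        i
        for i, j in sorted(mapping.items())
        if recognized[j][2] > start and recognized[j][1] < end
    ]
    if not hits:
        return None
    return hits[0], hits[-1]


# Hypothetical data: the recognizer misheard "fox" as "box".
script = "The quick brown fox jumps".split()
recognized = [
    ("the", 0.0, 0.2),
    ("quick", 0.2, 0.5),
    ("brown", 0.5, 0.8),
    ("box", 0.8, 1.1),
    ("jumps", 1.1, 1.5),
]
mapping = align(script, recognized)
```

With this sample data, selecting the script words "quick brown" (indices 1 to 2) maps to the audio range from 0.2 to 0.8 seconds, and selecting the audio from 0.6 to 1.2 seconds maps back to script words 2 through 4. The misrecognized word is simply left unmatched; a real implementation would likely interpolate timings for such gaps.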