Introducing Transfix

Transcribing speech into text is an essential tool with numerous benefits. Whether you’re a journalist interviewing a source, a student capturing a lecture, or a professional recording a meeting, accurate transcripts save you time and effort. However, existing cloud-based solutions can be costly and require you to upload audio files – a process that is slow and raises security and privacy concerns.

Today, we’re excited to introduce “Transfix” – a free web-based tool that lets you transcribe speech directly in your browser. No more file uploads or expensive subscriptions! Not only that, but its novel interface simplifies the process of correcting any transcription errors. We hope this user-friendly and privacy-preserving tool will empower everyone to unlock the benefits of transcription.

How it works

You can get started by pointing Transfix at an existing media file, or by recording speech live. It supports any video or audio file that you can play in your browser, and any audio recording device on your computer.

The speech is transcribed in real time, limited only by your device’s speed and memory. Results appear as soon as they are produced, so you can start correcting errors right away.

Transfix offers four methods to correct mistakes. If the correct word is listed as an alternative, you can just click that alternative to switch to it. If the correct word isn’t listed, simply click the incorrect word, type the right word in and press enter. If the word isn’t actually present in the speech, you can click the button above to skip over it. And if there is a word missing, press the plus button and type it in. When you’ve finished, the button in the top-right will copy your corrected transcript to your clipboard, ready to paste wherever you need it.
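To make the four corrections concrete, here is a minimal sketch of how they could be modelled as operations on a list of word slots. This is an illustration only, not Transfix’s actual code: the `Slot` type and every function name below are hypothetical.

```typescript
// Hypothetical model of a transcript under correction; all names are illustrative.
interface Slot {
  chosen: string;         // word currently shown in the transcript
  alternatives: string[]; // other candidates proposed by the recogniser
  skipped: boolean;       // true if the user marked the word as not spoken
}

// 1. Switch to an alternative that is already listed.
function pickAlternative(slot: Slot, word: string): Slot {
  if (slot.alternatives.indexOf(word) === -1) {
    throw new Error(`"${word}" is not a listed alternative`);
  }
  return { ...slot, chosen: word };
}

// 2. Type a replacement that is not listed.
function typeReplacement(slot: Slot, word: string): Slot {
  return { ...slot, chosen: word };
}

// 3. Skip a word that was not actually spoken.
function skip(slot: Slot): Slot {
  return { ...slot, skipped: true };
}

// 4. Insert a missing word before position `index`.
function insertWord(slots: Slot[], index: number, word: string): Slot[] {
  const added: Slot = { chosen: word, alternatives: [], skipped: false };
  return [...slots.slice(0, index), added, ...slots.slice(index)];
}

// Join the words that were not skipped into the text that would be copied out.
function render(slots: Slot[]): string {
  return slots
    .filter((s) => !s.skipped)
    .map((s) => s.chosen)
    .join(" ");
}

// Example: one correction of each kind.
const first: Slot = { chosen: "wreck", alternatives: ["recognise"], skipped: false };
const filler: Slot = { chosen: "um", alternatives: [], skipped: false };
const last: Slot = { chosen: "beach", alternatives: ["speech"], skipped: false };

const fixed: Slot[] = [
  pickAlternative(first, "recognise"),
  skip(filler),
  typeReplacement(last, "speech"),
];
const transcript = render(insertWord(fixed, 3, "today"));
```

Each operation returns a new value rather than mutating the old one, which is one simple way to keep an undo history, though whether Transfix does this is not something we are claiming here.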

How we built it

A standout feature of Transfix is that all your data remains local – nothing is sent to the cloud, ensuring total privacy. This unlocks transcription for people who value their privacy, and for those handling sensitive information, such as medical professionals and journalists. We achieved this by using a lightweight speech-to-text engine called Vosk. By compiling it into WebAssembly, we are able to run it efficiently within a web browser.

The speech-to-text engine is a traditional “hybrid” system, which uses acoustic and language models to analyse the speech sounds and map them into sequences of words. This works by producing a “lattice” of the various words that could have been spoken, and all the possible paths through those words. Usually, only the most likely (or “best-fit”) path is presented to the user. However, this misses an opportunity to help the user correct mistakes by revealing the other words and paths.
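As a rough illustration of the idea, the sketch below represents a tiny lattice as a list of weighted arcs and picks out the best-fit path. It is not how Vosk stores or searches its lattices: the `Arc` type, the `bestPath` function, and the cost values are all made up for this example, and it assumes node ids are numbered so that every arc points from a lower id to a higher one.

```typescript
// Hypothetical lattice arc for illustration; not Vosk's actual data structure.
interface Arc {
  from: number; // source node id
  to: number;   // destination node id
  word: string; // word hypothesised on this arc
  cost: number; // combined model cost (lower means more likely)
}

// Finds the lowest-cost path from `start` to `end`.
// Assumes every arc satisfies from < to, so node ids are in topological order.
function bestPath(arcs: Arc[], start: number, end: number): string[] {
  const best = new Map<number, { cost: number; via: Arc | null }>();
  best.set(start, { cost: 0, via: null });

  // Process arcs by source node so each node's cost is final before it is used.
  const sorted = [...arcs].sort((a, b) => a.from - b.from);
  for (const arc of sorted) {
    const src = best.get(arc.from);
    if (src === undefined) continue; // source is not reachable from start
    const candidate = src.cost + arc.cost;
    const current = best.get(arc.to);
    if (current === undefined || candidate < current.cost) {
      best.set(arc.to, { cost: candidate, via: arc });
    }
  }

  // Walk backwards from the end node to recover the words on the path.
  const words: string[] = [];
  let node = end;
  while (node !== start) {
    const entry = best.get(node);
    if (entry === undefined || entry.via === null) return []; // no path found
    words.unshift(entry.via.word);
    node = entry.via.from;
  }
  return words;
}

// Two competing hypotheses for each stretch of audio (made-up costs).
const lattice: Arc[] = [
  { from: 0, to: 1, word: "recognise", cost: 1.2 },
  { from: 0, to: 1, word: "wreck a nice", cost: 1.5 },
  { from: 1, to: 2, word: "speech", cost: 0.4 },
  { from: 1, to: 2, word: "beach", cost: 0.9 },
];

const bestWords = bestPath(lattice, 0, 2);
```

The arcs that lose out here ("wreck a nice", "beach") are exactly the kind of alternatives that a best-fit-only display throws away.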

Lattices are large, highly interconnected graphs, which are pretty intimidating to look at. We needed to design a way to display them to the user in an intuitive and friendly way. We experimented with lots of graph layout algorithms to find an arrangement where the different paths are stacked on top of each other in alignment with the speech. We used Cytoscape.js and elk.js to implement the layout, and wrote some custom code to group the arranged graph into logical blocks.
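One simple way to think about the grouping step is sketched below: word hypotheses whose time spans overlap are collected into the same block, so competing words for the same stretch of audio can be stacked together. This is an assumption made for illustration, not a description of Transfix’s custom grouping code, and the `TimedWord` type and `groupIntoBlocks` function are hypothetical.

```typescript
// Hypothetical timed word hypothesis; field names are illustrative.
interface TimedWord {
  word: string;
  start: number; // seconds from the start of the audio
  end: number;   // seconds from the start of the audio
}

// Groups words whose time spans overlap (directly or through a chain of
// overlaps) into one block. A new block starts when a word begins at or
// after the latest end time seen in the current block.
function groupIntoBlocks(words: TimedWord[]): TimedWord[][] {
  const sorted = [...words].sort((a, b) => a.start - b.start);
  const blocks: TimedWord[][] = [];
  let current: TimedWord[] = [];
  let blockEnd = -Infinity;

  for (const w of sorted) {
    if (current.length > 0 && w.start >= blockEnd) {
      blocks.push(current);
      current = [];
      blockEnd = -Infinity;
    }
    current.push(w);
    blockEnd = Math.max(blockEnd, w.end);
  }
  if (current.length > 0) blocks.push(current);
  return blocks;
}

// Made-up timings: three hypotheses cover 0.0-0.6 s, two cover 0.6-1.0 s.
const hypotheses: TimedWord[] = [
  { word: "recognise", start: 0.0, end: 0.6 },
  { word: "wreck", start: 0.0, end: 0.3 },
  { word: "a", start: 0.3, end: 0.6 },
  { word: "speech", start: 0.6, end: 1.0 },
  { word: "beach", start: 0.6, end: 1.0 },
];

const blocks = groupIntoBlocks(hypotheses);
```

With these timings the first block holds "recognise", "wreck" and "a", and the second holds "speech" and "beach". A real lattice would also need to respect the graph’s edges rather than timing alone.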

Finally, we wrapped all of this together into a fast and responsive web interface using modern tools like Svelte, Vite, and DaisyUI.

How you can help

Transfix is still evolving, and we really value your input. Please try it out and let us know what you think. What features would you like to see added? Did you run into any problems? Your feedback will help us to improve Transfix and make it the best it can be.
