Most speech recognition apps have no trouble transcribing a native speaker recorded with a professional microphone in a quiet room. That’s the easy case.
So to test them more thoroughly, I created a nightmare recording of two non-native speakers with loud city background noise.
How did they fare?
Let’s find out.

Otter was one of the most frequently mentioned options when we asked for suggestions on Twitter and in the Ahrefs community. And for good reason. It’s easy to set up, has an intuitive interface, and offers clear pricing.
Unique features
What stands out from the rest is the app’s ability to record online meetings and transcribe them just by pasting the meeting URL. But you can also import a video/audio file or record audio right in the app.
Additionally, you can connect your calendar to never miss a meeting.
Transcript quality
I got decent results, but there was a lot to edit too.
It didn’t get some names right. But I can’t blame any tool for not picking up “Ahrefs” or “Tim Soulo” 100% of the time.

One thing I found is that after it notified me that the transcriptions were ready, it’d still do something in the background (adjust time stamps, tag speakers, etc.). Like a student still scribbling on a test paper while handing it to the teacher.
Pricing
You can start for free and upgrade to a paid plan later. You can import up to three files and record 290 minutes of meetings before you need to upgrade (as of April 2023).

Setting up an account was a no-brainer. I found the interface easy to navigate as well. One personal remark is that it felt a bit too “cold” to use since I saw things like “Place Order,” “Billing,” and “Bill” way too often.
You might get the impression that it was designed by an accounting team (as opposed to Descript, which comes next in this roundup).
Unique features
Besides auto-generated transcripts, Rev offers live captions for Zoom meetings. You also have the option to place an order for human transcriptions.
Transcript quality
Poor audio with city noise was a bit too much for Rev. Some words were missing, while others were misrecognized. As a result, some paragraphs didn’t make much sense, while others were fine.

Pricing
You can transcribe the first audio file (up to 45 minutes) for free. I got a bill for $1.25 with a discount that resulted in a total of $0.00. Thanks, accounting team. 😉
Rev also has a 14-day trial of its paid plan. But that was tricky to find. To locate it, you need to go to the footer of the homepage and look for it under “Services.”


Descript welcomed me by name (which was a nice coincidence). The main thing you should know is that it’s standalone software rather than a web service. It’s much more than a speech-to-text converter. It’s basically a video editing tool. And there’s definitely a learning curve. But luckily, onboarding is extremely funny and engaging.

Unique features
As I mentioned, Descript is more of a video editing tool that’s good at transcribing. I’d call it “Canva for video/captions.” You can add B-roll, effects, animations, and more.
You can simply drag and drop and basically produce a whole video with its help. But if you just need a transcript or captions for a video or audio file, you can do that too.
Transcript quality
My sample audio had pretty muddy results. At times, it had difficulty recognizing abbreviations (e.g., SEO). I also had a problem with removing filler words like uh and um.
I found that if I didn’t choose an option to remove them, they, um, just stayed there even though I didn’t need them most of the time. But if I did choose to remove them, it occasionally ate up parts of other words, causing even more trouble.
Also, it couldn’t recognize parts that a human being would have no problem understanding just from context, e.g., “Jack of all trades” became “jackal, trades.”
On the bright side, I believe you can still understand what the text is about.
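If you end up cleaning filler words out of an exported transcript yourself, the “ate up parts of other words” problem is easy to reproduce in plain text. The sketch below is only an illustration, not how Descript works internally (it edits audio as well as text). The function names `remove_fillers` and `naive_remove` are hypothetical, and the filler list covers just the two words mentioned above.

```python
import re

# Only the two fillers named in the text; extend as needed.
FILLERS = ("uh", "um")


def naive_remove(text: str) -> str:
    """Plain substring removal: also deletes 'um' inside words like 'summary'."""
    for filler in FILLERS:
        text = text.replace(filler, "")
    return text


def remove_fillers(text: str) -> str:
    """Remove fillers only when they stand alone as whole words."""
    # \b marks a word boundary, so "um" inside "summary" is left alone.
    # The optional [,.] also drops the comma or period that trails the filler.
    pattern = r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse any double spaces left behind.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Calling `naive_remove("summary")` returns `"smary"`, while `remove_fillers("they, um, just stayed there")` returns `"they, just stayed there"` and leaves words like “summary” intact.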

Pricing
You can start with basic functions for free and upgrade if needed.

MacWhisper is a transcription tool powered by Whisper, an automatic speech recognition (ASR) system developed by OpenAI, the same company that brought us ChatGPT.
As OpenAI states on its website:
Whisper is trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
Whisper is not something you can simply “run” as is. What’s more, it’s quite complicated to set up if you do want to run it yourself. GitHub, Python: you get the gist.
Luckily, there are tools like MacWhisper that take this off your shoulders and let you use the power of AI in a simple user interface.
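For the curious, here is roughly what the do-it-yourself route looks like in Python. This is a minimal sketch, assuming you have installed the open-source `openai-whisper` package and have ffmpeg available. It is not what MacWhisper does under the hood. The helper names `format_timestamp`, `format_segments`, and `transcribe` are my own.

```python
def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with 'start' and 'text') into lines."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe(path: str, model_name: str = "small") -> str:
    """Transcribe an audio file and return a time-stamped transcript."""
    # Imported here so the helpers above work without the package installed.
    # Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    import whisper

    # Smaller models run faster; larger ones (e.g. "large") are slower.
    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_segments(result["segments"])
```

Something like `print(transcribe("interview.mp3", model_name="large"))` would then print the transcript, where `interview.mp3` is a placeholder file name. The first run downloads the chosen model, and the larger models need a lot more memory and time.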
Unique features
Just plain speech-to-text recognition with time stamps. Unfortunately, it doesn’t auto-tag the speakers.
Transcript quality
When you run the tool, you have to choose a “model” to work with. Basically, the lighter the model, the faster it’ll run. But larger models will produce better results. Also, in MacWhisper, these larger (better but slower) models are only available in the paid version.
I decided to start with the free “small” model, which was stated to have “normal speed with good accuracy.”
It was OK, but no better than the competitors. I assumed it would work fine with high-quality audio, but not with the terrible examples I fed to it.
“AI is overrated,” I thought. But before closing the Mac and switching back to my dear Windows PC, I decided to give the “large” model a try.
And you know what, AI is not overrated. I found the results to be much better than anything else.
The transcript was really, really good. It even got things like “Ahrefs” and “SaaS” right! Though still not 100% of the time.

Pricing
You can run smaller models for free. For a large model, you’ll need to purchase a license.

This tool is the easiest to use. Simply drag and drop your file, and it’s ready. It takes some time to process, though.
Unique features
Nothing besides downloading a transcription.
Transcript quality
My first impression was that the results were good because, visually, it delivered a confident-looking text.

But after proofreading, I realized that it simply didn’t include the parts it failed to recognize, sometimes several words in a row.
Pricing
It’s free to use.

Premiere Pro is not exactly a “transcription tool” but rather video editing software. I’m including it because I assume that some companies may already have it in their arsenal (like we do).
To get to the transcription feature in Premiere Pro, just go to the “Captions and graphics” workspace and click “Create transcription.”

Unique features
If we take only speech recognition into account here, what it does well is creating precise time stamps, auto-tagging the speakers and, if needed, automatically adding an editable captions track to a video project.
Transcript quality
Let’s be straightforward: I found the noisy audio transcript to be a failure. I couldn’t comprehend what people were talking about in the first place.

Still, I think this feature can be really helpful if you are creating captions from high-quality audio. I used it myself several times and had nothing to complain about when the recording quality was good.
Pricing
You need an Adobe Creative Cloud subscription to use Premiere Pro.

While signing up and importing files is rather simple, you have to spend some time answering questions about you and your company before you can finally get to the tool itself. And no, you can’t skip typing in your company name, your role, and your company size.
But once you get through this, the interface is clean and intuitive.
Unique features
You can generate a transcript or captions for video or audio. There is also an option to request a manual review of the transcript. Alternatively, you can generate subtitles in a different language, so you get transcription and translation in a single click.

Transcript quality
Happy Scribe did a good job transcribing the audio. It had no problem with words like “SEO” and “SaaS” (clearly the weakest point for many tools). It can also auto-tag the speakers, which might be helpful in certain situations.

Pricing
I could test one file for free. After that, I would need to buy credits, which are used for each minute of video or audio transcribed.

Sonix is a tool for automatic transcriptions, translations, and integration with meeting apps.
Unique features
Besides meetings integration, which is almost a given for most tools, AI summary generation is an interesting feature. It was in beta as of April 2023, but I already got impressive results from it.

You also get some extra tools to work with video captions: a timeline view and an option to split captions into multiple lines. You can also import an existing transcript, and Sonix will sync it with the audio.
Transcript quality
Sonix has a custom vocabulary feature. I found that it helped a bit with names like “Tim Soulo” and “Ahrefs,” but it didn’t work 100% of the time. It mostly did well. But at times, it mistook SEO for CEO and returned the word “Excel” seemingly out of nowhere.
The transcript made sense in general but required a lot of edits if it needed to be perfect.

Pricing
Sonix has a free trial for 25 minutes of transcriptions. After that, you need to purchase pay-as-you-go credits or get a subscription.

Notta is yet another transcription service that works for both real-time meetings and existing recordings.
Unique features
Besides transcription, Notta focuses on streamlining certain workflows and offers features such as calendar sync and a scheduler (in beta as of April 2023).
Transcript quality
Background noise and poor audio quality weren’t deal breakers for Notta. The transcription results turned out mostly OK but still had some issues.

Sentence structure was sometimes a bit weird, certain words went missing, and my favorite “Jack of all trades” part wasn’t that neat this time.

Another thing worth noting is that, for some reason, it failed to recognize two speakers, and the whole interview was tagged as “Speaker 1.”
Pricing
You can start with a free basic subscription and try a three-day trial of the paid plan, Notta Pro.
Final thoughts
As you can see, there are plenty of tools to choose from. Still, it seems that OpenAI stirred things up a bit by releasing a free ASR (automatic speech recognition) system, which I found to be considerably more capable than the others.
But pure speech recognition quality is just one factor. Maybe you do need to record your Zoom meetings (Otter), work with captions in a large video project (Premiere Pro), or quickly create a Canva-style video (Descript).
Also, I have to stress that I was trying to push these tools to the edge by giving them a worst-case scenario recording. For more natural uses, the differences in the outcome might be much less noticeable.
It’s great to see that there are so many options out there, and I hope this overview will help a bit in finding the one that’s right for you.
Got questions? Ping me on Twitter.