Automatically transcribe new audio files with Deepgram's Speech-to-Text API.
Speaker 0: Deepgram offers a speech-to-text API that allows us to send off audio files and receive transcriptions in return. Today in QuickConnect, we're going to connect to Deepgram. So whenever we upload a file to our Directus project, if it's an audio file, it will automatically go off, generate a transcription, and then save that transcription to the file description. So let's get started. The first thing we'll need is a Deepgram API key.
So head to the Deepgram console and create a new API key. You can give it any name you want and set the permissions. We only require the lowest-level Member permissions. We'll set the API key to never expire and hit Create Key. I'm gonna take note of this API key for later because I only get to see it this one time.
If I lose this key, I'll need to come and generate a new one here in the Deepgram console. So hit Got It and head over to your Directus project. Let's create a new flow in our Directus project. I'll call this one Transcribe New Audio Files. We're going to set this up to trigger whenever an event happens in our Directus project.
We're going to pick non-blocking, which means that all of Directus's built-in functionality will not be stopped or halted or paused, but this logic will run in parallel. Now in terms of scope, we're going to pick files.upload. So this will trigger every time a file is uploaded, and we will hit save. So that is our trigger. Now the first thing we're going to do is actually make sure that we only continue this logic if the file is an audio file.
Because right now, this flow will start on every single file upload regardless of file type and location. So let's create a new operation. I'll just call it Check and add some conditional logic. This condition ensures that the file type contains the word audio. The file type could be something like audio/mp3, audio/wav, and so on.
But they all start with audio. So we're gonna check that the file type contains the word audio. Now there are two paths here, the resolve path and the reject path. Now if this condition is not true, i.e., the file is not an audio file, we will go down the reject path, and we're just gonna add nothing there, which means the whole flow just stops. It completes at this point.
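The check described above can be sketched in a few lines of Python. This is only an illustration of the condition's logic, not how Directus evaluates it internally; the function names `is_audio` and `next_path` are made up for this example.

```python
def is_audio(file_type: str) -> bool:
    """Return True when the MIME type contains the word 'audio'."""
    return "audio" in file_type.lower()


def next_path(file_type: str) -> str:
    """Pick the path the flow would follow for a given file type."""
    # Audio files continue down the resolve path; everything else goes
    # down the reject path, where the flow simply stops.
    return "resolve" if is_audio(file_type) else "reject"


if __name__ == "__main__":
    for mime in ("audio/wav", "image/png"):
        print(mime, next_path(mime))
```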
Let's create a new operation on the resolve track. And in here, we're going to call it Deepgram. We're gonna go ahead and make this a Webhook / Request URL operation, and this provides some additional options. Now in the Deepgram documentation, they give you a URL to use when you want to go ahead and create transcriptions. It needs to be a POST request to api.deepgram.com/v1/listen.
And then you can add any number of query parameters to change what Deepgram is going to do and return with that audio file. So here we're saying, basically, make it human readable, add some formatting, and diarization will split out who's speaking. So it will tell us whether it's speaker 1 or speaker 2 and so on. Now Deepgram also requires a header in order to authenticate. So we're gonna go ahead and add a new header.
We're going to call this one Authorization, capital A, and the value will be the word Token, a space, and then your Deepgram API key, and hit save. Now the final thing we need to do is actually give it the file URL. So what we're actually gonna do is pause here for just a moment, and we're going to try and trigger this flow as it stands. This will fail because we're not providing the file, but we're gonna see what this does. So I'm gonna go ahead and add a new file to our file library.
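For reference, here is a rough Python sketch of the request that operation makes. It only builds the request and does not send it. The query parameter names `smart_format` and `diarize`, and the JSON body with a `url` field, are assumptions based on Deepgram's documented options for remote audio, since the exact values are not read out here; the function name and placeholder key are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_request(api_key: str, audio_url: str) -> Request:
    """Build (but do not send) a POST request asking Deepgram to transcribe a remote file."""
    # Assumed query parameters: formatting plus speaker diarization.
    query = urlencode({"smart_format": "true", "diarize": "true"})
    # Assumed body shape: a JSON object pointing at the hosted audio file.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        method="POST",
        headers={
            # The word Token, a space, then the API key.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_deepgram_request("YOUR_DEEPGRAM_API_KEY", "https://example.com/audio.wav")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it, which needs a real key and a reachable file URL.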
Here we go. So we'll upload that file. Fantastic. We will head back to our flow, and we'll see here that there's a little 1 icon here, one run, and we'll get to see it. So we get a trigger. So this is the actual file being uploaded.
Now there's an interesting thing here. We see inside of this trigger value, there is a key. This is the unique identifier for this file in the Directus system. Then we see here that this was the check, and then we did a Deepgram API call with our header. And the returned payload, as we expected, is incorrect because we didn't actually send the file.
The reason I wanted to do this is because I wanted to show you that we need this ID, trigger.key. So now let's go in and edit this request over here. I'm gonna just paste it in, and we're gonna explain it here. So every asset in Directus is accessible with the full URL of the Directus project, then /assets/, then the key, the unique ID for that file. So this is now the URL for that file.
If I go to this complete URL, of course, putting the real key in here, I would actually get that file. However, there is one extra thing to note here, which is that right now as it stands, this Directus project does not make all of the files public. Right now, if we continue with this request and we run that again, we re-upload a file, there'll be an authentication error. Deepgram won't be able to reach that file. Then we have a few options.
The first is to go into the public role, go into Directus Files, and make that file or the directory which the file lives in public. That's one viable option. But there is another way to do this, which perhaps is a little more recommended. Let me go into my user account here and generate an API token. So I will save that and hit save.
Going back to my flow here, I'm gonna once again edit this URL, and I'm going to append to the end of it access_token, and then my API token. This will authenticate the API request. So this now is a file that I can access. Let's trigger this flow again. Let's go over to our file library.
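Put together, the file URL is the project URL, then /assets/, then the file key, with access_token appended as a query parameter. A small Python sketch of that, where the function name, the example project URL, and the placeholder values are all made up for illustration:

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_asset_url(project_url: str, file_key: str, access_token: Optional[str] = None) -> str:
    """Build the URL of a Directus asset, optionally authenticated with a token."""
    # <project URL>/assets/<file key>
    url = f"{project_url.rstrip('/')}/assets/{quote(file_key, safe='')}"
    if access_token:
        # Appending access_token lets a third party fetch a non-public file.
        url += "?" + urlencode({"access_token": access_token})
    return url


if __name__ == "__main__":
    # Hypothetical values; substitute your own project URL, key, and token.
    print(build_asset_url("https://example.directus.app", "FILE_KEY", "YOUR_API_TOKEN"))
```

Keep in mind that anyone who sees this URL also sees the token, so it makes sense to use a token with only the access it needs.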
I'm actually just gonna go ahead and re-upload the file. So it's a fresh file. So we'll have a duplicate here, but that will trigger the flow to rerun. Let's look at this second run of the flow. The returned payload comes back and there is our transcript.
That's fantastic. So we get that back. So our Deepgram request was successful. There's actually quite a large payload that comes back here. Data, results, channels, which is an array of objects, alternatives, which is an array, and there is our transcript.
Because we turned on smart formatting, there is also a ton of extra metadata that comes back. Like, it's huge. Every single word, every single word there. And we also get our paragraphs coming back with a transcript and so on and so forth. So you can go and explore this returned object.
I, surprise, surprise, have already done that work for us. The final step is going to be to actually save this value back to the file description. So let's create a new operation here, and let's make this an Update Data operation. In the collection field, we are going to update a Directus Files item. And we're going to put in here the ID of that initial item, trigger.key.
So now we're gonna be updating the item that actually triggered this whole flow in the first place and in turn was transcribed. Now the final thing we're gonna do here is we are going to update the description. So let's go ahead and do that here together. The description, which is a built-in field, is going to be equal to Deepgram, and the reason we can say Deepgram is because that is the key of this step. It's called Deepgram. So Deepgram.data.results.channels (that was an array), .alternatives (that was an array), .paragraphs.transcript.
Oh, rolls right off the tongue, that. And, of course, that might be a little bit of development trial and error, but I know that that is the location of the formatted transcript. Let's hit save. Let's hit save again.
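To make that path a little easier to follow, here is a Python sketch that walks the same keys. It assumes the first item of each array is the one we want, and the sample payload is invented and trimmed to just the keys on that path; the real response carries far more metadata. The function names are hypothetical.

```python
from typing import Any, Dict


def extract_transcript(step_output: Dict[str, Any]) -> str:
    """Follow data -> results -> channels[0] -> alternatives[0] -> paragraphs -> transcript."""
    channel = step_output["data"]["results"]["channels"][0]  # channels is an array
    alternative = channel["alternatives"][0]  # alternatives is an array
    return alternative["paragraphs"]["transcript"]


def build_update_payload(step_output: Dict[str, Any]) -> Dict[str, str]:
    """The value to save back to the file: only the description field changes."""
    return {"description": extract_transcript(step_output)}


if __name__ == "__main__":
    # Invented sample shaped like the path described above.
    sample = {
        "data": {
            "results": {
                "channels": [
                    {"alternatives": [{"paragraphs": {"transcript": "Speaker 0: Hello there."}}]}
                ]
            }
        }
    }
    print(build_update_payload(sample))
```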
And now let's go and upload once again a new file. So that's uploading. Now in the background, that flow has already been triggered. The condition has already been checked. Deepgram has already been called.
And in theory, if we go in here now, we should see the description. Did I pick the wrong file? Yes, I did. It was this one. There's the description, with speaker diarization and formatting.
So, automatically, now that this flow is set up and enabled, I can go hands off and know that every audio file that gets uploaded will be transcribed. Now at the time of recording, Deepgram does also accept video files, but you can expect that they'll take a little bit longer to return. And, typically, you should be trying to send as little data as possible. So audio files was our condition, but you can widen that condition as well. So let's just sum up the way that that flow looks.
Transcribe new audio files. This is triggered whenever a new file is uploaded. And just to be clear, that's every file. So what we do is we check that it's an audio file, then we go off to the Deepgram API authenticating with our API key. And then once we get the transcript back, we update this file with the transcript.
Now there are some extra things just to consider here. Right now, this will happen on every audio file. You can add further conditions to perhaps specify down to a folder or only allow specific users to do this, or you can make it a manual step. You can set up a flow with a button on an item page and go ahead then and only generate a transcript on demand. I hope you found this interesting, and I'll see you in the next episode.
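As a rough idea of what a narrower condition could look like, here is a Python sketch that only allows audio files from certain folders. The function, its parameters, and the folder IDs are all hypothetical; in a real flow this would be written as extra rules on the condition operation rather than as Python.

```python
from typing import Iterable, Optional


def should_transcribe(file_type: str, folder_id: Optional[str], allowed_folders: Iterable[str]) -> bool:
    """Allow transcription only for audio files that live in an allowed folder."""
    if "audio" not in file_type.lower():
        return False
    # Files without a folder, or in any other folder, are skipped.
    return folder_id is not None and folder_id in set(allowed_folders)


if __name__ == "__main__":
    allowed = ["podcasts"]  # hypothetical folder ID
    print(should_transcribe("audio/wav", "podcasts", allowed))
    print(should_transcribe("audio/wav", "misc", allowed))
```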