
AI Speech Generation

Name: AI Speech Generation
Uploaded: 2024-05-16
Description: Generate realistic speech clips from text with this custom operation, powered by Genny from LOVO.

Directus AI | Season 1 | Episode 6 | May 16, 2024


Transcripts are automatically generated with AI and may contain errors.

Speaker 0: The Directus AI speech generation operation allows you to create realistic-sounding speech from text inside your Directus project. This extension is powered by Genny, so you will need a Genny API key in order to follow along. Here I have a collection called ads, and I've created a very short text ad that I want to run for Directus. However, it's gonna go in something like a podcast or an audiobook, so it needs to be in audio form. So what we're gonna do is create a manual flow where, when we click a button here, it goes off and generates that audio for us.

So let's go over to Flows and create a new flow, Generate Audio, with a manual trigger on the ads collection. The trigger returns just the ID of the item where the button was pressed, and we obviously want the text. So the first thing we're gonna do is add a Read Data operation that reads from the ads collection, specifically the item where the button was pressed: $trigger.body.keys[0].
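Outside of Flows, the same lookup can be sketched in plain Python. This is a minimal sketch that assumes the manual trigger's payload has the shape {"collection": "ads", "keys": [...]}, which is what $trigger.body.keys[0] indexes into, and that the Read Data step behaves like a GET on the Directus /items/<collection>/<id> REST endpoint. The project URL and the field names text and file are placeholders for this demo's ads collection.

```python
from urllib.parse import quote


def item_url_from_trigger(base_url: str, trigger_body: dict) -> str:
    """Build the REST URL for the item whose manual-trigger button was pressed.

    Assumes trigger_body looks like {"collection": "ads", "keys": ["<id>"]},
    i.e. the object that $trigger.body refers to inside a flow.
    """
    collection = trigger_body["collection"]
    keys = trigger_body.get("keys") or []
    if not keys:
        raise ValueError("manual trigger payload contains no item keys")
    first_key = str(keys[0])  # equivalent of $trigger.body.keys[0]
    # Request only the fields this flow needs (hypothetical field names).
    return (
        f"{base_url.rstrip('/')}/items/{quote(collection)}/{quote(first_key)}"
        "?fields=id,text,file"
    )


# Hypothetical project URL and trigger payload.
url = item_url_from_trigger(
    "https://example.directus.app", {"collection": "ads", "keys": [1]}
)
print(url)
```

Taking only the first key mirrors the flow above, where the button is pressed on a single item.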

This returns the text and the currently empty file field. Then we can go straight on and send this to the AI Speech Generation operation. We do need a Genny API key, but I have one that I made earlier. In here, we put in our text, which can be a maximum of 500 characters, and I'm gonna go ahead and tell it: go to the last operation, the one that fetched all the data, and grab the text value out of it.
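Because the text input is capped at 500 characters, a longer ad has to be shortened or split before it reaches the operation. The sketch below is one possible approach and an assumption on our part, not something the operation does for you: it splits text on word boundaries into chunks that each fit the limit, so each chunk could be sent as its own generation request. The helper name chunk_text is hypothetical.

```python
MAX_CHARS = 500  # text limit stated for the AI speech generation operation


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking on spaces.

    A single word longer than `limit` is hard-split so no chunk exceeds it.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split any word that on its own exceeds the limit.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


parts = chunk_text("Try Directus today. " * 40)
print([len(p) for p in parts])
```

Splitting keeps the whole ad but yields several clips that would need joining afterwards; rejecting or truncating over-long text in the Read Data step is the simpler alternative if one clip per ad matters more.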

You get to pick one of the available speakers, so we'll use Brian Lee Junior here, and you get to specify the speed of the speech. This operation returns just a URL of an MP3 on the web. Next, we need to import it into this Directus project, and we can do that using the Webhook / Request URL operation. So we'll make a POST request to your specific project URL followed by /files/import.

That's the endpoint. It's a POST request, and in the body we just pass in the URL. Just a note: for this project I've made file creation publicly accessible. You may need to lock down those permissions and provide your authentication headers here, but for ease I've turned that off for this demo. This step makes the API call and returns the whole payload of the brand-new Directus files item that was just created. What we wanna do is extract its ID and save it back to our ads item, so let's add an Update Data operation on the ads collection.
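The import step and the ID extraction can be sketched with the Python standard library. This builds, without sending, a POST to /files/import whose JSON body carries the url of the remote MP3, optionally adding an Authorization: Bearer header for projects where file creation is not public. It then turns the step's result into the update payload. It assumes the Request URL operation exposes the HTTP response body under data, and that Directus wraps the created file under another data key, which is why the flow reads data.data.id. The base URL, token, and the file field name are placeholders, and both function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_import_request(
    base_url: str, audio_url: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /files/import request for a remote MP3."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed if file creation is not publicly permitted.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/files/import",
        data=body,
        headers=headers,
        method="POST",
    )


def update_payload_from_import_result(result: dict) -> dict:
    """Turn the Request URL step's result into the ads item update payload.

    Assumes result looks like {"status": ..., "data": {"data": {"id": ...}}}:
    the response body under "data", the created file under "data" again.
    """
    file_id = result["data"]["data"]["id"]
    return {"file": file_id}  # "file" is the demo's file field on the ad


req = build_import_request(
    "https://example.directus.app", "https://cdn.example.com/ad.mp3"
)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req)
```

Keeping the project URL in one base_url argument means a wrong host only has to be fixed in one place. Outside of Flows, the payload would go to PATCH /items/ads/<id>.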

We wanna update $trigger.body.keys[0], and all we wanna update is the file value. The location of the ID in the returned JSON payload is data.data.id. That is the whole flow. We click the button. We grab the whole item, which includes the text. We generate speech from the text, which results in an MP3.

We import the MP3 and save it back to the item. So let's see if this works. Fingers crossed, we're now generating that audio. It's going off to that third-party vendor, creating that realistic-sounding audio for us, importing it, and then finally adding it here inside the file field. Except this time it doesn't. Why is that?

Let's take a look at the log together. What happened there? The request failed because I used completely the wrong URL for my Directus project here. This project is actually served through a tunnel.

So let's run it one last time: Generate Audio. We'll give that just a moment once again to do its thing. In that time I'd tell you a joke, but I'm not a funny person, so I won't; it does let me stall long enough for this operation to complete, though. And that, of course, is our audio file containing our ad. So that is a little bit about how the speech generation operation works in Directus AI. We hope you enjoy it. I see loads of use cases for this, the primary ones being video voiceovers at scale or programmatic ads.

So I hope you found this interesting. I'll see you in the next video.