Join Kevin and Damien Murphy, Solutions Engineer at Deepgram, as they use Deepgram to build an audio podcast summarizer in Directus Automate.
Speaker 0: Hello. Hello. Hello. Hello. Damien, you are still muted, but we are we are here.
Hello. I'm Kevin.
Speaker 1: I'm Damien. Yeah.
Speaker 0: Nice to meet you. Yeah. For the next hour and a half, we're gonna be trying to get things to work maybe successfully. We'll see. We'll talk about the project in just a moment, but I actually thought some more thorough introductions might be in order.
Damien, would you like to tell us who you are and who you work
Speaker 1: for? Yeah. I'm Damien Murphy, applied engineer here at Deepgram. So, you know, working with customers, building real-time, low-latency voice bots and transcribing their audio.
Speaker 0: Yeah. Excellent. And I am very, very, very fond of Deepgram, so I'm really excited and thankful that you're joining us for the next little bit. My name's Kevin. I work on the Directus core team, and this workshop is part of Leap Week.
Hopefully, you are already aware, but Leap Week is our week of announcements, where we announce new features and also run a series of other events to celebrate Directus and our community. We're starting to near the end of the week now, but don't worry. There's still lots more to come. Tomorrow, we are doing a community networking social. And right now, right here, we're gonna be building some cool stuff with Directus and Deepgram.
Maybe if we take a moment to talk about the project, that'd be a cool way to start. So, podcasts. I love podcasts. Podcasts are actually an open standard. Podcasts are just an RSS feed that contains some metadata and links to episodes.
And in this workshop, we're going to string together, using Directus Automate and flows (our kind of visual automation tool), a semi-complex automation where we are going to go grab an RSS feed of a podcast, grab the latest episode, and send it off to Deepgram's transcription service. So maybe before I jump straight into the whole project, we break down each part. Could you tell us a little bit about Deepgram's transcription service?
Speaker 1: Yeah. So we're able to process audio, video, pretty much any format, and turn that into text. Right? So we'll basically transcribe every bit of speech that's spoken and then give you back a word-level, timestamped record of what was spoken. We also have multiple other APIs, which we'll get into a little bit later.
But,
Speaker 0: I mean, we can rock on now. So we're going to go and transcribe these podcasts. I listened to one the other day that was, like, an hour long. Then we're gonna use this audio intelligence. Tell us about this one.
Speaker 1: Yeah. So we have the ability to pass the transcript once it's transcribed through our audio intelligence features. So this can do things like sentiment analysis, summarization, intent detection, and topic detection. And this can be really useful for, you know, pulling out that valuable metadata, and it's all time stamped as well. So you can even, you know, build an overview of the podcast, using those, audio intelligence features.
Speaker 0: Cool. And then you can also understand it on, like, a segment basis as well. Right?
Speaker 1: Yeah. Yeah. So each part of the audio that comes through will pick up topics as they happen. So we can do major topics and minor topics as well.
Speaker 0: Awesome. Oh, that's really interesting. Just seen a question here in chat. And, yes, please do use the chat. I will answer the question while encouraging you to use the chat.
Will this demo be available on demand? Yes. Like everything at Leap Week, it is all recorded. It will be available on Directus TV tomorrow. In fact, the workshop from yesterday with Twilio is already up in our brand new show called Enter the Workshop, so you will be able to watch this on demand, of course.
But by being here live, you have access to the chat, so take advantage of it. I'll be monitoring it. You can ask either of us questions about Directus or Deepgram or what we're doing, and we'll be more than happy to answer. So, we're gonna transcribe the latest podcast episode. We are going to use the audio intelligence features that Deepgram offers.
I'm gonna struggle because Directus and Deepgram both start with D's, so sometimes I might mix them up. I feel myself maybe doing it already. And then finally, we will use one of the newer Deepgram products, Text-to-Speech. Tell us about this one.
Speaker 1: Yeah. So we recently released our text-to-speech. It's one of the lowest-latency text-to-speech offerings on the market, with high-quality voices. So you can get very low-latency text-to-speech generated at a very low price point as well.
Speaker 0: Just to help me understand, because latency, I suppose, matters... well, it doesn't only matter, but it matters more when you're doing live, like, real-time stuff. So you can use this in real time as well?
Speaker 1: Yeah. Absolutely. And that's where we see a lot of the demand in the market: building real-time voice bots with sub-second latency. So with this text-to-speech, you can get about 250 milliseconds of latency for time to first byte.
Speaker 0: Excellent. We won't be using it in real time today because, obviously, podcast episodes are already static hosted files, but that's, I suppose, where the latency matters. So you can do, like, true conversational voice bots, I suppose. Cool. So we're gonna do all of that.
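For anyone curious what a one-shot call to that text-to-speech service looks like, here is a rough sketch. The endpoint follows Deepgram's Speak API; the model name, key, and text below are placeholder example values, not anything used later in the workshop:

```javascript
// Hedged sketch of a one-shot Deepgram text-to-speech request.
// The model, API key, and text are placeholder values.
const ttsUrl = `https://api.deepgram.com/v1/speak?${new URLSearchParams({
  model: 'aura-asteria-en',
})}`;

// The actual call would look roughly like this; the response body is the
// generated audio bytes, which you could write to a file or storage bucket:
// const res = await fetch(ttsUrl, {
//   method: 'POST',
//   headers: {
//     Authorization: 'Token YOUR_DEEPGRAM_API_KEY',
//     'Content-Type': 'application/json',
//   },
//   body: JSON.stringify({ text: 'Here is your podcast summary.' }),
// });

console.log(ttsUrl); // https://api.deepgram.com/v1/speak?model=aura-asteria-en
```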
Just to summarize how this is going to work: we are going to first build a flow that will take in a podcast URL. We will grab the latest podcast episode from that podcast feed. We will send that off to Deepgram to receive a transcript. Then we're gonna send it off for text intelligence, the text-to-text API that Deepgram offers. We'll talk a little bit about why they're separate when they don't have to be. You can do those two steps together, but it will become clear as we go through the workshop.
Then, armed with a summary of that podcast, we are going to send it back off to Deepgram to generate a short audio summary, which we will then save back to the Directus project so you can go and listen to it at your leisure. Any questions in the chat? Any thoughts, Damien, before we kick off?
Speaker 1: Yeah. If anybody wants to sign up for Deepgram, we give $200 in free credits as well. So you'll be able to transcribe about 750 hours of audio for free, essentially.
Speaker 0: Yeah. It's really, really cool. Really nice way to get started. And indeed, that is what we will be doing today. Okay.
I think that means we are ready to jump in and get started. And the very first thing we are going to do here is set up a Directus project running locally. I will give you a very quick summary of what Directus is in case you're coming from the Deepgram world and you've not heard of Directus before. So Directus is a really cool back end that you can use as a developer to build your applications. You connect it to a database.
We provide developer tooling and this really beautiful web application which you can use to interact with that data. And it's suitable for handing to non-developers as well, which is not very typical of back-end tooling. So we're gonna spin this up, and then we are specifically going to use Directus Automate, which is part of this application, in order to build this kind of multistep flow, something that looks a bit like this, except each one will take on one of the steps we described in our project. This project will use some extensions that we built and published to the marketplace, which is available in all Directus projects. We can go and do that together.
And then that very final step, where we create a new audio file and save it back to our Directus project, we're gonna build that extension together because it doesn't currently exist. So that's the kind of rundown of how this is going to shake out. So with that in mind, I have this empty directory here on my local machine. It's just this empty directory called live. Let's move into it here, and we're gonna spin up a Directus project.
The first thing we're gonna do is create a docker-compose.yml file. And I do happen to have one here. This is the Docker Compose file for spinning up Directus locally with a SQLite database. There isn't too much to talk about here. We will use the latest version of Directus that has been published on Docker Hub.
We have three volumes. So these are directories that exist inside of the Docker container that we are going to map to local directories. And you'll see exactly what these do in about a minute. We need some environment variables, a key and a secret. You should replace both with random values.
For the sake of this workshop, I think "replace with random value" is random enough, so we'll leave that be. The initial admin email and password, which, of course, you can go change. The database client, and, SQLite being just a file, we're telling it where that file will live. We have WebSockets enabled so you could do, like, real-time subscriptions. It's part of my kind of default snippet that I have.
We're not gonna use that today. And then we're also turning on extensions auto-reload, which is gonna be really important for the developer experience of building our extension at the very end of this workshop. So with all of that done, you can just run docker compose up. No. Oh, did I hit save?
I did not. There we go. And so it's now gonna go ahead and spin that up. And you'll notice immediately an uploads, an extensions, and a database folder. So they are the three volumes that are inside of the Docker container, but also mapped to a local directory.
It did a whole bunch of, like, first-time seeding, and then we have Directus running right now on localhost:8055 with the admin email and password that we set in the Docker Compose. That's it. That's how easy it is to set up Directus. This is the full-fat version of Directus running here. It's the same version we host in Directus Cloud, and with that, we can jump straight in.
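For reference, the kind of Compose file being described here looks roughly like this. This is a sketch rather than the exact file from the stream: the variable names follow the Directus self-hosting docs, and the email, password, and key values are placeholders.

```yaml
services:
  directus:
    image: directus/directus:latest
    ports:
      - 8055:8055
    volumes:
      # The three mapped directories mentioned above
      - ./database:/directus/database
      - ./uploads:/directus/uploads
      - ./extensions:/directus/extensions
    environment:
      KEY: "replace-with-random-value"
      SECRET: "replace-with-random-value"
      ADMIN_EMAIL: "admin@example.com"
      ADMIN_PASSWORD: "change-me"
      DB_CLIENT: "sqlite3"
      DB_FILENAME: "/directus/database/data.db"
      WEBSOCKETS_ENABLED: "true"
      EXTENSIONS_AUTO_RELOAD: "true"
```

Check the Directus self-hosting documentation for the current variable names before using this in anger.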
Damien, I might just give you a quick tour of it if that makes sense. We have a database. Yeah. We have a database. It's that SQLite database.
In here, we can create tables in that database and we can query them. We can interact with the data. Great. We also have users that we can create. We have a whole auth service.
So immediately, we have this admin user, and you can invite other users. Users in turn can have various permissions, which grant them access to do different actions on collections. So create, read, update, delete, and share. We also connect to your asset storage, or you can save files locally as well. So this will connect to an S3 bucket, Azure Storage, Backblaze, and various others.
We are gonna use this later to actually save the summary back from Deepgram. By default, if you don't say anything, it will be just local file storage, and it'll actually just get dumped here in this uploads folder right here in the sidebar. We have a little Insights dashboard builder. We used that in yesterday's workshop. And then over here in settings, we have access to flows, which is the automation builder, which is what we're gonna use today.
I think the only other thing we wanna do before we kick off is, let me just have a quick think here. The only other thing we wanna do is our public role. So this represents all of the requests that are made that have no permissions, that have not authenticated. And I'm just gonna give it the ability to read and write files. In the real world, you shouldn't do this.
But for the sake of this, it'll be fine. What's the worst that could happen? So this will allow us to read and write files without needing to authenticate with Directus. What else is needed? We need the extensions from the marketplace.
So there are three extensions we need here. If I type in Deepgram... I built a few. I don't like the spinning wheel. There it is, Deepgram. So we have the AI Transcription operation, and we have the AI Text Intelligence operation.
We believe in making things nice and small and modular, so we have separated them, and each one's very simple. In reality, actually, question: I think in reality, you could do the intelligence at the same time as the transcription. Right?
Speaker 1: Yeah. Yeah. You can send a single request, and you'll basically just enable those parameters, and you'll get both back.
Speaker 0: Great. As these are... I don't know what's going on there. Although I've had issues with my Internet all day, so I'm gonna blame that. I'd rather this was a bit slow than you not being able to hear or see what's going on. So we did the AI Transcription, and then we had the AI Text Intelligence.
So we'll just install both of those. So these were released last month as part of our Directus AI bundle of operations for our automation builder. And then there's one more that I created just to make our life a bit easier today, and it's this extension here, RSS to JSON. It will allow you to go off and get an RSS feed, and it will parse it and turn it into a JSON object. And this will be really helpful because we obviously need to parse the RSS feed of a podcast.
So we'll go ahead and install that too. There we go. We'll just give it a quick refresh as it is prompting us to do, and we're ready to rock on. So we're gonna create a new flow. Podcast summarizer. Summarizer... summarizer.
Sure. I don't think summarizer is a word, so I don't know why I'm so hung up on it. And we can trigger this automation in one of five ways. We can do an event hook.
So an event hook can be triggered, or will be triggered, whenever something happens in your database. So it could be a new item is created in the posts collection, or a new user is registered, or a new file is uploaded. We have webhooks, which take an inbound HTTP request, so you can receive data from third-party services. In the world of Deepgram, here's how we actually use it on Directus TV: our on-demand shows all have transcripts. Some of our shows are very long, so we use Deepgram's asynchronous callback mode.
So it goes off and does work and then pings you once it's done. And so that would be a webhook trigger. You can run them based on time, you know, schedules. You can have flows trigger other flows. So if you have complex use cases, you can kind of bounce portions off into their own modularized automations and then return the data back up.
And finally, manual. This will add a button to the side of the Data Studio when you're in collections or item pages, and you can go and trigger it from there. We're gonna use a webhook because I just want the ability to call it really quickly, and making a quick curl request is probably gonna be the easiest way to do it. I don't care about any of this because it really is just a quick trigger. So if I hit this URL... in fact, let's do that.
Let me just pull up these two terminals. If I open a new terminal and just call this URL and refresh here, we'll see it's been triggered once in the logs. So I think that's gonna be the quickest way of just constantly running it as we go, to test it. Okay. Any questions so far?
Anyone in the chat? I raced through this. I got us to this point super, super quick. We scheduled an hour and a half for this, and I think it won't take long at all. So, unless questions are asked.
So feel free. Not that you have to, although we'd love questions. A question for you though, Damien. With Deepgram's callback mode, can you give some use cases for when that's useful? Because it's a really good demonstration, I think, of the fact that you can trigger flows based on webhooks.
Speaker 1: Yeah. A lot of customers use it because it allows their server to get back to doing other tasks. Right? So rather than waiting for the response... the more features you enable, the longer the request will take. So adding summarization and topic detection, entity detection, it can go up into the 30-to-40-second range, and longer as the audio gets longer as well.
But yeah. Like, by default, if you're just transcribing, you can transcribe an hour-long podcast in probably, you know, 10 seconds. Right? So one of the other cool features is you can pass a URL to, like, an S3 bucket. So you can tell us, hey, when you're transcribing it, instead of me sending you the file, go pull it from an S3 bucket.
And you can even tell us to put it back into an S3 bucket as well, which is pretty cool.
Speaker 0: Yeah. We have actually, over in our docs... I've written a Deepgram post before, right here. Right. But that makes sense. It's to stop you having, like, long connections hanging open.
Right. And that makes total sense. So what this does is it listens for any file upload. It verifies that it's an audio file, and then it will send the URL of your file directly to Deepgram, authenticated with your token. It has a transcript returned, and then you can save that straight back to the file.
So it's placed right next to the file, which is really cool. It's a really straightforward automation here. And this also featured on, let me find it... This also featured on our Quick Connect series right here. So it's that same project, but over in video form.
So if you're interested in kinda learning more about what's quite a common automation, I think, with Deepgram, you can see how to set that up. Okay. First thing we need to do then is go ahead and get a podcast, like, actually go get an RSS feed. I have loads of podcasts. I actually agonized over which to pick.
So I picked Darknet Diaries. You heard of Darknet Diaries?
Speaker 1: No. Haven't heard of it.
Speaker 0: Fantastic podcast, all about cybersecurity. Really, really, really good. I just listened to this latest episode here, Anom, like, two days ago. It came out June 4th. It was so good. It was not what I expected.
But 146 episodes of Darknet Diaries. And any... I'm gonna say true podcast, because I think Spotify has started to screw with the definition... a podcast is just an RSS feed, and they all follow exactly the same format. If it doesn't have an open RSS feed, it isn't actually a podcast. It's an appropriation of the term podcast. But the podcast is this kind of XML document, this RSS feed, and they all have some metadata that will be shown in your podcast apps. And then they have a number of items here.
So this item here that I'm highlighting is a single episode. It's that one we just saw, Anom. And you'll notice here in the enclosure, there is this attribute called URL, and that contains a direct MP3 link. And that's how podcasts work. And that's really handy because with Deepgram, you can send a binary file or you can send a URL.
And podcasts have this URL just hanging out there. So our job is: get the URL. I can take this whole feed URL and use our brand new (I built it yesterday) RSS to JSON operation here, and I'm gonna call it feed. The fact I call it feed will become clear in a moment. Why does this key matter when it has a name?
Why does this key matter? We'll talk about that in just a moment. We'll stick the URL in there, Save it. Hit it again. And I think we configured this flow to actually return the data from the last step.
So we are expecting to basically see it here. Yeah. There it is. The whole RSS feed, but turned into JSON. If we refresh here, we can also see it in our logs.
There it is. So there it is. That's pretty cool. There's our item. Where is it?
Here we are. There's our item array, and there is the MP3. Now, it does actually say in the docs of this extension that I built yesterday: wherever there's an attribute... you may remember it was an enclosure. I can show you.
It was an enclosure tag with an attribute of URL. And somehow I had to map that to a JSON object. So the chosen method was to make it an object, and the attributes are just prefixed with an underscore. I think that's valid. So now we wanna dig in and actually get that data.
We wanna get that URL. So we will create a new step here. And this one, I will call latest, I guess. Latest, because we just wanna get the latest episode. This has all the episodes.
And we're gonna just run some JavaScript in here. Now, this kind of boilerplate here... is the zoom level okay?
Speaker 1: Yeah. It looks okay.
Speaker 0: Yeah. Cool. We have this data property. And data is a big object, and properties in that object include the keys of all of the steps. So I can go and get the feed step by going data.feed, and that's that whole object that was returned.
So if you name the keys, you can more easily pick specific values from all the way up what we call the data chain, and every operation adds a new object to the data chain. So we have data.feed here. Now, I happen to know, because I didn't wanna sink too much time here, where the value of the URL is. It's in rss.channel.item. That's an array, and we want the first item.
That's the value of the episode. I suppose we'll just store that. Now, that episode had a ton of data. How long is it? When was it published? What's the description?
What's the title? What's the cover art? The MP3, obviously, and a whole bunch of additional metadata. It was huge. It was a really, really big object, actually.
The ID, the pub date, the link to the, like, website, the description formatted, the URL, and data about the URL. Some data specifically for iTunes: the author, the iTunes summary. So much. But actually, not so much. That's the end of it. I reached the end, but it's significant.
We don't need all of it. We only need some of it. So we're gonna just start pulling out some values. So what we'll do is we'll grab the date. That feels like a viable thing to store. We'll turn that into a JavaScript date.
What was it called? Pub date. Pub date. And I know that we want it as an ISO string. So that kind of standardizes it.
I don't think it comes in as an ISO string. No. It comes in whatever this archaic thing is. That's the date. We want the title; that also feels legit, at episode.title.
We could grab the description. There are a few variants of this description. Let's take a look at what's the difference. This one has HTML tags, a p tag and two break tags.
This one does not. So this is the one we want here. The iTunes summary: itunes:summary, which means we have to use this syntax to dig in there. And finally, the actual URL, of course: episode.enclosure._url, because it was an attribute.
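Putting those pieces together, the body of this Run Script step ends up looking something like the sketch below. The sample feed object is invented stand-in data shaped like the RSS to JSON output as described in the walkthrough (XML attributes become underscore-prefixed keys), not the real Darknet Diaries feed:

```javascript
// Sketch of the "latest" Run Script step. Sample data is invented;
// note enclosure's url attribute becomes enclosure._url.
const data = {
  feed: {
    rss: {
      channel: {
        item: [
          {
            pubDate: 'Tue, 04 Jun 2024 07:00:00 GMT',
            title: 'Anom',
            'itunes:summary': 'The story of an encrypted phone company.',
            enclosure: { _url: 'https://example.com/anom.mp3' },
          },
        ],
      },
    },
  },
};

// Grab the first (latest) episode and reduce it to just the fields we need.
const episode = data.feed.rss.channel.item[0];
const latest = {
  date: new Date(episode.pubDate).toISOString(),
  title: episode.title,
  description: episode['itunes:summary'],
  url: episode.enclosure._url,
};

console.log(latest.date); // 2024-06-04T07:00:00.000Z
console.log(latest.url);  // https://example.com/anom.mp3
```

Inside a real flow, this reduction would be the return value of the operation, so later steps can reference the named keys.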
Okay. Looks legit. Save that. Let's run it again. Nothing.
Great. That's not what we want. What happened here? toISOString is not a function. Oh, because it said toISOSting.
That's a typo. Ring. There it is. The date, the description, the title, the URL. Cool.
Yeah. It's a pretty nice little automation builder here. Now we have the URL. I mean, strictly speaking, we didn't need that step. Right?
We could just crack on, but I like reducing down that complex data structure into something quite known. So we called this latest. We'll need that in this next step, which is actually gonna be the AI Transcription operation that we built and released. So there are some options here. The first thing we need is a Deepgram API key, which you can get from your Deepgram dashboard.
We'll do that together in a moment. You need a file URL, which we have. It's the MP3. You can provide a callback URL optionally and then sort of flip over into callback mode, which again stops long-hanging connections, but this will be fine for this. We allow you to enable diarization. Which, do you know why it's called diarization?
This isn't a leading question. I don't know the answer.
Speaker 1: Yeah. It could be called speaker identification as well, but, yeah, I think the research term for it is diarization. So it basically tells you who's speaking when you have a mono channel and multiple speakers. If you have multichannel audio, you don't really need to diarize, because you know each speaker's on a different channel. But, yeah, a lot of people have a single channel, especially with a podcast.
It's not multichannel.
Speaker 0: Yes. And, thank you, Ramsey. I'm glad I caught it really quickly, but, yes, there was a missing r in ISO string. So you can optionally enable diarization, and then you can also add keywords. Talk to us about keywords.
As someone who works for a company whose name sounds like "direct us," I'm very, very intimately familiar with this.
Speaker 1: Yeah. So keywords allows you to kind of increase the probability that we would pick up "Directus." Right? You know, as a single word versus, like, "direct us." Right?
So you put in that keyword with the spelling, and then you increase the intensifier. And the intensifier is actually an exponential scale. So as you go up higher, it gets extremely strong. Yeah, a value of 1 or 2 is pretty normal. If you were to put in a value of 1,000, nearly every word will start turning into "Directus." But that kinda gives you an idea of how you can leverage that feature.
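To make that concrete, here is a rough sketch of how a keyword boost is expressed on a Deepgram pre-recorded request. The word:intensifier pairing follows Deepgram's documented query-string format; the model and intensifier values here are just illustrative:

```javascript
// Sketch: building a Deepgram pre-recorded transcription URL with a keyword
// boost. Keywords are passed as "word:intensifier" pairs; the intensifier is
// an exponential scale, so 1 or 2 is typical.
const params = new URLSearchParams({
  model: 'nova-2',
  smart_format: 'true',
});
params.append('keywords', 'Directus:2');

const requestUrl = `https://api.deepgram.com/v1/listen?${params}`;
console.log(requestUrl);
// https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&keywords=Directus%3A2
```

The request itself would then be a POST to that URL with an Authorization header and either the audio bytes or a JSON body containing the file URL.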
Speaker 0: Yeah. Interestingly, it's not "direct us." It's always "directors." Like, I am the director of the film. When it's wrong, that's how it gets it wrong.
We don't need to use keywords for this. So, first thing we'll need is a Deepgram API key. Here is our Deepgram console. Signed up for an account.
And you can go make a new API key. You can give it a nice name here, so we can call this Leap Week Workshop. You can optionally set an expiration. I will do that. I will expire this after one hour.
Right? Because we won't be going for more than an hour, and then this key will just stop working. You can also add some tags, but this is the thing that's interesting. You can change the permissions of the key, which is nice. Yeah.
Do you have any notes about this or just yeah. You can do that.
Speaker 1: Yeah. Like, if you have certain needs, right, sometimes you might wanna generate keys, like, more API keys with an API key.
Speaker 0: Like an admin key. If you're creating this as a service, for example, and you're using Deepgram within it. Yeah. Cool. That makes sense. You get an API key, which I probably shouldn't share, but mine expires in an hour, and it has a fixed amount of credit and no credit card.
So there we go. We'll pop the Deepgram API key in there. Next thing we want is the file URL. You can add dynamic values using mustache syntax, double squigglies on each side. The last step was called latest, and the value was url. So that will resolve to the full URL that was inside of that enclosure.
And I think we'll leave everything else. I think that's the shortest version. I'll call this transcription. Sure. Hit save.
Now let's try it out. So now it's taking a little bit longer because it's not just making one HTTP request. We are waiting for it to happen. Now, by default, I happen to know, because I built this extension...
We do turn on a couple of features. So I'll wait for this to finish, and then we'll talk about those features. Maybe taking a hot minute there. Has it? Oh, there we are.
There we are. Boom. Look at that. Huge. Right.
Before we look at the data structure that comes back, I will tell you that we are using smart format and we are using the Nova-2 model. So maybe let's talk briefly about each. Should we start with the model?
Speaker 1: Mhmm. Yeah. So the base model is our oldest model. So that was from the kinda 2018, 2019 era. It's an extremely performant model, but the accuracy is a lot lower.
Some customers still opt to use it because it is just so compute-efficient. And then we have our enhanced model, which added a bit more compute to it. But, yeah, our Nova-2 model is the most accurate model that we have, and it's available now in 36 languages, and we're adding more languages every month.
Speaker 0: Nice. And that is what we're using here in this operation. And then what does smart format do? I think smart format basically checks a bunch of other boxes for us.
Speaker 1: Yeah. So smart formatting is actually baked into the model. So the model itself, when it's transcribing, is generating the formatting. If you turn that off and you enable, like, punctuation and numerals and things like that, that will apply post-processing formatting, which tends to lose a little bit of the context. Because, you know, the number one isn't always meant to be a number.
Right? Like, if, you know, I am the one and only. You don't want the digit to come in there. Right? So that's essentially what that's there for.
Speaker 0: Fascinating. So we apply smart format. So we make that available. You don't have the option to turn those off or change them. That's just what you get with this extension.
Okay. Let's look at what came back there. Big old payload. Now, I've jumped slightly into the big data structure that Deepgram returns, which, Damien, you've probably spotted immediately: this is the first alternative, which is always returned.
So I can just speed-run us here. So the first thing is this transcript, which, yeah, like you said, it's nice. It's formatted. Interesting, I didn't know it was baked into the model, and that it's not post-processing, and that's the difference.
I thought it was just a shortcut to checking a few other boxes, but it isn't. It actually does something different.
Speaker 1: Yeah. And some customers will want digits but not punctuation, or punctuation and not digits. So having them split out as well allows them to pick and choose between the features.
Speaker 0: Right. So we have reached the point where this is too big for me to just scroll through and talk about. So what I'll do is I'll just look at the docs for this specific extension, and we can talk about it. So this was the AI Transcription operation.
This is the data structure that's returned if it was a really short transcript. So we have the transcript. We saw that. We didn't actually manage to scroll to the end of it. Can you talk to us about the other objects, all the other properties that are returned?
Speaker 1: Yeah. So the words array is gonna give you the start and end times of each of the words, and also the confidence that we have for that word. Like, if you detect a very low-confidence word, you know, some people will actually choose to omit it. Right? It could have just been picked up from a cough or something like that.
And, yeah, if it's down at, like, 5%, it's usually probably gonna be wrong. Right? But for the most part, you'll see confidences in the high nineties. We also have the punctuated words, so you'll get the word as it would be printed out, with the punctuation and formatting applied.
Speaker 0: What about that?
Speaker 1: And then what we apply to...
Speaker 0: They're not the same words. Like, it's a typo. Oh, it's a typo in my readme. It's a typo in my readme. Ignore me.
I'll go and fix that another time.
Speaker 1: Yeah. You would have seen a lowercase "hi" and no full stop in that case. Yeah.
Speaker 0: And then there's also paragraphs, which is also interesting. Mhmm.
Speaker 1: Yeah. So we can split it up by paragraphs. If you enable diarization, we'll also split it up by, you know, who said what as they said it. Cool. And you can do utterances as well.
So that will give you kind of logical semantic breaks in speech as well. Yeah. And if you were to enable diarization, on each word you would get a speaker ID as well. So you would have, like, speaker 0, speaker 1, and, you know, whoever spoke the word.
Speaker 0: Yeah. Lovely. And we see here the transcript is there. It's formatted, but it adds these line breaks in. So, you know, you can kind of print that.
We get paragraphs. We get sentences. We get start ends for all of them. So It's really nice and flexible. Yeah.
I see sentences could work quite nicely for, putting captions on a screen, like a sentence at a time or something like that. Okay. So that's the data that comes back from that. I think for this, all we really care about is this top level transcript, But the rest of it does exist. Now just a reminder, you can do audio intelligence within that single request if you're using the Deepgram API or SDK, but we've chosen to split them into 2 distinct operations so you can just have what you need, and each one can be a little more simple rather than being a kitchen sink of options.
So let's crack on then. Let's go ahead and add the text intelligence right here. So I'll call this analyze, I think. Once again, Deepgram API key. I think I, I can't see it again.
Oh, wait. The it's in the it's in the last one. We'll do that. We'll grab it out of here. There it is.
There's the raw value, which again will expire by the time this is over. Right. AI Text Intelligence — "analyze". Deepgram API key, and the text is going to be transcription — that was the key of the last step — dot transcript. Now, you could also point this to $last, and it will be the last operation.
I hate this. As an educator — like, you know, a lead educator — I think this lets you footgun yourself. If you start rewiring your operations, this value is not always the same. That's why I'm personally a big fan of explicit naming of keys and explicit inclusion of keys.
But — sorry — $last always exists. Another one that exists is $trigger, which would give you data from that very first step. So it's just a couple of conveniences there. But we will give this a try and go ahead.
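For reference, the dynamic values being discussed use Directus's mustache-style syntax in operation options — a sketch, where the step key `transcription` is the one named earlier in this flow:

```
{{ transcription.transcript }}   explicit step key — survives rewiring
{{ $last.transcript }}           whatever operation ran immediately before
{{ $trigger }}                   the data that started the flow
```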
Speaker 1: Is there any way to see all the available, step values or objects or explore them?
Speaker 0: Yes. There's a number of ways. Debugging flows is an area we know needs improvement. I'm gonna just save this. You can take a look over here. Right?
And you can look through the logs and go, right, well, this was for each step — and it was called latest — but you don't have the key immediately available. You can simply just log them. We have a logging step, which will add an extra operation. You can also just return it in the last step and it will return here. Or rather, I think when you configure the trigger — let's take a look — you can get all data back, and that will return the entire object.
So you have options, but no, there isn't a really nice way to do this right now.
Speaker 1: It might be a cool addition. I know when I when I use, like, email, syntax injection, there's, like, a little list that lets me pick from them.
Speaker 0: Yeah. Yeah. No, that makes a lot of sense. And — back to Directus TV we go — this was actually the topic of one of our recent request reviews. What was it? It was the improvements to flows debugging. So we spoke about it for a whole hour with our community around what they'd like to see, based on an open feature request. So maybe that's something we'll see in the not too distant future.
Alright. Let's try this. Now we are expecting to wait a moment for this because it's going to transcribe the whole hour, then it's gonna run the text intelligence. So I'll kick it off, but I am expecting to to wait.
Speaker 1: And then are are flows always triggered from an API request, or is there a way
Speaker 0: to— There are five different triggers. So now we're a little bit deeper, I'll do this again because I think you're building more context around this. So the first is an event hook. So you can say, hey.
Whenever an item is created in this specific collection or these collections, trigger the flow. So you can do event-based hooks. You can either do it before the database transaction occurs — so you can validate or manipulate the inbound data before it gets committed, or perhaps stop it in its tracks, right, and fail out if something isn't correct — or you can do it after the data has been committed. So that's the event hook. We have the webhook, which we are using for this just for speedy rerunning.
We can run it on a schedule, so you can provide the 6 point cron syntax here and run it up to every minute. You can trigger it based on another flow. So one of the, operations in the list was to run another flow. You can put data in, and it will return data out so you can modularize your your flows a little. And then finally, manual.
And I think the easiest way to look at manual is probably just a quick trip to the docs. The manual flow trigger: you pick a set of collections, and it adds this button over here to the sidebar. It requires you to check one or more items and hit the button, or you can do it from within an individual record, an individual item. And it will send the IDs of those items into the flow as part of the trigger. That did take a little while there.
You can additionally add this confirmation dialogue and collect per invocation values. So this could be useful for things like sending an email. Right? So you type in a message, you hit go, you've maybe picked some users or send a text message with Twilio, press a button, and off it goes. So they're the 5 ways to trigger, to trigger your flow.
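Of those, the schedule trigger mentioned above takes cron syntax. A couple of illustrative expressions — the exact field count and minimum interval are worth checking against the Directus docs:

```
* * * * *       every minute (standard five-field form)
0 0 9 * * 1     9:00:00 every Monday, using the optional leading seconds field
```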
But we're just using the webhook, so I can just run it just by hitting up and enter here. Mhmm. Let's see what that big object looks like. We have the summary, which is nice. It's a nice length for an hour's worth of footage.
Can you talk to us about the rest? That's the summary, but we've got more.
Speaker 1: Yeah. So the topics, it's it's got a a lot of predetermined topics that the model's capable of picking up. You also have the option to pass custom topics. So if you have a topic that's kind of nuanced, very unusual, you can add that in as well. And that's gonna figure out, like, okay.
Whereabouts in, you know, this transcript, right — based on the text, in this case — whereabouts was it talking about WikiLeaks or "fake off" or scammer or spyware? And that's really useful, because now you have the ability to actually jump to that position. Right? So you could imagine, if you wanted to find, you know, the area that was talking about WikiLeaks, you could just click a button, and it would jump you to that segment in the actual transcript.
Speaker 0: Yeah. Exactly. You could build out a search where you're not searching just the raw transcripts — you're searching for topics, because that's more realistic to people's usage. That's really cool.
This again is quite long. So let's find our way to the to the example I I have written up here. So, yes, we get the topics on a per segment basis. You get the intents. Let's talk about intents.
And can we talk a bit about how intents are different to topics? Because I'm a little fuzzy on it.
Speaker 1: Yeah. So so topics can can be all sorts of things. You're probably gonna have, you know, say, 10 x topics versus intents. An intent is really like if if I'm making a phone call and I want to cancel my plan, you know, or update my address, right, that might be one thing. But I may go off on a tangent and start talking about my holiday in Spain and do you know what I mean?
And that that could be a topic, but the intent of the call really was to, you know, achieve something. And the same can be said about, you know, a video, a podcast. And so, yeah, I'm I'm interested about the intents that actually brought back for that podcast.
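A sketch of the kind of options object behind that text-intelligence call, including a custom topic. The parameter names (`custom_topic` in particular) are assumptions to verify against Deepgram's text intelligence docs.

```javascript
// Feature flags for a Deepgram text-intelligence request (illustrative).
const analyzeOptions = {
  language: "en",
  summarize: true, // the summary used later in this flow
  topics: true,    // predetermined topic detection
  intents: true,   // what the speaker was trying to achieve
  sentiment: true, // per-segment sentiment plus an average
  // A nuanced topic the built-in list might miss:
  custom_topic: ["Directus flows"],
};

console.log(Object.keys(analyzeOptions).length); // 6
```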
Speaker 0: Let's take a look. Might have to yeah. I might have to do the mother of all scrolls here.
Speaker 1: You could try a control f me.
Speaker 0: Yeah. I could. Yeah. You think I'd know that using a computer every day. What do we call it?
Intent. Thank you for that. Wow. Okay. Yes.
So "Explore Samsung Smart TV features". That's funny. So what they were talking about — because I just listened to this the other day — basically, Samsung TVs have this feature built in where you could put it in, like, a low power mode where, like, it looks off, but it's not. And so — yeah.
If you push mute 182 and then power the TV appears to be off, but it isn't. And then if you basically run spyware on it and then put it in that mode, no one knows. So instead of needing to plant bugs, you could actually just use the Samsung Smart TV, which will record to the TV, and then you just go by and retrieve it later.
Speaker 1: And you can see how useful this intent is. Right? Like, straight away, it got us to something, you know, very interesting. Right?
Speaker 0: Yeah.
Speaker 1: Yeah. So there's definitely gonna be intents. Yeah.
Speaker 0: Yeah. That's it — discuss Samsung's features. That is what happened. Interesting. Okay.
It might make a little less sense here, but in a call center context in particular — yeah.
Speaker 1: Yeah. So with sentiment, and and it's pretty cool. I don't I don't know if you have the playground up. There's a good visualization of the sentiment in there.
Speaker 0: Yes.
Speaker 1: Yeah. So if you scroll up and then you see just at the very top Per month. Okay. Next to summary. Yeah.
Sentiment. So so you can track the sentiment over time. Right? Because we're giving you, like, you know, sentiment, you know, at each sentence or utterance. And if you scroll down, you can see as the sentiment changes, you know, it goes to see negative negative negative.
So that kinda gives you an idea of, you know, what's happening throughout, you know, the show or or the phone call.
Speaker 0: Got it. Got it. But you might only need to know the average. So I think if memory serves me right, there is also I think it is literally called average. Mhmm.
Yeah. An average sentiment as well.
Speaker 1: Yeah. And and the average is gonna tend towards neutral. Right? Because, you know, the vast majority of of text is is kind of neutral. Right?
It's only gonna be parts of the call that go negative. So, like — I don't know if you can search for positive, see how many results you get.
Speaker 0: And so
Speaker 1: I'll say 10 and then negative.
Speaker 0: 69.
Speaker 1: Yeah. And then neutral. 81. Okay. So it looks like it was kind of 50/50 on the neutral and negative.
Yeah. Just enough to bring it kind of back to that neutral. Yeah.
Speaker 0: Yeah. Yeah. Well, it's just gonna average the sentiment scores, which are between minus 1 and 1, I'm guessing, given that this is minus 0.015. Mhmm. Okay.
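The averaging being described is just the mean of the per-segment scores — a tiny sketch with made-up segments (the field name `sentiment_score` is an assumption to check against the response):

```javascript
// Each segment carries a score in roughly [-1, 1]; averaging tends toward
// neutral because most text is neutral, as noted above.
const segments = [
  { sentiment: "neutral", sentiment_score: 0.05 },
  { sentiment: "negative", sentiment_score: -0.4 },
  { sentiment: "neutral", sentiment_score: 0.02 },
  { sentiment: "negative", sentiment_score: -0.27 },
];

const average =
  segments.reduce((sum, s) => sum + s.sentiment_score, 0) / segments.length;

console.log(average.toFixed(3)); // "-0.150"
```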
So we now have a summary, and now it's time to go ahead. And that summary was held in the output of that — I think it's called summary.text. Summary was an object. And now it's time to use the text to speech APIs.
And to do that, we are going to build an extension, which I'm really excited about. Now, for those watching, this isn't intended to be a play-along, so I'm gonna go a little bit faster than I would running a hands-on workshop, because this is gonna be available tomorrow on Directus TV. I'm also gonna turn this into a blog post sometime in the next couple of weeks, so you can follow step by step in written form if that's more your thing. So we're gonna go into our extensions folder here and npx create-directus-extension. This always gets me — extension? Extension.
Right. Yeah. Sure. Let's wait for the latest version, please. And here are all of the extension types.
You can create custom panels for Directus Insights, the dashboard builder; custom interfaces, which are form inputs for the editor; but we are going to create an operation for flows. And I will call this Deepgram TTS — text to speech. I'll just write it in JavaScript and auto-install dependencies. And given the speed of other things that have happened on my system during this session, I think we'll just be waiting a hot minute for that. But what we're gonna do here is we're gonna set up this operation, and we are going to use Deepgram's JavaScript SDK, which makes interacting with Aura, the text to speech service, just a lot easier.
So while that's scaffolding — oh, it did it. It did it. So next time we wait, we'll take a look at Aura. So we're gonna jump in here. Right.
Let's take a look inside of our new Deepgram TTS extension at the code. There are two files that matter. The first is app.js, and this describes all of the configuration. So this says what is shown on the card here on this kind of overview, and what options are presented here and then fed into the back end — the API key and the text and stuff like that.
The app.js, yeah, will also do things like what icon is shown here, what text and description, stuff like that. And then there is the api.js, which runs server side and actually executes the, you know, the operation. It will be here where we install and use the SDK. So let's —
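A minimal sketch of the two halves of a flows operation as described here — plain objects for illustration; a real extension wraps these with defineOperationApp / defineOperationApi from @directus/extensions-sdk:

```javascript
// app.js — what the flows editor shows and collects
const app = {
  id: "deepgram-tts", // must be unique across the project
  name: "Deepgram TTS",
  icon: "record_voice_over",
  description: "Generate text to speech",
  overview: () => [],
  options: [],
};

// api.js — what actually runs server side
const api = {
  id: "deepgram-tts", // has to match app.js
  handler: async (options, context) => {
    // the real work happens here
    return "value passed to the next operation";
  },
};

console.log(app.id === api.id); // true
```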
Speaker 1: It's cool that you can build the UI through that code and do all the back end processing. A lot of other ideas kinda come to mind now that I see it.
Speaker 0: Yeah. And you can do a lot with this in multiple ways. You know? Yesterday, we built out — actually, note for those watching on demand, sorry for kind of crossing streams.
So you may have already seen this. If we take a look at Directus and just take a very quick look at yesterday's workshop — another thing you may not have considered is — I'm just gonna mute that — you know, we have this dashboard builder, and you'll be thinking, oh, okay, you know, it's all about displaying insights. You know?
That could be useful or whatever. But what? Look at that, look at that quality there. I'll click back over here in a minute. Maybe just here.
Here. But this panel, you pick a user from a drop down and hit call, and it would use the Twilio voice SDK to actually do a two way phone call from your browser to the target to the user's phone number. So, yeah, really, really flexible. You can very much build a lot in it. Anyway, right.
So we're gonna create a custom operation. So first thing we're gonna do here is we are going to change the ID. The ID has to be unique across all operations in your project. So it's quite typical that, you know, people will prefix the name of their extension with their author name. I'm just gonna call this one Deepgram TTS because I doubt there will be another one called that.
And that has to be the same in both files. So also here, the ID. We'll call this one Deepgram TTS. What are we gonna do for icon? We will use record_voice_over.
I think that's the one that I've used in the past for Deepgram. And then for description: generate text to speech. Well, we don't need to spend too much time on that — it's just a little visual thing. Now what we need to do is we're gonna pop in some text.
So actually, I think we'll just leave that as it is, but we are also going to pick the model. So let's actually take this moment to pause. Could you talk to us about the models in Aura?
Speaker 1: Yeah. In the playground that you should be able to access — and we literally just added it the other day. So on the top, at the right hand side — the very top right hand corner.
Speaker 0: Oh, text to speech. I didn't see that then. Oh, perfect. Yes. Yeah.
Speaker 1: So you can type in any text you want here, and it will generate it. Yeah. And you can just hit play on one of the voices. Angus at the very top, actually, is my voice. So, yeah, if you ever wanna
Speaker 0: Actually, I listened to it yesterday. That's so funny. So we have 2 here. You know what? Let's just for sake of argument, we'll just pick the top 2, Angus and Arcus.
But they each have this model name: aura-angus-en and aura-arcus-en. So we're going to provide a way to do a drop down and just pick between them. And in theory, you would populate as many as you wish, or you would take away the choice and just pick one and not provide this option. But we can do that. So we have this text box in the option.
Let's, go and create a new one. So we will do field. This is what you name it. So I'll name it model. Right?
That's like the key that we saw. We're gonna give it a visual name. So we'll capitalize that. That is ultimately just going to store a string. And then we get to provide some information in here.
First thing we'll do is just the width — we'll make it full, which just means it'll go under it. So you can make them half, but whatever. But the more important thing here is the interface, or the form input, where you can create custom interfaces, as we spoke about earlier. And the one we want is called select dropdown, like so. This interface has some options.
As you would probably expect, it's the choices. And each choice has a text, and that has a value. And, like I said, we will do two. So the text for the first one is Angus, and we can see here, aura-angus-en. Is that what it is?
Yeah. aura-angus-en, and Arcus was the second one: aura-arcus-en. Nice. Now the only other thing we'll do here is we'll just show it on that card.
This is optional. This is just, you know, UI polish to a degree. But we will — and sorry, it's over here in the overview — we'll also bring in the model, and we will show that on the card as well. You'll see what that does in just a moment.
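Assembled, the app.js configuration described here might look like this sketch — a plain object without the defineOperationApp wrapper, with field names following the walkthrough (verify against the extensions SDK docs):

```javascript
const operationApp = {
  id: "deepgram-tts",
  name: "Deepgram TTS",
  icon: "record_voice_over",
  description: "Generate text to speech",
  // What the card shows in the flow overview:
  overview: ({ text, model }) => [
    { label: "Text", text },
    { label: "Model", text: model },
  ],
  options: [
    {
      field: "model", // the key the handler receives
      name: "Model",
      type: "string",
      meta: {
        width: "full",
        interface: "select-dropdown",
        options: {
          choices: [
            { text: "Angus", value: "aura-angus-en" },
            { text: "Arcus", value: "aura-arcus-en" },
          ],
        },
      },
    },
  ],
};

console.log(operationApp.options[0].meta.options.choices.length); // 2
```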
Speaker 1: And what would be the default if that wasn't populated? Or is it just always It would just
Speaker 0: be an empty card. It would just be an Oh, it would be an empty card. Just like this.
Speaker 1: If the if the model wasn't selected.
Speaker 0: Good question. I think it might default to the first. Did I? You could probably set a default or handle the default over on the server side. If not selected, pick this.
I think kinda similar approach to most drop downs that you could build. So let's let's run this. Let's go npm run dev. And that's going to build our extension, watch for changes, and rebuild it whenever there's a change. Over in our first terminal here, we see here extensions reloaded.
If I hit save, it will rebuild the extension. Directus will see that the extension has been rebuilt. Will it?
Speaker 1: You might need to make a change rather than
Speaker 0: Yeah. That's what I was waiting on. Interesting. I might just quickly restart it and see if it needed just a one-time restart. And if that continues, then whatever — we might just have to kick it up the bum.
So that's rerunning now. So I'll just save that. There you go. Extensions reloaded. Okay.
Just needed one quick one quick kick up the bum. So let's, let's see what happens now. So we will add to this new extension on the end. There it is. Deepgram TTS.
There's the icon we picked. That's the text. That's the title. We pass in the text, which we know is analyze.summary — Mhmm — .text, I think.
So analyze.summary.text. Sure. And we pick the model, and there they are. So we'll pick your voice. That's quite funny.
I didn't know. I didn't know that. That's pretty cool. And we'll hit save. So and you see there the model is shown on the front, and that's the text input that we put in.
So we hit save. When this operation runs, the API side will run. So the first thing we'll do is we will go ahead and just pull the model in as well. So that'll just be, you know, aura-angus-en. Let's do this.
I'm really excited, actually. Right. We are gonna use the Deepgram SDK. So npm install @deepgram/sdk. Good.
Good. Good. And we'll go ahead and, initialize this. It's funny when I was a developer advocate at Deepgram, I did this all the time. So import, create, client from not that, from Deepgram SDK, and then you create an instance, Deepgram equals create client, and the API key have to go in here.
Yeah. Obviously — ugh, why did I even bother hitting save? We need to pass in the API key here. We don't really wanna hard code it in our extension. So instead, what we're gonna do is we're gonna add it here to our Docker Compose file, which will bring it into the environment variables.
So, mostly because I've already forgotten it, let's grab that key again. Let's pop it in here. We'll call it — oh, DEEPGRAM_API_KEY. There it is. We do need to restart our Docker container whenever we update the Docker Compose file, because it just reads that once it loads.
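As a sketch, the Docker Compose addition might look like this — the service name `directus` and the variable name are assumptions matching the walkthrough:

```yaml
services:
  directus:
    environment:
      # Read once at container start — restart after editing this file
      DEEPGRAM_API_KEY: "your-key-here"
```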
And straight away in here: process.env.DEEPGRAM_API_KEY. One moment. These are fine. These are not errors. These are little warnings, not a problem.
It's just some of the way that the Deepgram SDK is built, but it's not a bother at all. They are just warnings. And now it's time to actually build out the handler. So, what happens? We press the button, it goes in.
Now what we're gonna want to do here is, ultimately, we want to save a file to our Directus project. And we expose a bunch of services to your project, which you can use to interact directly with these kind of Directus primitives. Now, the first thing we wanna do here is we wanna go ahead and just add in here a second variable called context. And inside of here — const. There we go.
Const. We want services, and we also want getSchema, which we'll need to initialize the service. Services is a list of all the services: the users service, the items service, the permissions service, the roles service. We only care about the files service.
So we'll just pull that out, just to make it easier. And then we'll go ahead and we will initialize a files service — a new FilesService. And in here, you have to pass in the schema of your project, and that, thankfully, is just returned from getSchema. And I did just catch in the little tooltip there that that needs to be awaited, and therefore this needs to be async.
Not there — that's an object. So that's us creating the files service. That means we now have an interface with which we can create a file in just a moment. Next, we're gonna go ahead and use the Deepgram SDK to generate a stream of audio — and this was lovely, by the way.
I was speaking to Luke, one of the DX engineers at Deepgram, recently about this. And the fact that this SDK uses the native interface makes this next bit really, really nice. So what we wanna do here is create — I'll call it response for now, I guess. Or dgResponse, maybe. Deepgram response.
What we can do here is just use the initialized client here — with our environment variable — dot speak dot request. The first argument here is the source, so we can just pass in the text. So — yeah, this is how you do it, but as a shorthand, because the name of the property and the value is the same, you can just shorthand it there. And secondly, any options that we want to use.
I might just create options as its own just to keep it really clear. I might just do this above here and then feed it in. So what do we wanna do here? I think all we really wanna do is we wanna pick the model. The same thing, we can use the shorthand.
So that'll be, Angus. And we're going to tell it what file format we want it to return. If memory serves me right, you can return quite a lot of audio formats from Deepgram. Right?
Speaker 1: Yeah. We support quite a few different formats. Yeah. If you wanna play it back, typically MP3 or WAV. If you wanna stream it, like, to Twilio and things like that, you'd probably do, like, raw audio, linear16, or — yeah.
Speaker 0: That's cool. We'll do this, because in just a moment we are gonna need to know the file format. So I want to explicitly ask it for a file format, so we know with confidence it's gonna be an MP3. Then finally — oh, hang on.
Let's pass this in: dgOptions. There we go. And finally is the file stream. I might call it dgStream, just again to be very explicit.
Now we're gonna await dgResponse.getStream(). That's it. That is now just a live stream of audio, which is fantastic. Because it is a file stream, we can pass it straight into the files service to upload it.
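Pulling those few lines together as a sketch — the Deepgram client is passed in here so the function can be exercised with a stub, and `encoding: "mp3"` as the format option is an assumption to check against the Aura docs:

```javascript
// Request TTS audio as a stream (client injected for testability; in the
// extension itself you'd use the createClient instance directly).
async function textToSpeech(deepgram, text, model = "aura-angus-en") {
  // Pin the output format so the MIME type is predictable later.
  const dgOptions = { model, encoding: "mp3" };
  const dgResponse = await deepgram.speak.request({ text }, dgOptions);
  return dgResponse.getStream(); // a ReadableStream of audio bytes
}
```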
Now, before we continue, we are also going to need a file name. Right? We need to tell Directus, when we create a file, what we want the file name to be — at least when you download it. And I think what we might wanna do is actually collect that from the user upfront. Like, hey.
What filename do you want this to be? So let's go through the motions here of adding a new item here — just a new text box. Right? And we'll call this one file name, I guess. Field, name, type.
I think that's all we need. File name. I —
Speaker 1: And you need to change the name. Right?
Speaker 0: Nope. Oh, yeah. Yeah. Yeah. Sure.
But, I mean, that will stop us getting confused — but strictly speaking, you don't need to. So we have the file name. Great. We're not gonna bother showing it on the overview, so it can just stay here. And then over on the API side, we just wanna pull that in here.
File name. Yeah. Cool. And just to make this consistent, I might just move this, and we can call this, like, request or something like that. Just, again, kind of handle these the same way.
So now we have the file name that we'll pass in. Let's save this. Let's, just refresh this. Make sure all of that works. And let the white screen stay for just a moment longer than I would have liked it to.
Look, there's a problem here, which tells me something went wrong. But I don't know what. Look. They've all gone a bit they've all gone a bit funny, which means I have caused a problem. Love that for me.
Might just zoom out one step for the sake of scanning a larger surface of code. File name... I could have broken it at any of those points. We haven't refreshed this in a while, though. And these are — let me scan this, because these were just warnings there.
Oh, no. No. No. That's still the warning. Interesting.
Interesting. Let's look over here again. We passed in the createClient. Let's just save ourselves a moment of effort and just figure out if it's in here, first of all. It was.
Okay. That at least helped somewhat. Let's see. Was it maybe the fact that I've called it request? Is request... no.
That's fine. Okay. I mean, this is top tier debugging, this. Okay.
It's something down here. Okay. I mean, okay. WAVs. Yeah.
So there's nothing wrong. Yeah. Let me just make sure the extension reloads. Fine. No bloody problem.
Nothing happened here. Right. Let's open this one up, and now there's a file name. Right? So I suppose we can put in a dynamic value here and call it latest — which was the latest episode — dot... I think we called it title.
I'll save it, and then I'll just look back. Latest. Yeah. We called it title. That'll be the title.
That's a dynamic value, which is nice. So that gets passed in as a file name. We get the stream, and all we need to do now is upload it to Directus. And the way we do that is — let's do the same thing again. We'll call it directusOptions.
Just to, again, be really explicit. We need three things, I think, that are mandatory here, or it won't work. The first thing is the filename download — whenever you download the file from Directus, the file name that it will have. We've already established that, so that's gonna be the file name — oh, plus .mp3, I suppose.
I'm not sure if that's needed, but I'm gonna do it anyway. It needs a MIME type — we just call it type here. We already know that's an MP3. I know the standard format for MIME types is this: audio slash.
And again, that's why we specifically requested a file format — it doesn't really matter which one it was — and where is it going. This matters because you can connect more than one asset storage to Directus. By default, it will use local storage, which is just this kind of link to this uploads folder, but you can connect it to an S3 bucket, Azure Blob Storage, and many more. So we're just telling it, hey.
This is where we want storage. Should that —
Speaker 1: Should that MIME type be mpeg rather than mp3?
Speaker 0: No. I think it's this. I think, if I cast my mind back, I think that's right. Don't know. We'll find out in a couple of minutes.
So that'll be our first point of debugging. And then the final thing we'll do, although it is optional, is we will actually set the title of the new file. So this is the, like, visual, labeled title, where the other one is what happens when you download it. That isn't strictly needed, but we'll do that. Then what we'll do is we'll call it directusFile, I guess.
For the new file, we will use our files service. We will uploadOne, and the first value is going to be a stream. Fortunately, Deepgram just returns a stream, so we can dump that in there. And the second one is the directusOptions. What is returned from uploadOne is the primary key of the new file that's been created.
So we will just return the directusFile. That's the whole thing. That's the whole operation. Before I get all excited and describe everything that's happened, let's make sure it works, because it might not. Right.
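Those upload options, collected into one sketch. The key names (`filename_download`, `type`, `storage`, `title`) follow the walkthrough; note that `audio/mpeg` is the officially registered MIME type for MP3 if `audio/mp3` ever gives you trouble:

```javascript
// Options handed to FilesService.uploadOne alongside the audio stream.
function directusFileOptions(filename) {
  return {
    filename_download: `${filename}.mp3`, // name the file gets on download
    type: "audio/mp3",                    // MIME type (audio/mpeg is the registered one)
    storage: "local",                     // which configured storage adapter to use
    title: filename,                      // the visible title in the files module
  };
}

console.log(directusFileOptions("Latest Episode").filename_download); // "Latest Episode.mp3"
```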
Nothing seems to have gone wrong here. We have a file name set, so let's trigger this one more time, and I am expecting to have to wait now because we are doing all of these steps back to back. I'm expecting to have to wait maybe 30 ish, 40 ish seconds. But we'll see. It's an hour of audio.
So while that's happening, let's recap what's happening. We're going and grabbing the RSS feed and converting it to JSON with this custom extension you can download for free in the marketplace. We just, you know, traverse that big object that's returned to just get the latest item. We didn't end up doing anything with the date, actually, now I think about it — or the description.
We transcribe it using the AI transcription endpoint and then run text intelligence — both of these are Deepgram — to receive the summary. And then this custom-built extension we just built does text to speech. You pass in some text, you pick a model from the drop down — we just picked the first two out of the list, Arcus and Angus — and you give it the file name for the new file.
And as you see here, what was returned was a string, which means there wasn't an error, which means there is our 42 second summary. Let me listen to your voice. I don't think this will come through. It it's you. It's so you.
That's so funny. So do people at Deepgram just get propositioned to, like, give their voices up?
Speaker 1: 6 of the 12 voices are Deepgrammers, and then the other 6 are voice actors.
Speaker 0: It's really funny. But there's your little summary. There's your summary. That sounds freaking cool. So let's talk about the code, and then we'll talk about what more you could do with this.
Right? So we grab the data that was actually, you know, provided by the operation. We grab the files service and the getSchema function, and we initialize a new FilesService that allows us to interact directly with the Directus files collection. These three lines are all that's needed. And strictly speaking, you know, this object could go in here — so two lines are all that's needed to request a text to speech operation, I suppose, from Deepgram.
That's all that's needed, and it returns a stream of data. Then we, you know, configure all of our options for our new file we're about to create, and we upload the file by just providing the stream directly, providing the options, and that returns the primary key, which we return. That's the whole thing. Do you have any questions in the chat? You have been remarkably quiet, by the way.
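For readers following along, here is a hedged sketch of what that operation handler roughly looks like, written as a plain async function rather than a full extension (the real one would be wrapped with `defineOperationApi` from `@directus/extensions-sdk`). `FilesService`, `getSchema`, and `uploadOne` mirror Directus's services API; the `/v1/speak` endpoint and the `aura-arcas-en` model name are my reading of Deepgram's docs, not code from the stream.

```javascript
// Hypothetical sketch of the custom TTS operation: ask Deepgram for speech,
// then pipe the returned audio stream straight into the Directus files
// collection and return the new file's primary key.
async function textToSpeechHandler({ text, model, filename }, { services, getSchema }) {
  const { FilesService } = services;
  const schema = await getSchema();
  const files = new FilesService({ schema });

  // Two lines (endpoint + fetch) is all the Deepgram request takes;
  // the response body is a readable stream of MP3 audio.
  const response = await fetch(`https://api.deepgram.com/v1/speak?model=${model}`, {
    method: 'POST',
    headers: {
      Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text }),
  });

  // Upload the stream directly with the file options; uploadOne returns
  // the primary key of the new file, which we return from the operation.
  return files.uploadOne(response.body, {
    title: filename,
    filename_download: `${filename}.mp3`,
    type: 'audio/mpeg',
    storage: 'local',
  });
}
```

The nice part, as discussed below, is that the stream never touches disk: it goes from Deepgram's response straight into the storage adapter.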
I hope you've enjoyed. Damien, do you have any questions, thoughts?
Speaker 1: Yeah. I'm I'm just amazed at how easy that was. Like, less than 30 lines of code, and it's all hooked up and it works. And a lot of that code is verbose code as well. Right?
Like it's Absolutely. Trying to be expanded. So, absolutely really easy. And, yeah, being able to pass that stream straight into the file
Speaker 0: was very useful. It was very, very useful indeed. So what more could you do with this? Well, the obvious kind of first step is that a lot more data is returned from Deepgram. So you can do more with that.
You know, we could save the description directly to the file if we wanted. We can maybe tag it with topics. We could do whatever we want here. It's completely up to us. You could also run further automations, either as part of the same flow or a separate flow.
It's all good that, you know, a new podcast has been transcribed, but how do we know? Maybe we send an alert. We send, you know, an email or a notification to the user, which if we take a look here, there is a send email operation right here. So you could tell them that there's a new summary and maybe directly link them to the Directus files MP3. Because everything in Directus, if we take a look at this new file here, it has this ID, and you can just go to localhost:8055/assets/ and then that ID. And there is our MP3.
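That asset link is trivial to build in a follow-up operation. A tiny helper, assuming the local dev base URL from the stream:

```javascript
// Every Directus file is downloadable at /assets/<id> on your instance.
// The base URL here is the local dev default used in the workshop.
function assetUrl(fileId, base = 'http://localhost:8055') {
  return `${base}/assets/${fileId}`;
}
```

You would drop the returned URL straight into the body of the send email operation.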
So you could link them directly to it if you fancied. Another thing is that, you know, this was a slightly contrived example in that we have to manually run it. But you could run a cron. You could use a cron here, grab the feed and say, hey.
Has there been anything new in the last 24 hours since I last ran? Okay. Now go and transcribe the latest episode. You know? So you could run this on a schedule and make it like a daily roll up of new shows, new episodes that you could listen to.
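The "anything new since I last ran?" check for that scheduled trigger can be sketched as a date filter. This assumes items carry the standard RFC 822 `pubDate` that RSS feeds use, and a fixed 24-hour window matching a daily cron.

```javascript
// Sketch: keep only episodes published in the last 24 hours, so a daily
// cron-triggered flow only processes genuinely new episodes.
function newEpisodes(items, now = new Date()) {
  const dayMs = 24 * 60 * 60 * 1000;
  return items.filter((item) => now - new Date(item.pubDate) < dayMs);
}

// Example: one episode from an hour ago, one from years back.
const items = [
  { title: 'Fresh', pubDate: new Date(Date.now() - 60 * 60 * 1000).toUTCString() },
  { title: 'Stale', pubDate: 'Mon, 01 Jan 2001 00:00:00 GMT' },
];
```

A more robust version would persist the timestamp of the last successful run instead of assuming exactly 24 hours, but the filter is the core of the daily roll-up idea.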
Speaker 1: One other idea is, like, obviously, this was audio to begin with, so we we kinda compressed it, and we create a summary that became audio. But maybe there is, like, you know, a cool blog that you follow, but you may not have the chance to read the blog, but you'd like to listen to it. Right? You know, maybe in the car. So you could take a blog, turn it into a, like, you know, an audiobook, very easily, or you could even, you know, summarize it.
Right? And and play it out. I had a pretty interesting idea of, like, a real time radio station that's basically, you know, tailored to exactly what you like. Right? So you could have, you know, maybe arXiv research papers being fed in, and then it's giving you kinda the updates in real time.
Speaker 0: Oh, pretend I didn't see that. I remember, I'm not sure if it will still be live. Yeah. Here it is. This is a post I wrote.
The date's wrong because I worked here at that point. But what this did, it used my JSON. It literally live transcribed a radio station. I could edit in BBC Radio 4. Mhmm.
And it would live transcribe it, which was super cool. Super, super cool.
Speaker 1: Yeah. So now you can even do the reverse.
Speaker 0: Yeah. Yeah. Yeah. That's pretty, pretty cool. There's so much scope for this, you know, based on more complex triggers, you know, more logic in the middle.
Like I said, you know, this could be a cron instead of a trigger. So many options. But I think that is just about our time, and we have 2 more minutes. So, yeah, thank you so much for indulging me and getting involved in this and sharing your insights. I learned actually quite a lot during that.
Especially the smart format being part of the model itself. Fascinating. Not what I thought.
Speaker 1: Yeah. Thanks for having me. This was super interesting, and I'm kind of amazed that you were able to build all this from scratch, you know, in the length of time that we were chatting here. And, yeah, it really just shows what's possible with Directus. So I might be I might be building a few little flows with it myself.
Speaker 0: That's how we get you. That's how we get you. And you can run it locally. Right? And it's the full fat thing.
You know, it's not like a less good version. Like, it is the full thing. It's what we host. I will say that I need to add it to this. I think it's still in a PR, actually. I don't think we merged it in yet.
But the RSS to JSON operation, I will show it because it is also really, like, light. I just didn't wanna have to do it now because it kinda wasn't the point. We're taking the URL as you saw, and all the code for this operation is here. That's it. That's the whole thing, the whole operation.
We import a library called XML parser. We go off and get the RSS feed. And assuming everything was good, we just parse it, with the attribute name prefix underscore, and then we return the parsed data. So that's the code for that whole first operation. We could have built it live.
I just didn't think it was gonna be that interesting. Thank you so much, folks in the chat, for your kind words. I'm glad we made your life easier. I see lots of claps. Yes.
There are lots of use cases, both for Directus and Deepgram and the 2 together, and I completely echo Jonathan's sentiment. Welcome to the Directus community. We're very happy to have you. Great. And with that, we are at time.
So have a wonderful rest of your week, everyone. Have a wonderful rest of your week, Damien. And tomorrow, just a reminder that there is one more event this week, and it's the community networking social. It is using the one and only platform I have ever done networking on that doesn't absolutely suck. So if you're interested in meeting other people who are interested or use or involved in Directus in some way, shape, or form, drop by.
If you go to leapweek.dev, it will be localized to your time zone. But here in Berlin, in Central European time, it is at 4 PM. So, yeah, hopefully, we'll see you at that tomorrow. Damien, anything else you wanna share just before we hit end?
Speaker 1: No. Thanks very much, everybody, for joining. And, yeah, really interesting possibilities. This opens it all up.
Speaker 0: Excellent. Right. With that, have a good rest of your day, nerds. Bye for now.
Speaker 1: Bye bye.