Esther speaks to community member Marcus about their Media AI Bundle - a group of extensions that allow extraction of key details using AI tools.
Speaker 0: But I'm also open to adding more AI features, depending on use cases.
Speaker 1: Hi there, and welcome to another episode of the Beyond the Core Show. It's a Directus show where we shine the spotlight on extension developers in the community. My name is Esther Agbaji, and I work as a developer advocate at Directus. And today, I'm joined by a super community member, Arood. Arood is the community winner of the AI hackathon that was held a couple of months back, and he won the hackathon by creating the Media AI Bundle extension.
And today, he's here with me to share all about the exciting journey to develop this extension and some of its features. So thank you, Arood, for joining me here. Could you share some background about yourself and how you got to know about Directus?
Speaker 0: Sure. I work as a developer at an agency. And I was looking for alternatives to CMSs like WordPress, for example, but I wanted something more API-focused.
Speaker 1: Okay.
Speaker 0: So we could find something that works with larger projects, and more, like, apps and such. And after trying some of them out, I wasn't really happy with how they worked. They weren't easy to use for the users, at least in my opinion. But then in a Reddit post, I think it was, someone was mentioning Directus. They said it was fantastic, but it was kind of under the radar at that time.
So I checked it out, and I liked it a lot. So I started trying to get it approved to be used in our agency.
Speaker 1: Okay.
Speaker 0: And now we have successfully rolled out a few projects using it. And so far, it's been great.
Speaker 1: Interesting. Interesting. And what's your favorite Directus feature?
Speaker 0: I think it's the interface. How extensible it is, and how I can fine-tune it for the editors.
Speaker 1: Okay. Okay. So you like the fact that it's very intuitive and is also extensible. Yeah.
I'm sure you probably like the extension feature because you've created not just one or two extensions, at least that I know of. And, yeah, we are here to talk about the Media AI Bundle, which is a really cool extension. I saw the demo that you did during submission, and it was really good. So, what was the motivation for actually creating this extension?
Speaker 0: It started off as an idea for a personal project of mine. I wanted to be able to take pictures of sticky notes.
Speaker 1: Interesting.
Speaker 0: Like photos with my phone and upload them to a Kanban board or similar. So that's what got me started. I had some APIs in mind that I wanted to use, but now I had, like, an excuse to get it working in Directus. So I started working on an operations bundle. And then I realized that this could be used for much more.
For example, the integration with alt text dot io so you can get alt text for images. So that kind of formed it into becoming the media bundle.
Speaker 1: Okay. Okay. Nice to hear. So you mentioned the alt text. Is it like an AI API?
Speaker 0: Yeah.
Speaker 1: Okay.
Speaker 0: So it's like a service you sign up for, and they transform the image into a readable text, like a sentence or
Speaker 1: Okay.
Speaker 0: Sentences that describe the image.
Speaker 1: Okay. Okay. So, basically, with the extension that you created, you upload an image, and it just gives you the alt text. Or even if it's a screenshot of maybe, you know, text on a piece of paper, it can also read and extract that text. Right?
Speaker 0: Yeah. And I have two operations currently in the bundle. One is the one I call describe image, which does the alt text thing. It takes the image and extracts a text or sentence. You can also use Amazon Web Services, but then you'll only get, like, a comma-separated list of words.
So it's not as fancy, but it's an alternative.
And the other one is the extract text operation that actually reads, with, like, OCR, and tries to find text in an image. So you can extract it.
Speaker 1: Okay. Extract the text from an image. Yeah. Alright. Yeah.
Would you like to share your screen? So just walk us through maybe some parts of the code and then a quick demo of the extension?
Speaker 0: So as I mentioned, it's a bundle. So you have a source directory with the different operations in this case. And the plan was to add more, as I get more, like, cases
Speaker 1: Mhmm.
Speaker 0: Use cases.
Speaker 1: Use cases. Yeah.
Speaker 0: So here we have the describe image and the extract text from image operations. So if we take describe image, I think that's the more fun one.
Speaker 1: Okay.
Speaker 0: We have the front end part, or the app part, which defines the operation. And here we have some settings that you can set up. You can choose if you want to use alt text dot io or Amazon
Speaker 1: Rekognition. Okay.
Speaker 0: And you will need API keys for these, and you set them up in your environment variables. But I'm hoping to add some kind of settings page or something, where you can do it a bit easier.
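The environment-variable setup described here could look roughly like the sketch below. The provider identifiers and variable names (`EXAMPLE_ALT_TEXT_API_KEY`, `EXAMPLE_AWS_ACCESS_KEY_ID`) are hypothetical placeholders, not the names the extension actually reads; the README lists the real ones.

```typescript
// Hypothetical provider identifiers and environment variable names,
// used for illustration only.
type Provider = "alttext" | "aws";

const KEY_NAMES: Record<Provider, string> = {
  alttext: "EXAMPLE_ALT_TEXT_API_KEY",
  aws: "EXAMPLE_AWS_ACCESS_KEY_ID",
};

// Looks up the API key for a provider in an environment-like record.
// Throws a descriptive error instead of calling the API without credentials.
function resolveApiKey(
  provider: Provider,
  env: Record<string, string | undefined>,
): string {
  const name = KEY_NAMES[provider];
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable ${name} for provider "${provider}"`);
  }
  return value;
}
```

Passing the environment in as a plain record keeps the lookup easy to test; in a real operation it would typically be the server's environment object.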
Speaker 1: Mhmm.
Speaker 0: And then on the client side, or the server side
Speaker 1: Server side. Yeah.
Speaker 0: We have the API for this, or the Directus API. So we get an image. You can set up, like, a flow with a hook for file upload.
Speaker 1: Mhmm.
Speaker 0: And then we'll check if the file is an image, and then we will create a buffer from it so we can send the entire image
Speaker 1: Mhmm.
Speaker 0: To the API.
Speaker 1: Okay. So you first verify if it's an image before you then send it.
Speaker 0: Yeah. So we don't waste any, like, credits or such. And then we get a result.
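The image check mentioned here can be sketched as a small guard, assuming the file record exposes its MIME type in a `type` field. The `UploadedFile` interface and `isImage` helper are illustrative names, not taken from the extension's source.

```typescript
// Minimal file shape for this sketch; `type` holds the MIME type
// (assumed field name) and may be missing for unknown uploads.
interface UploadedFile {
  id: string;
  type: string | null;
}

// Returns true only for files whose MIME type starts with "image/",
// so non-image uploads never reach the paid API.
function isImage(file: UploadedFile): boolean {
  return typeof file.type === "string" && file.type.toLowerCase().startsWith("image/");
}
```

Running this guard before building the buffer means a PDF or video upload costs no API credits.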
And I try to return a common format for it. So it shouldn't matter which API you use. You should be able to expect what kind of
Speaker 1: Okay.
Speaker 0: Properties it will return, and they are documented in the readme.
Speaker 1: In the readme. Okay.
Speaker 0: So in this case, you get the description. But I also have, like, a dollar-prefixed raw property that has the original payload. So if you want to get something specific from Amazon or from alt text, you can
Speaker 1: You can get
Speaker 0: that as well.
Speaker 1: Okay.
Speaker 0: Without having to change the extension. And the same thing here for Amazon. The API used there is called DetectLabels.
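A rough sketch of the common result format with the original payload kept alongside it. The property names `description` and `raw`, and the `alt_text` field on the first response, are assumptions for illustration; the `Labels` / `Name` fields follow the shape of a Rekognition DetectLabels response, reduced to what the sketch needs.

```typescript
// Common result shape returned regardless of provider. `raw` keeps the
// untouched provider response. Property names are illustrative.
interface DescribeResult {
  description: string;
  raw: unknown;
}

// Assumed response shapes, reduced to the fields this sketch reads.
interface AltTextResponse {
  alt_text: string;
}
interface LabelsResponse {
  Labels?: { Name?: string }[];
}

// A sentence-style provider maps directly to the description.
function fromAltText(res: AltTextResponse): DescribeResult {
  return { description: res.alt_text, raw: res };
}

// A label-style provider becomes a comma-separated list of label names.
function fromLabels(res: LabelsResponse): DescribeResult {
  const names = (res.Labels ?? [])
    .map((label) => label.Name)
    .filter((name): name is string => typeof name === "string" && name.length > 0);
  return { description: names.join(", "), raw: res };
}
```

Because both adapters return the same shape, a later flow step can read `description` without knowing which service produced it, and still reach provider-specific details through `raw`.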
Speaker 1: Okay.
Speaker 0: Yeah.
Speaker 1: Okay.
Speaker 0: Nice. That's how it works.
Speaker 1: That's the one for describing the image. Yeah? Okay. Okay.
Let's check out the one for, like, extracting the text from an image briefly.
Speaker 0: Yeah. It works pretty much the same, except it only has Amazon Rekognition for now.
Speaker 1: Okay.
Speaker 0: I am planning to add Azure Vision AI.
Speaker 1: Azure. Okay.
Speaker 0: Because we use that a lot at my work.
Speaker 1: Okay.
Speaker 0: But for now, it's just Amazon.
Speaker 1: Alright.
Speaker 0: But the principle is the same. We receive an image. We take the stream and send it to an API. And the only difference here is, of course, which API we're sending it to.
So in this case, it's the DetectText command
Speaker 1: Command. Okay.
Speaker 0: On the Amazon SDK. And in this case, you get a few more properties back. I try to transform them into lines, with text and, like, where in the image it's located. But you also get a full text if you're just trying to, like, transcribe an image.
Speaker 1: Yeah.
Speaker 0: So if you want that quick and easy fix, you can use the full text form.
Speaker 1: Full text. Okay. Okay.
Speaker 0: But if you want to, like, get down into the finer details, you can use the lines.
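The lines-plus-full-text output could be produced along these lines. The input fields (`DetectedText`, `Type`, `Geometry.BoundingBox`) follow the shape of Rekognition DetectText results; the output names `lines`, `box`, and `fullText` and the helper `toExtractResult` are assumptions for this sketch, not the extension's documented properties.

```typescript
// Input: reduced shape of a Rekognition DetectText detection.
interface BoundingBox {
  Left?: number;
  Top?: number;
  Width?: number;
  Height?: number;
}
interface TextDetection {
  DetectedText?: string;
  Type?: string; // "LINE" or "WORD"
  Geometry?: { BoundingBox?: BoundingBox };
}

// Output: illustrative names for the normalized result.
interface Line {
  text: string;
  box: BoundingBox | null;
}
interface ExtractResult {
  lines: Line[];
  fullText: string;
}

// Keeps only LINE detections (WORD entries repeat the same text),
// then joins the line texts into one transcript.
function toExtractResult(detections: TextDetection[]): ExtractResult {
  const lines: Line[] = [];
  for (const detection of detections) {
    if (detection.Type === "LINE" && typeof detection.DetectedText === "string") {
      lines.push({
        text: detection.DetectedText,
        box: detection.Geometry?.BoundingBox ?? null,
      });
    }
  }
  return { lines, fullText: lines.map((line) => line.text).join("\n") };
}
```

A flow that only needs a transcript reads `fullText`; one that needs positions, for example to place sticky notes on a board, walks `lines` and uses each `box`.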
Speaker 1: The lines. Yeah. That's very clear and nifty. Did you face any challenges or issues when you were building, you know, all of these bundles?
Speaker 0: I think the for for the most part, it's it's been pretty
Speaker 1: No. It's pretty straightforward. Okay.
Speaker 0: Yeah. Of course, I guess the one thing that was a bit tricky is, like, trying to find the right services. I know that has been added to the documentation now.
Speaker 1: Yes. It has.
Speaker 0: When I wrote this, there wasn't really much.
Speaker 1: Wasn't. Yeah. True. I remember. Yeah.
True. But now we've included services in the docs.
Speaker 0: Yeah. So getting the, like, asset service and figuring out how that works took a bit of time. It wasn't hard, but you had to, like, check the source.
Speaker 1: Dig into the core and the docs.
Speaker 0: Yeah. But that's also a strength of Directus, I think, that you can do that.
Speaker 1: Yeah. Can you imagine that? Okay. That's cool. Let's go into the bundle to see how it works in Directus.
Speaker 0: Yes. So I have a flow here. Let me zoom in a bit.
Speaker 1: Yeah. Yeah. It's good now.
Speaker 0: Called file uploads.
Speaker 1: Okay.
Speaker 0: And here, when I created this, I made a trigger with a non-blocking action with the scope files.
Speaker 1: Files that are uploaded.
Speaker 0: Mhmm. Currently, there's no filter for it. I think I'm gonna write a feature request for that. I think it would be useful to, like, be able to wait for the
Speaker 1: To wait for some time before it
Speaker 0: Finishes before it fires. Yeah. Because right now, when you upload it, you get sent to the, like, start page, and you don't see the changes until they have been performed. So you have to wait a bit.
Speaker 1: I see. Yeah.
Speaker 0: But for now, it works well with the non-blocking one. Then we have the operation from my extension, describe image. And here we select which API we want to use, and you are able to change the field if you want to. But in most cases, it's the trigger dot key, which is the image you have uploaded. But I've left it a bit configurable if you have more advanced flows.
Speaker 1: Yeah.
Speaker 0: Then when this is run, you get some data. So I have an update data operation here.
Speaker 1: Mhmm.
Speaker 0: Where I simply update the payload of the file that's created in Directus. So in this case, I put the description from the operation into the description field.
Speaker 1: Mhmm. Yeah.
Speaker 0: In some cases, I might put a script or similar in between just to clean up a bit, especially if I use the other operation, the extract text one. So if you would need to, like, make sure that it's not too long or something like that.
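A clean-up step of the kind mentioned here, placed between the AI operation and the update, might look like this sketch. The helper names and the 255-character default are assumptions, not values taken from the extension or from Directus.

```typescript
// Collapses whitespace and shortens text to at most `maxLength` characters,
// ending with an ellipsis when something was cut off.
function truncate(text: string, maxLength: number): string {
  if (maxLength <= 0) return "";
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean.length <= maxLength) return clean;
  return clean.slice(0, maxLength - 1).trimEnd() + "…";
}

// Builds the payload for the update step: only the description field is set.
// The default limit of 255 is an arbitrary example value.
function buildFilePayload(description: string, maxLength = 255): { description: string } {
  return { description: truncate(description, maxLength) };
}
```

This matters most for the extract text operation, where a photographed page can produce far more text than a description field is meant to hold.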
Speaker 1: Okay. Do that. I just remembered, did you, maybe handle errors and all in the whole, you know, operations?
Speaker 0: Some errors, but in most cases, it will silently fail. Like, if you upload a file that isn't an image, it will just silently fail.
Speaker 1: So you don't Okay.
Speaker 0: Yeah. So you don't get, like, a lot of error messages. But I'm not sure. I think I have some,
Speaker 1: Yeah, some error message. Okay.
Speaker 0: Oh, actually, I see. It seems to throw if the file is
Speaker 1: If it's not an image, it throws an error also. Yeah.
Speaker 0: Or maybe I added that. Yeah. Anyway Yeah. So you can do a failed state if you want.
Speaker 1: Path also. Okay. Okay. Yeah. I think that's fine.
Yeah. Would you like to maybe do a demo of just uploading an image or a screenshot?
Speaker 0: Sure. Just gonna find a good image to upload. Okay. So here we have my file library. And I'm gonna drop an image here from my other screen.
Speaker 1: Okay.
Speaker 0: And we wait a little bit, and then we check if we get
Speaker 1: Nice. The description that we get: a white cat sitting on a blanket. Yeah. Very descriptive, to the point.
Yeah. Yeah. That makes a lot of sense. Lovely.
Speaker 0: And, of course, the big advantage of this would be, like, to have alternative texts on websites if you have someone with a disability who can't see. Then they can get this read out instead.
Speaker 1: I would just love to know, in terms of improvements to the extension, what are some of the features or improvements going forward that you'd love to add to the extension?
Speaker 0: Yeah. So one thing I want to add is, like, a settings page.
Speaker 1: The settings page?
Speaker 0: Yeah. So you can more easily add the API keys, I'm thinking. I'm also thinking that will be needed for, for example, the marketplace.
Speaker 1: Yes.
Speaker 0: Perhaps, or at least it will make it easier. But I'm also open to adding more AI features, depending on use cases. I had one, for example, with trying to detect objects in an image. So that might be something I'm gonna add later on.
And as I mentioned, support for other services like Azure, perhaps Google, if anyone needs that.
Speaker 1: Okay.
Speaker 0: So if anyone has any suggestions, I recommend posting them on GitHub.
Speaker 1: Feel free to. Yeah. Either comment or make an issue or just even comment in our Discord channel as well. Thank you. Alright.
Yeah. I'm really excited to see when the marketplace launches. Hopefully, we're able to have this extension in the marketplace because I know lots of people would also find it equally useful and important in their projects as well. Alright. So the very final question I have is, I know you've also built the Jira panel.
In the last hackathon, you worked on Jira panels. So what advice would you give to anyone that is building extensions? Because you have a bit more experience in that aspect. So, anyone building a custom Directus extension.
Speaker 0: I think my advice would be to not be afraid of looking into the Directus source code, because it's built very modular. So a lot of the existing interfaces and operations and all of that are available, and you can see how they are built. So even if you're looking for something in the documentation and not finding anything, you might find how another interface or operation has solved it.