Integrating Firecrawl with Directus
Speaker 0: Hello there. I'm really excited about this tutorial. On Directus TV, we already have a show called Quick Connect, which shows you how to integrate third-party services with Directus using Directus Automate and Flows. And in the spirit of that show, today I'm gonna show you how to integrate Firecrawl with Directus. Now, they say that they turn websites into LLM-ready data.
And what that means in practice is you can feed it a URL, provide some options if you want, and it will go and take a look at that web page and return some structured data for you. This is their scrape endpoint, which will take a single web page and scrape some data from it. They also have a couple of other endpoints, crawl and map, but today we're gonna use scrape. Now, I've already logged into Firecrawl Cloud and generated an API key, which I'll copy for later. You can also self-host Firecrawl, but for ease, I'm just gonna use their cloud product here.
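Outside of Directus, that basic scrape call can be sketched like this; note that "YOUR-API-KEY" is a placeholder and the endpoint path is an assumption based on Firecrawl's docs at the time of writing, so check their API reference for your version:

```python
# A minimal sketch of a Firecrawl scrape request, built as plain data so
# the moving parts are visible. "YOUR-API-KEY" is a placeholder; the
# endpoint path is assumed from Firecrawl's docs.
import json

FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"

def build_scrape_request(api_key: str, url: str) -> tuple[dict, dict]:
    """Return the (headers, payload) pair for a basic scrape call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"url": url}
    return headers, payload

headers, payload = build_scrape_request("YOUR-API-KEY", "https://directus.io")
print(json.dumps(payload))  # {"url": "https://directus.io"}
```

You'd POST that payload with those headers to the scrape URL; everything else in this tutorial is variations on this one request.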
Now, I have this Directus project over here with a new, empty collection called companies. In this collection, there are a few fields: a URL, a name, a description, a mission, and a boolean, a true/false value, for is it open source. And our goal will be to provide the URL and then have Firecrawl automatically populate the rest with Flows. So let's go ahead and create a new flow.
So this is our automation builder, if you've not seen it before. I'll call this one Get Company Data. And we are going to use a manual trigger, which will add a button to the sidebar of collection and item pages. So we're going to say we'll run this on the companies collection. What else matters here?
We're going to not require selection, so the button always works. And we're going to require confirmation, which will pop up a modal. And in that modal, we will just add a URL. We'll make it a string input, and we'll make it full width. And I think that's all we need to do here.
So, just to see what happens here: if I go back to the companies collection, we now see this button here, this manual flow trigger. I click that, it pulls up the box, and we'll put in a URL and hit run flow. So now if we go back to our flow, we should immediately see that there is one log. And in here, there is a body, and the URL is the value that we typed.
Fantastic. Now we need to actually do something with it. So let's go ahead and add a new operation here. And, honestly, Firecrawl is pretty slick. You can just make one web request.
Let's take a look at their docs. We're gonna use the LLM Extract endpoint here, and let's just take a look at the construction of this API call. It's a POST request to this URL. We're gonna pass in our API key here as an authorization header, and then they give us this JSON payload. Here, it's telling it to extract specifically these four fields: the company mission, does it support SSO, is it open source, and is it in Y Combinator?
And it's saying you must go get all four of these. So let's turn this straight into a flow Request URL operation. We're gonna do a POST request to this URL. I'm gonna go and copy my API key again here, and at the end of this I'll destroy the key. We add a header: Authorization, then Bearer plus the API key, and save. And then there's the request body.
And, honestly, it contains a little more than we need, but it contains everything we need. So we'll just pop that in there. The only thing we wanna do, of course, is pass in the URL that we put in the box. So we'll replace this with the trigger.body.url value.
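The request body we just pasted in can be sketched like this. The four schema keys mirror the docs example; the `formats`/`extract` wrapper is my assumption about Firecrawl's current payload shape, so verify against their API reference. In the flow itself, the url value is the trigger placeholder rather than a literal string:

```python
# A sketch of the JSON body the Request URL operation sends. The schema
# keys come from Firecrawl's docs example; the "formats"/"extract" keys
# are an assumed payload shape. In the flow, url is {{$trigger.body.url}}.
import json

def build_extract_payload(url: str) -> dict:
    schema = {
        "type": "object",
        "properties": {
            "company_mission": {"type": "string"},
            "supports_sso": {"type": "boolean"},
            "is_open_source": {"type": "boolean"},
            "is_in_yc": {"type": "boolean"},
        },
        # "You must go get all four of these":
        "required": [
            "company_mission", "supports_sso", "is_open_source", "is_in_yc",
        ],
    }
    return {"url": url, "formats": ["extract"], "extract": {"schema": schema}}

payload = build_extract_payload("https://directus.io")
print(json.dumps(payload, indent=2))
```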
Fantastic. Let's save that and see what happens if we go over to content, press the button, and type in directus.io. We see that's running, and that's a good sign.
It means it's going off and making the request, waiting on the response. And then we see there is a second log, and we get some data back. There was a 200, so it was successful. Inside of data, there is a property called data, and then there is this value called extract.
Extract contains all of those custom keys we asked for: company mission, supports SSO, is open source, and is in YC. And whenever you scrape, you also get this metadata object: title, description, language, Open Graph data, source URL, and so on. So, really, all we wanna do now is take this data and create a new company from it. So let's create a new operation on the resolve path of that web request. Let's call it Create Data.
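The response shape described above looks roughly like this; the values here are illustrative, not real API output, and the exact metadata key names (like sourceURL) are assumptions from the docs:

```python
# A sketch of the response body shape, with made-up values, and how to
# reach the extract and metadata objects inside it.
sample_body = {
    "success": True,
    "data": {
        "extract": {
            "company_mission": "An example mission statement.",
            "supports_sso": True,
            "is_open_source": True,
            "is_in_yc": True,
        },
        "metadata": {
            "title": "Directus",
            "description": "An example description.",
            "language": "en",
            "sourceURL": "https://directus.io/",
        },
    },
}

extract = sample_body["data"]["extract"]
metadata = sample_body["data"]["metadata"]
print(metadata["title"], extract["is_open_source"])  # Directus True
```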
We're gonna create something in the companies collection. We'll give it full access, and then we just need to provide a payload. So let's go ahead and do that. We have an object here, and we have a name.
So for name, we're gonna pass in the value of the last operation's data.data.metadata.title. I'm just gonna copy this and edit it each time. So we have name, then URL, which is last.data.data.metadata.sourceURL.
We could, of course, just take it from the trigger body URL, but this one is properly formatted. You'll notice I typed in directus.io, but when it came back in the payload, it came back with the protocol, https and so on. So we have name, we have URL, and we have a description.
Now this one is also from the metadata description. Then we have the mission. Now, this was a custom piece of data we asked to be extracted; company_mission is what we called it. And finally, we have open_source.
That's last.data.data.extract.is_open_source, and then remove the trailing comma. So I believe that's the name of all of the fields; we'll figure it out in a moment when it inevitably doesn't work. We'll hit the button again, type in directus.io, and hit run flow. Once again, that's going off to Firecrawl using their endpoint, and there we see it straight away.
URL, name, description, mission, and the boolean is open source. Let's try that once more. Let's go in here, type firecrawl.dev, and hit run flow. We'll give that a moment, and there it is.
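That whole payload mapping in the Create Data operation can be sketched in Python. In the flow it's written with placeholders like the last-operation data paths above; the sample result below is illustrative, and the Directus field names and metadata keys are as assumed earlier:

```python
# The Create Data payload mapping, sketched in Python. sample_flow_result
# mirrors what the Request URL operation logs: its "data" key holds the
# HTTP response body, which has its own "data" property. Values are
# made up for illustration.
sample_flow_result = {
    "data": {  # result of the Request URL operation
        "data": {  # the response body's own "data" property
            "extract": {
                "company_mission": "An example mission statement.",
                "is_open_source": True,
            },
            "metadata": {
                "title": "Directus",
                "description": "An example description.",
                "sourceURL": "https://directus.io/",
            },
        },
    },
}

def to_company_item(flow_result: dict) -> dict:
    """Map a flow web-request result onto the companies collection fields."""
    body = flow_result["data"]["data"]
    extract, metadata = body["extract"], body["metadata"]
    return {
        "name": metadata["title"],
        "url": metadata["sourceURL"],
        "description": metadata["description"],
        "mission": extract["company_mission"],
        "open_source": extract["is_open_source"],
    }

item = to_company_item(sample_flow_result)
print(item["name"], item["open_source"])  # Directus True
```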
So now you can go ahead and grab more data. Now, of course, if we take a look at this endpoint here, you can provide custom properties, and it will try its best to get data out for them. They have a couple of other interesting things which I'll draw your attention to, even if I don't think they all work in this context. They have extracting without a schema. So this extract here was us creating a schema.
Right? Instead, you can just give it a text prompt: "Extract the company mission from the page." But the thing I don't like about that is you're not explicitly saying what the name of the key is, so you don't necessarily know what it's gonna be at the end.
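Schema-less extraction would look something like this; again, the payload shape is my assumption from the docs, and the downside is exactly what's described above, you don't control the key names in the result:

```python
# A sketch of schema-less extraction: swap the schema for a free-text
# prompt. Payload shape is assumed; key names in the result are then up
# to the model.
payload = {
    "url": "https://directus.io",
    "formats": ["extract"],
    "extract": {"prompt": "Extract the company mission from the page."},
}
print(payload["extract"]["prompt"])
```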
I like creating a schema, personally. They do something else that's kinda interesting. If I take a look at, I think it's in their API reference here, inside of scrape, they have this interesting thing called actions. So you can get it to wait, take a screenshot, click, write text, press a key, and scroll.
And the combination of clicking and writing text means you can get it to interact with a web page. You see it here: actions, wait two milliseconds. You could get it to sign into things, perhaps, or perform searches. I think it's super interesting. And then take screenshots, of course, and upload those to Directus if you fancy.
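An actions payload might be sketched like this. The action type names come from the docs walkthrough above; the selector, text, and timing values are made up for illustration, and the exact field names should be checked against Firecrawl's API reference:

```python
# A sketch of a scrape payload using actions: wait, click, write, press,
# screenshot. The selector, text, and timing values are hypothetical.
payload = {
    "url": "https://example.com",
    "formats": ["markdown", "screenshot"],
    "actions": [
        {"type": "wait", "milliseconds": 2000},    # let the page settle
        {"type": "click", "selector": "#search"},  # focus a search box
        {"type": "write", "text": "directus"},     # type a query
        {"type": "press", "key": "ENTER"},         # submit it
        {"type": "screenshot"},                    # capture the result
    ],
}
print([a["type"] for a in payload["actions"]])
```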
So there's a lot of flexibility in this. Having seen how easy this API is, I think I'll go ahead and turn this into an extension at some point in the next few weeks, which we can release as part of Directus AI. But, yeah, that's how to integrate Firecrawl with Directus using Directus Automate. Hope you found this interesting, and by all means, if you have questions, just reach out.