In this recording of our live event on January 11 2024, Rijk, Jonathan, and Daniel discuss filtering inside of stored JSON objects.
Speaker 0: Welcome, everyone. Happy 2024 — we're excited for a new year. We've been playing around with this request-for-views format for a little while, but today we're gonna try out covering a specific feature: talking through some details, making sure that we have a full specification on it, and getting additional community feedback as needed. Today's topic will be JSON object filtering.
It's a very, very popular request that we see quite frequently, and we would love to ensure that we solve this problem correctly and get it working for the databases that support it. I think one of the key issues we've run into — and we've done a lot of development work on this — is that the various database vendors support this differently. You fix it to support one thing, it breaks something else. We spent a lot of time iterating back and forth on this feature, and I'll let Rijk talk about some of that.
But I mean, fundamentally, we we know that this is important. We consider it kind of a critical road map item, and that's the reason we're gonna talk in detail about it today.
Speaker 1: Absolutely. Well, thank you for that. Yeah. And as you can see, I was the one who opened this feature request — conveniently with no details whatsoever. I basically just said: hey, man —
I wanna be able to filter in JSON objects. Good luck. And then everybody was like: yep, me too. However, this was done way before we had that RFC format, so the details are lacking.
Let's call it that. So before we dive in too deeply — Daniel, I'm just gonna throw you under the bus here. You wanna walk us through what JSON filtering even is in the first place?
Speaker 2: JSON filtering can be quite useful if you store JSON inside of your database and you need to check some field inside of that JSON. Like we said in the beginning, right, not every database supports this as of right now, which forces us — or will force us — to do some little Directus magic as usual. But more and more databases kinda jump on the train, which is pretty neat. I think it was SQLite that very recently announced JSONB support.
I hope I'm not misremembering. But, yeah, it can be quite useful: if you have stored JSON, then you can filter on fields inside of that JSON. That's very useful.
Speaker 0: And, as Kevin points out, actually, a couple of our default interface configurations — our field configurations — store in JSON by default. You can change them in most cases, and currently that's the recommendation. So on the support side of things, for those of you listening: if you use the CSV format instead on the database field settings side, you can then filter through the application and the API on those fields. But JSON is a nicer structured format — hence the request.
Speaker 1: Yeah. And then the other the other big sort of elephant in the room there, of course, is that a lot of systems are utilizing and or could be utilizing more of a document style structure rather than, you know, a tabular style data structure. Especially, you know, when you have flexible schemas or unknown, you know, data structures ahead of time or semi semi structured data. Think about, you know, blocks on a page or something. You know, you're talking about sort of rich data that may or may not be structured like a table.
So, therefore, you know, storing JSON in a Postgres database or another SQL database can give you some of that document magic without having to switch over completely to a document database. So there's a lot of a lot of benefits to having this. Now that being said, it also comes with a ton of complexities. So in my original feature request I was really thinking about it sort of as a filter against the data. Right?
So similar to how we have some functions to run against date values. I don't know if you've seen those before, but we have things like extracting the year from a timestamp as a function. I was thinking about it the same way in my original feature request. Right? So just like you'd be able to do year(timestamp) equals 2024, you could do something like, I don't know, json(field name) plus some sort of identifier string to select something from that field, and then run filters against the value that you've now selected.
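That function-style idea — extract a value at a JSON path, then run an ordinary filter operator against it — can be sketched in a few lines of Python. This is purely illustrative: the function names, the dotted-path syntax, and the `_eq`/`_gt` operator names are stand-ins, not the actual Directus implementation.

```python
import json

def json_path_get(value, path):
    """Walk a dotted path like 'author.age' into a parsed JSON value.
    Returns None when any segment is missing."""
    current = value
    for segment in path.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return None
    return current

def matches(row_json, path, op, target):
    """Apply one hypothetical filter (path, operator, target) to a stored JSON string."""
    extracted = json_path_get(json.loads(row_json), path)
    if op == "_eq":
        return extracted == target
    if op == "_gt":
        return extracted is not None and extracted > target
    raise ValueError(f"unsupported operator: {op}")
```

So `matches('{"author": {"age": 30}}', "author.age", "_gt", 12)` would select that row, the same way `year(timestamp)` selects rows by a derived value.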
Right? Similarly, you'd be able to use that in fields or in sort or other pieces like that. And we actually put a lot of work into that already — and by we, I mean Tim, who's in the chat, and I'm definitely gonna put him in the spotlight. Jonathan, if you wanna pull that up real quick, it might actually be fun to take a quick scroll through. It also answers the question of whether all DBs even support it, because there's a wonderful table at the top of that pull request.
If you wanna pull up the — don't bring him up, we're recording. He's a little camera shy, but we can figure it out.
Speaker 0: You know the PR off the top of your head, or do I just need to go find it?
Speaker 1: Oh, just open up pull requests, and if you search for JSON, it should be the only one that's there.
Speaker 2: GitHub
Speaker 0: pull requests.
Speaker 1: We've left it open intentionally. We'll circle back later to how we're handling it now. It's probably all the way at the bottom — it's a little older. Yep. There it is.
Speaker 2: Alright.
Speaker 0: There's your matrix.
Speaker 1: Right. So in this PR, we basically took a swing at implementing JSON filtering the way it was described in that discussion. And by described, I mean vaguely hinted at, because the description was a little poor. But we did implement it that way, and it does actually work. So we do have queries for all the different databases.
But as you can see in this table — to the chat's point earlier — not everything is supported everywhere, which becomes tricky immediately. Right? Because how do you then document it? Now it's gonna be different per database. So there has been some fallback support implemented, and you can see the difference in that green check versus that sort of Unicode check, I guess.
You can see that for a lot of these things, we're actually having to do some Directus magic to make that work against the database, which is not necessarily gonna be the most performant or — what's the right word? — the quote-unquote right way to do it. Right? It's a blue check mark — oh, if you're on Windows or Linux, just pretend the check marks are green. Anyways, while building this — you might wanna pull up the files section of this PR, Jonathan, I think it's a fun scroll through. If we go all the way to the top, there's a tab: Files changed.
Speaker 2: 2,000 additions. Oh my god.
Speaker 1: Yeah. There's only a thousand lines across 50 files. The long story short is that in the database helpers — you can see them in that left-hand sidebar — we have to add all of the additional queries for JSON filtering ourselves, for all the different database types, sort of hard-coded in. You see them here in the dialects if you just click one of those; it doesn't really matter which one.
So all of those sort of you can see them here. You know, we have a JSON extract for MariaDB that may or may not exist across the other database vendors. Right? So it's not a SQL standard, which makes this a heck of a lot more complicated than you'd think. It's because every database does it differently.
I believe it was Postgres that doesn't even rely on functions — it has, like, a special syntax with arrows and whatnot, which is pretty interesting. I don't know if the version 12 one has that in there. But anyways — maybe that's the one with the question marks there, I don't know. There's all sorts of different syntaxes, is what I'm trying to say. Oh, here they are: the arrow and the dash-arrow-arrow.
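The divergence being described — a shared `JSON_EXTRACT(...)` function style in MySQL/MariaDB/SQLite versus Postgres's `->` / `->>` operator syntax — can be sketched as a tiny dialect-aware helper. This is a simplified illustration of the kind of per-dialect logic the PR had to hard-code; a real helper would also need proper quoting and parameter binding.

```python
def json_extract_sql(dialect, column, path_segments):
    """Build a dialect-specific SQL fragment that extracts a JSON path.
    Illustrative only -- real helpers also need quoting/binding logic."""
    if dialect in ("mysql", "mariadb", "sqlite"):
        # These dialects share a JSON_EXTRACT(col, '$.a.b') function style.
        json_path = "$." + ".".join(path_segments)
        return f"JSON_EXTRACT({column}, '{json_path}')"
    if dialect == "postgres":
        # Postgres uses operator syntax: col -> 'a' ->> 'b'
        # (->> returns text for the final segment).
        *inner, last = path_segments
        expr = column
        for seg in inner:
            expr += f" -> '{seg}'"
        return expr + f" ->> '{last}'"
    raise NotImplementedError(f"no JSON helper for dialect: {dialect}")
```

The same logical extraction produces `JSON_EXTRACT(data, '$.author.age')` on MariaDB but `data -> 'author' ->> 'age'` on Postgres — which is exactly why this isn't a SQL-standard problem.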
That kind of stuff. So what we realized with this initial work on JSON filtering is that, well — A, it's super complicated, as you can tell. There's a lot of additional work and a lot of additional logic, which in turn means more to maintain. As always: the more stuff you add, the harder it becomes to maintain. Right?
But the third thing is, we also started to wonder: okay, instead of having it as a function-style thing in the query parameter, what if we do it more like the fields parameter itself? Right? Where — like in GraphQL, for example — you could just provide a nested tree that you wanna select, instead of having to do it through a filter attribute. And at the same time, we've been working on a new data abstraction engine in the first place. It doesn't necessarily fix the fact that we have to do a lot of stuff ourselves for every database, because it's just database specific, but it does it in a way that is designed to have database-specific drivers rather than dialect-specific overrides.
Right? So — this is gonna be a bit more of a deep dive — the way the API is set up right now is that everything effectively goes through Knex, right? The SQL query builder that basically everybody and their mom is using. And what that means is that first it just becomes a sort of generic SQL, and then at the very end it's translated for the individual vendors.
Right? So — this is a very crude explanation — but it's effectively just doing a find-and-replace for the quotes, making sure it's the right quote for the right database. And then for some of the databases there's a little bit of additional magic — like for SQLite, for an ALTER TABLE statement there's some magic included there. But the long story short is that adding stuff like JSON filtering becomes tricky, because now it's database specific. Right?
So we don't really have a way — and it was the same with the timestamp helpers that we did earlier — we don't really have a way to make that agnostic across all of the different database vendors. Right? The second part there is that we know we wanna support more database vendors over time, not fewer. So trying to do it in this make-it-generic-first, add one-by-one overrides to the dialects way —
It doesn't really scale anymore. Right? And also the different versions — a very good point, Tim. Because I think if we go back to the table, we already saw it.
There's, like, differences between MySQL 5 and MySQL 8 plus — although 5 is now end of life, so that's a whole different discussion. And the same for, you know, Postgres 10 versus 13 and up. So the way we're re-architecting that piece is by saying: there's still a singular data entry point, but rather than relying on SQL, it relies on an abstract syntax tree of our own design — just a proprietary data format that explains to the engine what the data looks like that we wanna fetch.
Right? And then for each of the different vendors, we're gonna have a driver that interprets that command and then just executes it in whatever way is appropriate for that driver. Right? So for a lot of the SQL drivers, we can still share a lot of that SQL magic like we're doing now. But it also means that we can start opening up the doors to other, you know, other data sources and other, JSON specific data sources.
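The architecture being described — one abstract query format, interpreted by per-vendor drivers — might be sketched like this. All names here are hypothetical stand-ins, not the real Directus internal format; the point is only that the same abstract query can be rendered as SQL by one driver and as a document-store projection by another.

```python
from dataclasses import dataclass

@dataclass
class AbstractQuery:
    """A tiny stand-in for the engine's abstract syntax tree:
    which collection to read and which fields to fetch."""
    collection: str
    fields: list

class SqlDriver:
    def execute(self, query: AbstractQuery) -> str:
        # A SQL driver renders the abstract query as a (simplified) SELECT.
        return f"SELECT {', '.join(query.fields)} FROM {query.collection}"

class DocumentDriver:
    def execute(self, query: AbstractQuery) -> dict:
        # A document-store driver interprets the same query as a projection.
        return {"collection": query.collection,
                "projection": {f: 1 for f in query.fields}}
```

With one `AbstractQuery("articles", ["id", "title"])`, the SQL driver emits a SELECT statement while the document driver emits a projection spec — the dialect-specific knowledge lives entirely inside each driver.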
Right? So one of the reasons there — to keep going on that train of thought — is also that once we started getting super deep into JSON filtering, we started to realize: well, if we implement JSON filtering as field selection rather than as filtering-specific functions, you end up with effectively just drivers for a document data store. Right? Which is very interesting. So with that in mind, we could also start thinking about what it looks like to use something like DynamoDB or MongoDB or some other sort of key-value slash unstructured document-style store with the Directus API.
Importantly, not treating them as a relational database, but treating them as a document data store — leaning into the flexibility of a document database rather than trying to force it into a relational structure, right, like we've seen in the past and have explicitly avoided. That was a very long train of thought. But what I'm getting at is that we've effectively — not shelved, but sort of put a pause on this particular PR. And instead, we're focusing now on implementing this sort of JSON selection support directly into this driver-based approach. Right?
Because the one thing we didn't wanna do is add more complexity now, only to then replace it in a couple of months with another breaking change with a completely different structure. But we've left the PR open because we definitely don't wanna lose any of the work or any of the code — it is being repurposed into the new structure. From the chat: they would all be called features, not bugs. That's very true. Any bug in SQL is a feature.
Speaker 0: Got a question about the specs for the drivers. I believe we do plan to fully spec and document those the driver interfaces, right, for the new architecture?
Speaker 1: Absolutely. Absolutely. Yes. I mean, as of right now it's still very much in R&D, so we're not opening the doors quite yet on building your own, but it is basically built with extensions in mind.
Right? So we wanna make sure that those things are: do whatever you want — as long as you adhere to the spec, you can save and read data from wherever the f you want. But, you know, we're definitely focusing on feature parity with what we have first, and then expanding the scope there with JSON filtering and some other additional relationship types. And then, you know —
Speaker 0: Yeah. I love the idea that we're moving towards an extension-driven driver approach. Right? The same way we've done with so many other components of the platform — making it extensible.
And it also means that if you've got a custom data source — an API, other things — you'd have a spec and be able to build your own driver against that custom data source, and be able to leverage the API and power of the Directus application on top of it. So very exciting — a very exciting 2024.
Speaker 2: Absolutely. Wait for the first person implementing the Excel-sheet data store. I think this will be a smash hit in Germany — all of the companies love Excel so much. Maybe the whole finance world rejoices.
Oh.
Speaker 1: We should start that as a little competition. Whoever builds the Excel data store first gets a shout out on the website.
Speaker 2: Oh, you bet. Oh, you bet.
Speaker 1: A signed certificate of insanity by me. Love that.
Speaker 2: The worst thing — the very worst thing — is that people will actually, like, honestly use it, I'm afraid.
Speaker 1: It's not the using-it part that worries me. It's the relying-on-it part that worries me. Anyhoo, circling back to more of the specifics of JSON filtering as a whole. Right? So just to circle back to a requirements list: we don't have an RFC for JSON filtering proper, and we don't really have an RFC for what it could look like in new formats.
Right? We do know that we want to support it from a field-selection perspective, with that sort of non-structured data store in mind. We do know that we have to support it as part of filters and sorting and querying and all of that goodness. That being said, there is a very interesting difference between JSON objects — which is sort of the assumed default that we've been talking about here — and arrays, which is where it gets real complicated real quick. Because one of the main use cases where this is coming from is fields like tags, right, where you just have a JSON list of individual strings for tags, and then: how do you search through those?
Right? But now you're not so much talking about making a nested selection of a JSON path and then filtering against that. Now you have to search through each item of the array. Right? So we do have some specific magic going on for one-to-manys right now — like _some and _none, for example, to say I want all of the values in my related table to match XYZ. But we should probably add something along those lines for JSON when it comes to filtering specifically.
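The _some/_none semantics being proposed for JSON arrays can be sketched directly: keep a row when at least one (or no) element of its array matches a predicate. This is a conceptual illustration of the operator semantics, not Directus code — the function names and row shape are made up for the example.

```python
def filter_some(rows, key, predicate):
    """Keep rows where at least one element of the JSON array under `key`
    matches -- analogous to the existing _some operator on one-to-manys."""
    return [row for row in rows if any(predicate(v) for v in row.get(key, []))]

def filter_none(rows, key, predicate):
    """Keep rows where no element of the array matches -- analogous to _none."""
    return [row for row in rows if not any(predicate(v) for v in row.get(key, []))]
```

So with a `tags` field holding `["cms", "api"]`, a _some filter for `"api"` keeps the row, while a _none filter for `"api"` drops it — exactly the tags use case described above.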
Speaker 2: I think.
Speaker 1: This is mostly — again, for those who've joined us in the past on these live sessions, and for Rich, welcome back — the goal is oftentimes to really, really diverge to find the boundaries of what we can or eventually maybe want to do with this, and then converge back into what is realistic and what that first MVP looks like. This is really that divergent-thinking stage. Right? Like, how far do we need to go when it comes to filtering on stored values? For example, do you need to be able to say things like: I want all of the JSON objects in an array to have a nested property author.age —
They all need to be bigger than 12. Right?
Speaker 2: I've actually never used, like, field-type querying, like in Postgres or in the others. So I'm curious: is there a function of the database that tells you whether or not the stored value is an array? Like, is there some database-layer check that tells you that? Is that possible?
Speaker 1: I think the realistic answer is we can't assume that there is, because we're talking about, you know —
Speaker 2: Good plan. Good plan.
Speaker 1: We're talking about various different database vendors, and we're talking about various SQL-like vendors that are sort of SQL-inspired but not fully compliant. Think of — I wanna say PlanetScale, but they recently added foreign keys, so they might now be more compliant than they used to be. But those types of vendors, right, where they use the SQL syntax for basic querying, but they don't have a full SQL engine behind it because they just implemented the data store differently. So the answer is: we don't know, nor can we know.
Because we need to build this — and design the specs for this — in a way that is agnostic to the data store. Right? Tim was mentioning we have three different pieces of functionality: extracting the data in fields, extracting the data from a field, and then using it to filter. And then deep filtering against stuff within that JSON blob — filtering inside the J— oh, yeah. Filtering inside of an array with _deep. Yeah.
Speaker 2: So
Speaker 1: So I think, ideally — and this is a bit of a different way of thinking in how we treat the API, because it used to be very tabular-data-first — what we're leaning towards now is really treating JSON values as if they're just any other table, effectively. Right? So we treat an object as if it's any other item in the database. We treat an array as if it's any other table in the database. And therefore we just allow you to use any of the regular query parameters against it, as if it's a table.
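The treat-the-array-as-a-table idea can be sketched by applying the familiar query parameters (filter, sort, limit) straight to a JSON array's elements. A minimal sketch with made-up names — the real query-parameter names and semantics would come from the eventual spec.

```python
def query_json_array(items, *, filter=None, sort=None, limit=None):
    """Treat a JSON array as if it were a table: apply the familiar
    filter/sort/limit query parameters to its elements."""
    result = [it for it in items if filter is None or filter(it)]
    if sort is not None:
        # Sort by a named key of each element, like sorting a table column.
        result.sort(key=lambda it: it.get(sort))
    if limit is not None:
        result = result[:limit]
    return result
```

For example, an array of page-builder blocks can be sorted by an `order` key or filtered by `type`, just like rows in a collection.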
But you get some very interesting usage patterns at that point. Right? Could you consider using JSON Schema to type and validate the field? Absolutely. There's a very interesting thing, though, when it comes to JSON and typing slash validating: a lot of people will sort of accidentally start using a document database as a relational database, thereby completely defeating the point of using a document database.
Right? And this has been the main, if not the only, argument that we've had against supporting MongoDB in the past — because it has come up before. There have been a couple of feature requests every now and again. But it was always the question of: oh, let's just use MongoDB instead of Postgres. And that really makes no sense — pardon my French — if you just think about it for real. Right? Because a document data store has a lot of perks; there's a lot of good things about it. But using it as a relational database is just not what it was designed to do best.
So you're kinda just forcing it to do something it wasn't made for. At which point, it's like just use a SQL database. Right? If you wanna have relational data in a tabular format, use the SQL database. That's what they excel at.
That's what they're best at. That being said, it's the overlap of the two where it gets really interesting. You know, if you're talking about things with rich content — like views in an app or pages on a website, right — where you have fixed metadata points that you have for everything: a title, an author, a published date, some of those pieces, like a status.
But then you also have rich content, which is gonna be a semi-flexible schema. Right? It's when you mix the two that I think it gets really, really interesting. But that does also mean that you need to be able to manage and search through and extract the non-structured part with a semi-known schema. That's where it gets tricky.
Right? So what made me think about that is the question: you know, consider using a JSON schema to validate the fields. If you're strictly validating against one schema of JSON objects, at that point, are you better off using JSON, or should you just make a couple of columns? Right? Because a many-to-one field with a related table with structured columns will most likely perform better than trying to do it with nested objects.
Assuming, you know, that there's gonna be searching and filtering and organizing involved. Another question from the chat here was: Postgres allows for date ranges, which is effectively a JSON array; somebody built a custom date-range interface for this but cannot filter against it. The question is whether that is something that will be supported by this PR, or whether it would be something else. Yep.
That's basically exactly it. You have any sort of arbitrary JSON value, and you need to be able to manage and search through it. "It is a typical document, for the record, though — each record in a collection can have a significantly different content structure." Yes.
That is a very good point, which is why the JSON schema again becomes interesting. Right? Because if you have a column that you want to have the same schema for every single row in your table, then relying on a JSON schema for validation makes a lot of sense — you want every one of those to be the same. At that point, though, what becomes the benefit of using JSON over just columns? Right?
Those are sort of the questions that we need to ask ourselves. Because the other one is: the nice thing about the JSON thing is that anything could go in there. That's kind of the point. Right? It's an unstructured data type.
So what if you want everything to be different? Or maybe you want it to be one of five known different schemas, right, instead of just the same fixed object. I guess with a JSON schema you have union types, if I'm not mistaken. But food for thought. It's an interesting one.
But it does raise a different question, right: what about validation? Because we have validation rules for regular columns now, where you can say, you know, a number needs to be greater than something. For JSON fields specifically, we need to come up with something.
Right? Using a JSON schema could be very interesting. Maybe a JSON schema for the whole record instead of just the JSON field could be interesting. Maybe both. Maybe it's nested.
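The union-type idea — one of several known schemas, expressed in JSON Schema as `anyOf` — can be shown with a deliberately tiny validator. This hand-rolled sketch supports only `type`, `required`, `properties`, and `anyOf`; a real implementation would use a full JSON Schema library, and the block schemas below are invented for illustration.

```python
def validate(value, schema):
    """A deliberately tiny subset of JSON Schema: `type`, `properties`,
    `required`, and `anyOf` (the union-type case). Illustrative only."""
    if "anyOf" in schema:
        # Union type: the value is valid if it matches any alternative.
        return any(validate(value, sub) for sub in schema["anyOf"])
    type_map = {"object": dict, "array": list, "string": str, "number": (int, float)}
    expected = schema.get("type")
    if expected and not isinstance(value, type_map[expected]):
        return False
    for name in schema.get("required", []):
        if name not in value:
            return False
    for name, sub in schema.get("properties", {}).items():
        if name in value and not validate(value[name], sub):
            return False
    return True
```

With `anyOf` listing, say, a heading-block schema and a quote-block schema, each element of a page-builder array can be one of several known shapes — which is exactly the "one of five known different schemas" case.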
Here's another thing — let's see what the other chat-related things were on the same topic. Somebody else asked: I might have missed this, but is the idea to integrate this with existing Directus filters, dynamic variables, etcetera, or would there be new JSON-specific functions? That is a great question, and kind of the instigator of rehashing this chat. When we had this feature request, we were sort of looking at it through the lens of: let's make a specific JSON function like we have the others, and just implement that.
Right? And then Tim went ahead and did it. Kudos — a tremendous amount of effort. But then we also really started to — well, it's kinda what we talked about with the data abstraction piece just now.
We started to realize that it might be way more powerful to try to treat it as an item rather than treat it as a value, so you unlock all of the other query parameters against the value. That answers it, I hope. It gets a little bit theoretical quick here.
Speaker 1: "As soon as you use it relationally, it becomes even harder to maintain." Good point. Good point. "Just started using MongoDB as a replacement for Redis for a project that doesn't use Directus; they are crypto, so their relational data is based on wallets." Makes sense. You know, if you think about that use case, you have semi-structured data.
You know roughly what goes into a wallet, but there are a lot of optional fields, oftentimes — depends on the implementation, of course. But one thing you do know is that you have a fixed primary key that you can use for some relationships. "Big supporter of JSON Schema. Everywhere."
"Just for the spec — or simpler, TypeBox for the win." That used to be my go-to, and then a lot of others on my team started using Zod, and I kind of have been converted to the —
Speaker 2: The inference is pretty neat.
Speaker 1: is not true here.
Speaker 0: Let's see what
Speaker 1: You kinda want to know that — oh, it disappeared. Where is it? Oh — you kinda wanna know that a list of tags is always a list of strings, or that an object matches a specific schema. Depends. I'd say generally yes, but it also really depends.
For example — to give a system thing as an example, right? For interface options, we don't know ahead of time what's going in there. Right? It's up to the interface to figure out. It's just anything.
It's basically just a blank store for an interface to keep some sort of options, but it really doesn't matter what the format is, what's in it, etcetera. Somebody shared pg_jsonschema — JSON schema validation, for Postgres specifically. That's where it gets real tricky. Their solution relied on a different filter syntax: JSONPath, as opposed to XPath.
Yeah. Yep. Exactly. I think the JSON schema example is a good example of the added complexity that we're putting ourselves in here on purpose — which is that we really, really aggressively want this to be a standard in the API that you can use no matter what the data source is. And then it just depends on the data source how performant it becomes.
Right? But I'd really, really like to prevent ending up with, you know, a long table in the docs that says: can you use JSON filtering? Yes, no, no, no, yes, yes, yes, no, yes, yes, yes, no — for the different drivers, especially if those drivers are third-party maintained. It's gonna be incredibly annoying. It does mean that for some of the data sources we'll definitely have to — what's the right word — add fallback logic and do a lot of that magic on the Directus side of things rather than in the data store. But so be it, you know? I think that is a trade-off worth having, as long as those drivers can handle it themselves. Because — yeah. Right.
Speaker 2: We're fine. Go ahead — sorry, the delay. No, it was just Brian's message, which was: the discussion points my mind towards use cases like page builders, where I wouldn't need to create a ton of separate tables. That's true. And this is exactly the use case where I, in the past, have used it. You know, sometimes you very quickly want to throw together a component, but you really don't wanna create different relationships and whatever. So you just quickly make, like, an href with a label and slap it together as JSON, for example. That's a very nice use case for that. Very quick, very easy.
Speaker 1: Mhmm. For sure. And I
Speaker 2: I agree.
Speaker 1: Another interesting thing that he's mentioning there is that it would be better served by a NoSQL database. Right? That's where the interesting discussion really starts. Because if you're talking about a page builder for a website, then the pages themselves — what is the route? What is the title?
What is some of the metadata? I'd argue that's probably better for a SQL database. Right? Because you're talking about structured data that you wanna query through on a column by column basis. You wanna say get all articles from, you know, January 2024, for example.
That is gonna be faster in a relational database, because it has structured data with known types and you can search through them efficiently. But the content part of a page — absolutely, it makes a lot of sense to have that as a page builder. Right? Because you have semi-structured data: an array with one or more — or zero or more, I guess — blocks of a known type.
So that's where that JSON validation with a schema comes in, potentially. And the data is semi-structured: you have objects of, like, X different types, and then an array of those things. Yeah.
So that makes a ton of sense. But I think the real power is the combination of the two. Because if you were to try to do all of those pages in MongoDB, now you have the trade-off of the filtering performance and the searchability aspect of it. But on the SQL side right now, you have the penalty of not having that unstructuredness of the data. Right?
So right now we would rely on any-to-anys for that type of use case, which has its own benefits around searchability and joinability. But if you're on pages where you don't really have to reuse sections, for example — then yeah, I agree, doing it as a JSON object for a page builder makes a lot of sense. Right? But implementing it in a way where you get both is really the —
Speaker 2: Oh, this is why Brian — the father of AgencyOS — he probably felt that pain, so he came up with this. Yeah. But I agree wholeheartedly. Like, this came up exactly when I wanted to do that.
If you have, like, one-off components that don't really have anything to do with other stuff, or you want to get some data in, it's very, very neat. The repeater interface is very useful for that.
Speaker 1: Now there's one additional topic that we haven't touched on that I think we could easily spend 15 minutes on. And Brian is hinting at it right now, actually: relationships. Relationships is where it gets real tricky. Right?
Because right now, we know that article.author is a foreign key to the authors table. Right? So that metadata is known; we know it ahead of time. So therefore we can do things like: give me everything, nest everything, nest everything — you know, star dot star dot star, whatever.
And it knows what to nest because it knows what the relationships are. Right? When you're in within a JSON document or an area of JSON documents, there's really no guarantee that something is an ID or there's no real knowledge of what that path is and where it points. Right? So once you want to start nesting, data like that, it becomes tricky, tricky, tricky.
Right? So within a MongoDB context, if I'm not mistaken, a MongoDB ID is always globally unique. Right? So as soon as you have nested data like that... what did they call it again?
There was, like, a Mongo ID special type. Object ID is, I think, what they call the type. Anyways, when it encounters a value of type Object ID, it can assume that it's a globally unique ID and therefore knows what document to fetch. But in a sort of hybrid model, you don't necessarily have that luxury.
Right? Because you can have a nested categories array in your JSON object that just says 1, 2, 3, with the assumption that you're talking about rows 1, 2, and 3 in your categories table. Right? Because you're mixing concepts, because there's no such thing as globally unique IDs in a relational database.
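That hybrid case, a plain array of integers that is merely assumed to reference category rows, can be sketched with SQLite's json_each. Note that the assumption that the values are primary keys lives entirely in the query below, not in the data itself; the tables and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "News"), (2, "Tech"), (3, "Opinion")])
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, meta TEXT)")
conn.execute("""INSERT INTO articles VALUES (1, '{"categories": [1, 3]}')""")

# Expand the embedded array with json_each, then join on the assumption
# that its values are category primary keys. Nothing in the JSON
# guarantees that -- which is the whole problem being discussed.
rows = conn.execute("""
    SELECT c.name
    FROM articles a, json_each(a.meta, '$.categories') j
    JOIN categories c ON c.id = j.value
    ORDER BY c.id
""").fetchall()
print(rows)  # [('News',), ('Opinion',)]
```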
Speaker 2: Kind of by design. I'm already imagining some type of custom format with, like, a dollar sign, directus dot one or something, and then we have to translate it. And oh, no. It's a whole rabbit hole. We could do this.
Speaker 1: The same as the chat, apparently. Somebody is saying Directus could prefix it.
Speaker 2: That's what I meant. Oh, yeah.
Speaker 1: We could. We could. But it does mean that we would have to make the data in your tables proprietary, which is something that I'm personally not a fan of. Right? The system tables are one thing, because, of course, you need to have some metadata saved somewhere, and we need, you know, a place to store that.
But I really, really badly, desperately do not want to introduce any sort of Directus-specific proprietary nonsense into your own database, in the user tables rather. So that's a tricky one.
Speaker 2: Right? And if we were to do that, then we would probably also have to provide a way to extract the data, and it's more and more patches on top of patches. I don't think we wanna go down that road.
Speaker 1: The question really becomes, like, how do you pull together a join? Right? And it's the same reason we haven't really done that, because Brian said relational repeater. Right? The reason why our repeater interface currently is not relational is this exact reason.
It's like, how do we store the information of, you know, what field is relational to what? And the other tricky thing is, you know, without some real clever engineering from John and Nicklaus, most likely, there's no guaranteed way you can really do a join based on a nested JSON value. Like, some databases would, some databases wouldn't. You know, it's the same sort of story. Right?
So it becomes a performance nightmare. But what are some of our options there? Right? Just thinking out loud. One would be that we have some sort of, kind of like, directus relations table.
Right? Where we're saying, well, if your JSON object has a path called, I don't know, article dot author, we're assuming it's a foreign key. But that is always gonna be, you know, you can't really guarantee that. Right? An additional option would be to allow, you know, the end user to sort of pass the join information, right, manually through the API, where instead of saying author dot name, where we know that author is a relationship, maybe there's some sort of syntax where you can just say author joins on table dot column, dot name. I don't know.
Right? So you can pass it yourself. Can we just say we don't allow relations in JSON? I've tried that in the past, and I think the answer is no. No.
I'm afraid the answer is no for that. Because that's kind of, you know, the situation we're in now. Like, we have a repeater, and people continuously ask, can we make it relational? And I continuously answer, how is that any different from a one to many? And then oftentimes, you know, we're just sort of in a stalemate, which is, well, it's not JSON, but it does do what it needed to do.
So we're good enough. Right? But I think it's fair to say that one of the design requirements of the system would be that we need to have some sort of way where you can relate data from a key that is inside of a JSON document.
Speaker 2: Yeah. I assume we would have to keep track of it.
Speaker 0: We've talked about this for Editor.js. Right? And or WYSIWYG, being able to have great relational data structures stored inside there. There's actually a relationship to another
Speaker 2: Right.
Speaker 0: This object. Right? Another Directus item. So we are thinking about this. We have some ideas around this.
I think it's separate from this particular request. I think right now, this request is primarily around filtering and searching and finding
Speaker 1: It's true. Yeah. Yeah. Because, for the record, this really quickly turned into how to just properly do JSON in Directus rather than how to filter through an object. All very much connected to that.
Speaker 0: Still important. It's part of the divergence, then come back to convergence. But, just as you guys have continued to talk about this, I think there is a separate kind of feature function, likely, for relational data versus the ability to filter and just find stuff inside these objects, and do it really performantly.
Speaker 1: Yeah. The WYSIWYG one is an interesting example. That's a good point, because everything that we've been discussing so far is basically gonna be the underlying engine slash foundational stuff for that type of use case. Right? Where in an Editor.js type of environment, kinda like in a Notion document, for those unaware of Editor.js.
Each line of text is its own, you know, block effectively, and you have blocks of different types. But then a block could be a relationship elsewhere. Right? Which is a great example of this. Like, how do you know what that relationship is?
I'm pretty sure in, you know, a Notion-like environment or an Editor.js type environment where you have, you know, semi-structured data, part of the solution there is that that join information could be in the block, right, where you say you have an image block. And because you have an image block, you now know to look at images for looking up the related value. Right? But, again, that depends on the schema that is in your JSON field.
And if we were to make that a requirement where we're saying, well, there's a... oh, god. Way to go, Brian. If we're saying that that has to be in the JSON field, now we're making the format proprietary again. Right? Because now we're requiring your JSON object to adhere to a certain spec
that we designed, just to be able to do relationships. Tricky. For a second, I thought you were gonna write down, this will be tricky. Rest assured, this will be very difficult.
Is there not a normal standard for that already? Yeah. You'd hope so, but the answer is no.
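The "join information lives in the block" idea discussed above could look roughly like the sketch below. The type-to-table mapping, the data, and the fetch function are all invented for illustration; nothing here is shipped behavior.

```python
# Hypothetical: the block "type" tells us which table its nested id
# points at. That convention has to come from somewhere outside the
# JSON itself -- which is exactly the proprietary-spec concern raised.
BLOCK_TYPE_TO_TABLE = {
    "image": "images",
    "video": "videos",
}

def resolve_block(block, fetch_row):
    """Swap a block's bare id for the related row when its type maps
    to a known table; leave plain content blocks untouched."""
    table = BLOCK_TYPE_TO_TABLE.get(block.get("type"))
    if table is None:
        return block
    return {**block, "data": fetch_row(table, block["id"])}

# Stand-in for a real database lookup.
def fake_fetch(table, pk):
    return {"table": table, "id": pk}

doc = {"blocks": [{"type": "text", "text": "hello"},
                  {"type": "image", "id": 7}]}
resolved = [resolve_block(b, fake_fetch) for b in doc["blocks"]]
print(resolved[1]["data"])  # {'table': 'images', 'id': 7}
```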
Speaker 0: You know, there are some other CMS vendors that do this, but, again, they've got more, I think, generally well-defined, kind of hardened structures. The ability to be agnostic is where this becomes very difficult.
Speaker 1: Yeah. No. That's a great point.
Speaker 0: to say, this is the data model you must use. Therefore, this is I I can therefore rely on the fact that my structures are gonna match. But I think, potentially, with some of the discussion we had on the schema piece, you know, as part of this entire discussion here, if you were able to say, I am predefining schema. I know what that is, and then I wanna have enforcement validation and understanding that that's there. I think you we could ag you know, the agnosticity is you define the schema or we define a schema, but it's generic in the sense of you're not required to have it.
You have a JSON field that's unstructured and doesn't have schema. But if you choose to implement that, then you are actually then getting the benefits of now I can I can kind of quote guarantee as much as you can guarantee in a JSON world? And I know what my data structure is, and I know how to how to interact with that event.
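That opt-in schema idea can be sketched as follows: the JSON field stays anything-goes by default, but once a schema is declared, values can be validated, so paths become trustworthy for filtering. This is a hand-rolled checker covering a tiny subset of JSON Schema, for illustration only.

```python
def validate(value, schema):
    """Check a value against a tiny subset of JSON Schema:
    object/array/string/integer types and required object keys."""
    t = schema.get("type")
    if t == "object":
        if not isinstance(value, dict):
            return False
        return all(
            key in value and validate(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in schema.get("required", [])
        )
    if t == "array":
        return isinstance(value, list) and all(
            validate(item, schema["items"]) for item in value
        )
    if t == "string":
        return isinstance(value, str)
    if t == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return True  # unknown type: don't reject

# A page-builder-ish schema: an object with a required array of blocks,
# each block requiring a string "type".
schema = {
    "type": "object",
    "required": ["blocks"],
    "properties": {
        "blocks": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["type"],
                "properties": {"type": {"type": "string"}},
            },
        }
    },
}
print(validate({"blocks": [{"type": "hero"}]}, schema))  # True
print(validate({"blocks": "not a list"}, schema))        # False
```

In practice a full validator (for example, any implementation of the JSON Schema spec) would replace this, but the shape of the guarantee is the same: validated documents have predictable paths.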
Speaker 2: Yeah. To be frank with this, it's just very, you know, very, very difficult. I mean, we could, you know, just, willy-nilly, keep track of some keys in a new table that says, hey, this relates to that, similar to relations like we do it right now. And, well, try to keep that in sync.
It's... well,
Speaker 1: So when you're talking about the other vendor use case, right, I think for the majority there, it's basically around the idea that you have a globally unique ID. Right? So it's a very document-database-first approach. So, we use Notion a lot. Right?
So if you ever create a Notion page that links to another Notion page, that is just a globally unique ID. So therefore, you can nest it however you want, and it knows, because everything is a page at all times and every page has a globally unique ID. So therefore, that sort of relationship lookup kind of solves itself. Right? Because you don't need to know what collection or what table to look at to find the item, because it's all globally unique.
Right? The difficulty here lies in that hybrid model where you want to nest a relational record inside of a document, which itself might be, you know, related from a relational row. Oh, do we like to make things difficult? But that's why it's cool. Let's see.
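The contrast being drawn here can be made concrete: with globally unique IDs, a nested reference resolves itself, no table name needed. The registry, IDs, and data below are invented for the sketch.

```python
# With globally unique IDs, one ID-to-item registry is enough; the
# reference carries everything needed to find the item.
registry = {
    "018d2f3a": {"collection": "pages", "title": "Roadmap"},
    "018d2f3b": {"collection": "pages", "title": "RFC: JSON filtering"},
}

def resolve(ref):
    """Global ID lookup: no collection/table name required."""
    return registry[ref]

doc = {"blocks": [{"type": "link", "ref": "018d2f3b"}]}
print(resolve(doc["blocks"][0]["ref"])["title"])  # RFC: JSON filtering

# In a relational database, a bare value like 3 is ambiguous:
# categories row 3? authors row 3? The table name has to come from
# somewhere outside the JSON -- which is the hybrid-model problem.
```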
There is a portable text spec. Absolutely. Which, again, I think it would be great to have some sort of, you know, portable text interface or an extension that handles that specific format. But it would be a shame if everything that we're talking about only works if you're in a specific format. Right?
Because who are we to say what the right format is?
Speaker 2: I agree. I just looked at the portable text one. Yeah. It looks very, very similar to how everybody else does it. You know?
You have some things. You define your type, and then you have some keys depending on that type. Great.
Speaker 1: Cool. Alright. We're quickly reaching the top of the hour here. It was a good idea to only focus on one topic for these sessions. Just based on prior experience, it always turns into, this should be quick, and then you talk about it for an hour.
Just to give the chat a chance: any other thoughts, questions, concerns, ideas, shout-outs? If you wanna say hi to the family at home, this is the time.
Speaker 2: Or when they start typing.
Speaker 1: Everyone is typing. Config as code. Uh-oh. I'm pretty sure we have a separate one of these dedicated to that coming up. Ship it.
Speaker 2: Ship it? I'm not gonna say it right now.
Speaker 1: Let's go. Thanks for the 20 minutes, everyone here. Same. Yeah. The next one's gonna be Config as Code.
Speaker 2: Cool. Cool. Cool. Cool.
Speaker 1: Cool. So make sure to come back for that one. Saucy, saucy. Cool. Well, if there's no other questions, thoughts, feelings, or concerns, let's wrap this one up. Brian just said the conversation about config as code is in 2 weeks, not the release.
I think this is a very, very, very important
Speaker 0: Thank you, Brian.
Speaker 1: Thank you, Brian.
Speaker 2: It's Android. So it's imported immediately. Go now. What have we done?
Speaker 1: Cool. Well, all that being said, everybody, thank you so much for joining us here live. Thank you so much for watching this at home, if you're seeing it on Directus TV. For the people live here, if you don't know what that is, check it out:
directus dot io slash TV. I think I got that right. Right? It'll be up in about a week's time. But for now, I wanna say thank you all.
Good luck and Godspeed. And we'll see you in 2 weeks.