In this recording of our live event on January 25, 2024, Rijk, Jonathan, and Daniel discuss configuration as code.
Speaker 0: Welcome everybody once more to a wonderful request review session, where we go over feature requests and figure it out. Now what do we do? I'm afraid we ramble on for about an hour about the technical complexities. Remember, the goal here is to basically divergently discuss, you know, what is the feature request, what are we trying to do, what is it trying to achieve, and how do we think we can make it happen in a very sort of Directus-y way? What are we talking about this week?
Speaker 1: Yes. We're talking about configuration.
Speaker 0: Configuration as code. Let's figure out how to take schema endpoints to the max and actually manage the entire project as code. So this is really with a focus on GitOps. Right?
Where you have a sort of centralized repository of static files that is the single source of truth for all configuration of the running project. Which, as you might guess, gets complicated fairly quick. And as per usual, we'll be eyeing the chat. So if you have any questions in between or any suggestions or any good thoughts, please do put them in the chat.
I already saw his name fly by. Well, most likely I have a very special guest for you today, because our very own Connor has been researching, you know, some of this for a little while now. But before we dive into the research results there, let's discuss a little bit of the requirements that are presented here in the current feature request. Right? Because the one thing we know now, you know, the current state of affairs: we have that schema snapshot and apply endpoint that we use and sort of recommend for, you know, moving bits of schema from dev to prod, that sort of thing.
But as people have pointed out, you know, that is still for schema only. Right? So we know one of the big requirements for this is gonna be that you need to figure out additional configuration, additional data points maybe from your own tables, you know, environment migrations, like you mentioned there, which includes, you know, what about roles? What about flows? What about presets?
What about translation strings, etcetera? Well, one of the complexities for this is figuring out, you know, what is configuration within the context of Directus in the first place. Right? Which is a discussion topic that I have had some trouble with just going through myself already, which is: what is configuration? You know, are your roles and the way you've configured permissions configuration? Probably.
But the users within those roles, probably not. But then users with static tokens, maybe. Right? If you have your own tables, maybe you have a single, you know, app settings singleton collection that you use for configuration. Is that now configuration that is part of code-first configuration? Right?
Even though it's not a system table and you're not configuring Directus, you might still be configuring other things. So that's where the fun starts. So maybe we could scroll down a little bit, Jonathan, just take a quick peek through the motivation and the requirements here. So, you know, as we kinda touched on already, this one here from Erif van Oort. Pretty sure that would be a Dutch user.
It's about things like permission logic, you know, keep the local dev environment in sync, source control is the source of truth. Right? You wanna make sure that you can spin up new Directus instances not completely empty, but start them from, you know, a template that is in your repo. If there's an issue, you can easily, you know, share the configuration of your platform. Daniel, if you would kindly mute, you're being very annoying.
Looking at the replies, this is immediately where it gets complicated. Right? It's like, what is configuration? Right? What is configuration?
When it comes to import export, how to define what gets imported, what gets exported? Basically, the same question to me. Right? How does it get imported? You know, are you merging stuff?
Are you overwriting stuff? What happens if you try to insert something that already exists? You know? How do you deal with conflicts? Very good question.
So if you wanna scroll down a little bit further to see what else is in here.
Speaker 1: Yeah. No. I don't know about that point. That's a good question, but a very, very long one to answer properly. But the gist is, you know, if you work with multiple people with different setups and if somebody changes your database schema, for example, how do you synchronize the state between your instance and another instance?
You can do that with our schema endpoint. We already have that capability. But, technically, you would want to, or ideally, let's say ideally, you would want to set up your configuration as code, because then you have a single source of truth. If you're developing a new feature, for example, you need a new table, you need new fields, you want to test something, you wanna try something, but then you, you know, delete some fields. How do you get the changes synchronized between different setups? And the problem gets even larger if you have an organization, for example, with, let's say, I don't know, one dev department of, like, 8 people, for example.
Stuff gets really gnarly really quick. How do you synchronize then between 8 people, for example, between different branches, different features, different collections, different fields? You know.
Speaker 0: Let alone a test team of 200. Right?
Speaker 1: Yeah. And this is, you know, even for a very small team, it can get quite gnarly pretty quickly. But, you know... Yeah.
Speaker 0: There's a couple other things there too. Right? When it comes to the git repo flow specifically is that any change to the schema of the project is now sort of, like, version controlled, so you know what happened when and you can roll back. And you have accountability because you know who made the change, through that sort of git first approach. Right?
The other main thing there too, I think, is that from a database template, you don't have files, which is one thing we'll touch on, and the second thing is it's database vendor specific. That's another thing. Right? Like, you could plop the whole SQLite file in a repo, but, you know, if you have a local dev instance that uses SQLite and you wanna go push your change into production in Postgres, now you have a workflow problem. Right?
Even if you have local Postgres and server Postgres, you might go, you know, I don't know, Postgres 10 to 13 or something. If there's a version mismatch, you know, there's things to consider there. Of course, there's third party tools, I see Ansible mentioned here, that you can use to sort of move databases across, that sort of thing. This would really be sort of a Directus-native way to move configuration around.
Right? Which I personally see as an improvement or an upgrade to the schema snapshot system that we have rather than a completely new thing. The real question just becomes, you know, how do we add more stuff into that so you can use it for this? That's really, to me, the underlying discussion. Right.
Jonathan, if you wanna scroll down a little bit further, you can see if there's any other points. Wanna make sure we don't forget. Export considerations, multiple files. I think that's a very important requirement because we've already seen some of the schema snapshots just get bonkers large. Right?
Because if you have a thousand collections, a total of 2,500 fields, it sounds insane, but it happens in the wild. The one export file, you know, is megabytes and megabytes and megabytes worth of JSON. Tens if not hundreds, which gets unwieldy pretty quick. It also makes it more difficult to import, by the way, because we're not really able to stream it all that well, and then it becomes a very large file. So you have to read it into memory and then use it.
Let's see. Selective export. That, I think, is a tricky one. Right? How do you know what you're exporting if you consider your roles and permissions part of this, but you have one admin dev role that you don't care about for your production instance? How do you pick and choose?
Right? Pick and choose what to include, what not to include, and handling, you know, sensitive data. Very good point. You know, is this gonna be plain text in a static file? Tricky.
Right? Tricky in a repo. If you scroll down a little bit further: modular files for extensions, single file per collection, you know, we kinda touched on that.
Speaker 2: Does
Speaker 0: does it actually make more sense to have selective import versus export? Great question. Great, great question. Maybe. Maybe.
Yeah. It's like if you have... Go.
Speaker 2: I'm sorry. Go ahead.
Speaker 1: I just have to remember to mute and unmute myself between every sentence. That's fine. Yeah. I can see both being very useful. Right?
For example, if you have a very, very large instance with, like Rijk mentioned, right, like, a thousand collections, and on your dev instance, you only want to add one thing, do you really need to export, like, this whole thing that's, like, I don't know, 10 megabytes or whatever? Maybe, you know, maybe it would be enough to just export that table with its fields, and you'd be good to go, because then you could import that partial instance, maybe. But, yeah, for import or for export, both could be useful. But, yeah, like we said, it's just lots and lots of stuff to talk about there. Yeah.
Speaker 0: TBD is the honest answer. I also feel like both is probably where we need to end up with that. Because to your point, if you have a large project and you only care about a small subset of that as a sort of templatable piece, you know, you don't want to export everything and have a bunch of unneeded data in your repo muddying up, you know, the workflow and the reviews. Because then also imagine that you make an export and now you have a PR of, like, 16,000 lines of stuff that you don't really need. Right?
But, yeah, let's see. Extend, you know, existing schema files. That's an interesting one. Merging multiple together, importing snippets from other files, maybe, you know, from nested collections. So that's all about, you know, the file structure for the project.
Saving the non-defaults, I think that is more of a technical requirement to me. Right? It's like, we don't have to save default values from Directus in the schema snapshot because they're the default values. Dynamic configuration sync, that's just: whenever you make a change in the studio, it auto exports, basically, which feels heavy, personally. Feels a bit heavy, but could potentially work depending on the file format.
But then again, how do you choose what to export on the automated one? Right? So TBD. That's also why it's a could-have. I mean, they've thought about it, luckily.
Automatic real time sync, sort of similar idea. Right? But as an option in the Data Studio, API triggered, or periodically on a cron. So the one thing I do notice in the requirements list here is that there's a lot of talk about how to get it out of Directus and in what file format, but not so much the other way around. Right?
How do you get it back in? So if you have something in your repo, whatever that something is, what does that code look like and how do you get that back into the Directus instance? Right? This might be a good point actually, a nice little segue. Like I hinted at at the beginning, our very own Connor has been doing quite a lot of homework on this just to figure out, you know, the format and some of the ideas around this and how it could work.
So let me see if I can find him. Where is he? So many people here. Look at that. Hello, Connor. What have you been up to recently?
Speaker 2: I have been up to quite a bit involving this config as code and how it plays into all the other different parts of Directus that we wanna do. Let me get my notes up. Here we go. So you said you wanted me to talk about the structure of the exports?
Speaker 0: I think it'd be cool if you wanna give a quick overview of sort of the research process itself. Like, what are the things that you've been looking into? What have been the considerations or requirements? And sort of the things that you found. And then dive into some initial conclusions.
Speaker 2: Sure. So what I have been going through and researching is, basically, we have a couple of different feature requests, from config as code to templates to migrating between instances to migrating between different databases. And all of it sort of involves, you know, moving configuration between instances, moving data between instances, and moving files and assets between instances. And that is a very big task when you're trying to be database agnostic and you're trying to be efficient. You're trying to support multiple different use cases where sometimes you wanna overwrite everything, sometimes you just wanna bring in some stuff, sometimes you only wanna take out some stuff.
Sometimes you wanna stream it all. Sometimes you wanna have the file small. So there's a lot of very different considerations that go into it, and then making it all happen with one sort of Directus way of making it all magical. It becomes a very big rabbit hole that you start diving down into. And so one of the things that I have been looking at is, you know, what are all the different use cases for it, and what are the requirements for all those different use cases?
And so configuration as code is one of the use cases on there. It's not completely fleshed out yet because it's one of the later goals of what I've been working on. But with it come questions like, how do you integrate it with CI/CD, you know, GitHub, GitLab. You know, do you have your own hosted GitHub, you know, self-hosted GitHub? You know?
Where is all your stuff stored? And so there's a whole bunch of different parts to it. Right now with the schema service, you export a schema of your stuff. It exports everything. You diff it against your instance, and then you apply that diff, and it tells you, like, whether the changes can be applied or not.
And right now, that's really it. There's not really too much to the schema service outside of that right now, and adding in all these different layers and features, the schema service is definitely gonna have to take on a new look. And so one of the things that we've been looking at is, you know, that initial export of a schema, making it more of a distributable type of folder structure, file structure, whether it's a compressed zip file or some type of other special file. But basically redefining how that schema export looks to be able to hold all these different configuration items, to be able to hold data, to be able to hold assets, and defining that structure. And, you know, as Rijk mentioned earlier, do you want that stuff stored in plain text, or do you want it to be stored in, you know, some type of encrypted format, or do you want it to be compressed?
So there's a lot of different variables there. And then once you take that distributable that gets made, I mean, for some instances, if you've pulled out data and assets and configuration, you know, that thing could be huge. And so we wanna bring that into this new instance, the target instance. Right? And so we need to diff it and change it.
And so bringing all that in and processing it is a whole other thing. You know, do you wanna bring in all of it? You know, you have all the export controls. Do you wanna have import controls for how it gets applied and, you know, how it gets imported? And so I've been going through and documenting all those different ways that we can do stuff, you know, what is dependent on what, you know, if we wanna do this, then, you know, we have to do that, you know.
And so one of the things we've been looking at this week, you know, is what type of file format for all of this type of stuff, for when it gets really big. You know, if there's a lot of data, if somebody has a thousand collections and 4,000 fields, you know, is a CSV file or a JSON file really the right file structure to store all of that data? And so we've been looking at, you know, different options and different file formats for storing, you know, structured data like that in an efficient, compressed way that also lets you keep the schema of the schema export defined and structured. And then also making sure that we keep that same... right now, we hash the schemas and stuff so that they stay consistent. You know?
You can only use the schema to apply to this instance because you just diffed it against it, and yada yada. So having that in there too, you know, do we have a metadata file inside of that export that, you know, talks about what the export is? You know, do we have it? Do these become an extension type, you know, that can be used throughout the instance in different places? You know, there's a whole bunch of different options there.
Speaker 0: Yeah. Yeah. Absolutely. It was a great intro.
Speaker 1: Yeah. It was a very good... yeah.
Speaker 0: Exactly. Exactly. Yeah. So the first order of business to your point, you know, figuring out what does that file format look like. We know some of the requirements now based on this discussion that we just looked at.
We know some of the the downsides of the current format. So that's a great step. Then, of course, the second big step will be figuring out, you know, how do you go from that sort of source of truth overview into applying it for realsies. Right? So we have that sort of diff step in between.
So for those unaware: right now, if you upload a schema snapshot into the Directus API, it will compare it to the current state of the database and then return, you know, the list of differences, basically. So it's a diff, not a list of changes as in a step-by-step list; it's just a diff, like an A versus B. And then that diff is uploaded to an apply endpoint, which will basically, you know, apply the changes required to get rid of the diff, right, to make sure that the two are in sync, that the instance is in sync with your file export. So based on that, Connor, we've done some research on what needs to happen on that diff endpoint itself too.
You wanna share some insights on what we know now, at least, are some of the requirements to make that work properly with all of these new additional features that we're trying to add in?
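For readers following along, the existing snapshot, diff, and apply round trip described here looks roughly like the sketch below when driven against the current API. It assumes an admin token and two instances, and the exact request and response shapes may differ slightly from what is shown.

```ts
// Minimal sketch of the snapshot -> diff -> apply flow using the existing
// Directus schema endpoints. URLs and token are placeholders; error handling
// is omitted for brevity.
const DEV_URL = 'https://dev.example.com';
const PROD_URL = 'https://prod.example.com';
const TOKEN = 'admin-token';

const headers = {
  Authorization: `Bearer ${TOKEN}`,
  'Content-Type': 'application/json',
};

async function syncSchema(): Promise<void> {
  // 1. Take a snapshot of the source (dev) instance.
  const snapshot = await fetch(`${DEV_URL}/schema/snapshot`, { headers })
    .then((res) => res.json());

  // 2. Ask the target (prod) instance to diff that snapshot against its own
  //    current state. The result is an "A versus B" diff, not a step list.
  const diff = await fetch(`${PROD_URL}/schema/diff`, {
    method: 'POST',
    headers,
    body: JSON.stringify(snapshot.data),
  }).then((res) => res.json());

  // 3. Apply the diff to the target so the two instances end up in sync.
  await fetch(`${PROD_URL}/schema/apply`, {
    method: 'POST',
    headers,
    body: JSON.stringify(diff.data),
  });
}
```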
Speaker 2: Yeah. So with that diffing endpoint, some of the things that we are looking at is, number 1, if you wanna bring in data. Right? You know, how do you diff large amounts of data? Are you able to diff large amounts of data?
That's one of the research things that's on the list. You know? Right now, we have an import and export service to import and export data. Looking at adding dry run options. You know?
Can you import this data? Can you export this data, for that diffing stuff? You know, if you have a really big file, so you do have 300,000 collections and fields, you know, that's gonna take a long time to make changes to the database and to go through and find that diff. And so having some type of long-running task runner on the instance that's able to sit there and work through applying that diff, or making that diff or distributable or whatever it is. You know, having some sort of long-running service in the background of your instance that can handle that.
And then also, if you're going through and you're applying all these big changes or diffing it or whatever, you don't want people in your instance changing the stuff as you're trying to change stuff. So implementing some sort of maintenance mode on your instance that basically locks it down and says, hey, we're making changes right now. It doesn't let anyone else change the schema or the data or whatever you want it to lock. We also have been looking at, you know, for asset data, pulling it in: do you pull it in from the distributable file, or do you pull it in from the asset source?
You know, do you pull it in from the S3 bucket directly? You know, do you use it like that, or do you package it into the distributable? But, basically, for the diffing part, the other part is that if you have a really, really big distributable or schema thing, whatever it ends up being called, downloading it from one instance, uploading it to another instance, just to download another big thing, just to upload the other big thing back again, is a lot of moving back and forth of all this different stuff. And so the other thing is, when you upload that schema or whatever, it diffs it. Instead of it downloading the diff back to you and then you having to send it back up, it could just keep the diff on the instance, and you just tell it to apply the stored diff that it already has.
And so you don't have to have all that network traffic back and forth, and then, you know, the Internet goes out and you're screwed. You know? But that's one of the things that we've also been looking at for the diffing. And then another thing is, you know, different types of strategies for diffing and importing. So that, you know, do you just wanna upsert stuff?
Do you just wanna add new things and ignore everything else that has conflicts? Or do you want it to only apply if there are no conflicts, you know, or do you want it to overwrite everything, you know, so it doesn't matter if there's a conflict, we're gonna write over it with everything, you know. And then, instead of just returning a singular diff that just compares the two different schemas and says, hey, this is what's different, you know, putting in more migration-like... making it more of a step type thing.
So it works through migration steps. And, you know, oh, you need to do this. You need to do this. You need to do this. You need to do this.
And basically a workflow that the thing can work through and guide those long-running task runners on what to do and how to configure your instance.
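To make the strategies and the step-based plan described above a bit more concrete, one hypothetical way those options could be shaped is sketched below. The names are illustrative only and are not an existing Directus API.

```ts
// Hypothetical shapes for import strategies and a step-based migration plan,
// including the conflict information discussed next. Naming is illustrative.
type ImportStrategy =
  | 'upsert'        // insert or update, merging with what is already there
  | 'insert-only'   // add new things, skip anything that conflicts
  | 'strict'        // only apply if there are no conflicts at all
  | 'overwrite';    // apply everything, writing over existing data

interface MigrationStep {
  action: 'create' | 'update' | 'delete';
  target: 'collection' | 'field' | 'relation' | 'item' | 'asset';
  key: string;                        // e.g. "articles.due_date"
  payload?: Record<string, unknown>;
  conflict?: {
    reason: string;                   // e.g. "foreign key no longer resolves"
    resolutions: Array<'skip' | 'overwrite' | 'provide-data'>;
  };
}

interface MigrationPlan {
  strategy: ImportStrategy;
  steps: MigrationStep[];             // worked through by a long-running task runner
}
```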
Speaker 0: And I think, last but not least, having some sort of format to expose potential conflicts for manual resolution. Right? So one of the strategies has to be that it's up to the end user to pick and choose what to do. So imagine if you go, you know, from a dev to prod life cycle, right, where you're not so much deleting everything and inserting everything, you wouldn't wanna do that in prod obviously. And if you have an upsert strategy, but there is a conflict, right, you have, like, a foreign key that doesn't work anymore or something like that, there needs to be some sort of format in whatever this diff looks like, or this migration step format looks like, that just has a list of: here's the steps with the known conflicts. What do you wanna do?
Right? How do you wanna modify that step, those steps, to get around the conflict? Right? Do you wanna upload new data, or do you wanna ignore that particular step, or do you wanna ignore those records? You know?
So to your point, if we need some sort of dry run to check if you can import all of the data, it's sort of a requirement in order to be able to extract, you know, potential conflicts. So we need to have some sort of way to search through the data you're trying to apply in order to know how to deal with conflicts. Right? So now that we're talking about all of this, what we started to notice is that we're not so much talking about, you know, configuration as code specifically or templating specifically. What we're basically shaping here is a system that works for multiple things, right?
Depending on how you use it. So if you were to make a snapshot of everything, just full stop everything, and you import it as apply everything, what you're talking about now is basically backup restore. Right? If you're exporting a small fragment and you're importing that into another project, you're basically talking about templating. Right?
If you're exporting just the schema part and no data and you apply that to a new project, you're talking about seeding or something. You know what I mean? Like preparing a database, basically a new project, for what you wanted to do. And the question now is, how does that all tie together, how does that tie back into the configuration as code part specifically, Connor? Because what we're talking about now is, you know, a new sort of format generated by Directus that you can save somewhere, you know, which is still fairly proprietary, because it will have to be heavily compressed and, you know, Directus needs to know what the format is.
So what is the current thinking on tying it back into the code side of this question? Right?
Speaker 2: Yeah. So if we went the route of having some sort of distributable file structure, folder structure, that is some proprietary format or is encrypted or compressed or whatever, you know, you're not gonna be able to sit there and write code that is a compressed file. You know, you're gonna have to write something that generates that file. So one thought that we've been having is following the lead of some other types of, you know, companies like AWS and their SDKs. So, basically, having some type of SDK that you can write and configure your instance with, and then you tell the CLI or whatever to execute that, read the code that you've written, and then it will make a Directus distributable file, diff file, whatever it is, from the code that you've written.
So if you wanna go through and you wanna define all of your collections and your fields or whatever, you can go in and define that in all your files and your code and then execute that code, and it generates that file that you can then use to apply those changes, import those changes, diff those changes to any of your target instances that you want to.
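As a purely illustrative sketch of that SDK idea, configuration written in code and then synthesized into a distributable by a CLI could look something like this. The package name, functions, and commands are all hypothetical; nothing like this exists in Directus today.

```ts
// Hypothetical config-as-code SDK: describe the project in code, then a CLI
// turns it into the distributable/diff file that gets applied to a target.
import { defineConfig, collection, field } from '@directus/config-kit'; // hypothetical package

export default defineConfig({
  collections: [
    collection('articles', {
      fields: [
        field('title', { type: 'string', required: true }),
        field('status', { type: 'string', default: 'draft' }),
        field('published_at', { type: 'timestamp' }),
      ],
    }),
  ],
  roles: [{ name: 'editor', appAccess: true }],
});

// A hypothetical CLI would then synthesize and apply it:
//   npx directus-config synth ./directus.config.ts --out ./dist/project.distributable
//   npx directus-config apply ./dist/project.distributable --url https://prod.example.com
```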
Speaker 0: Jonathan, if you maybe wanna pull up, I think one piece of inspiration that we were looking at for that part specifically was AWS's, what do we call it? CDK, I think, Cloud Development Kit. If you wanna quickly Google that, it could put some flavor to that point. So the way AWS has that, they basically made a JavaScript library that you can use to code, like, configuration. And then what it does under the hood is it effectively converts it into a CloudFormation template, I wanna say, and then applies it immediately.
Right? So under the hood, you don't really notice the difference, but it's effectively a one-two jump. Right? So it converts it into their proprietary thing in the middle first and then just applies that as is. What am I searching for?
Sorry. CDK, the Cloud Development Kit. If you wanna pull up the GitHub repo for that, maybe I have a link somewhere. I'm just curious if they have some examples somewhere. It's been a minute since I've played with this, but it's an interesting idea.
There was a Directus community library a little while back that tried doing a similar thing, but it would run it against the API endpoints. It wasn't as flexible yet because we didn't have, we don't have, you know... Is this the branch? You know. If you wanna pull up the AWS one before that, because I think the CDKs are all different.
Yeah. There we go. There's a link in the chat. Open it up. Here we go. Here we go.
Here we go. Is it gonna go? There it
Speaker 1: goes.
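For context, the AWS CDK pattern referenced here looks roughly like this: you describe infrastructure in TypeScript, and the CDK synthesizes it into a CloudFormation template (the proprietary "thing in the middle") before deploying it.

```ts
// Minimal AWS CDK v2 example: TypeScript in, CloudFormation out.
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';

class StorageStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    // Declared in code; `cdk synth` emits the CloudFormation template and
    // `cdk deploy` applies it to the target account.
    new s3.Bucket(this, 'AssetsBucket', { versioned: true });
  }
}

const app = new cdk.App();
new StorageStack(app, 'StorageStack');
app.synth();
```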
Speaker 0: So this is an interesting reference for people that wanna look it up at home. It's basically... you know, what was that? Distracted by the chat immediately. Using something like CDK would mean that changes would need to be replicated from the UI to the generation scripts.
That's a great point. Yeah. How does that go both ways? Right? Because if you have that one format in the middle, Directus can recreate that format in the middle.
But Directus wouldn't be able to recreate arbitrary JavaScript, basically. Right? So when you opt into something like that, I think it becomes a one-way street by definition, because we cannot figure out what parts of your JS file, or your code, because they also have some other languages, but you get the idea. We don't know what parts of that file are auto-generatable and what parts are human created. Right?
So there's no way to auto generate that back into a manually created file. So at that point, that's a great point, but it really becomes, you know, a one-way street at that point. Somebody in the Directus community did a sort of proof of concept library to do this, which is very interesting. So if you wanna pull up the Directus community schema builder kit repo, I just sent a link in the chat there. It was very much inspired by a similar idea where you have a JavaScript file that you use to sort of define.
It's almost, you know, a declaration file rather than, you know, JavaScript, but it is still just JavaScript that runs from top to bottom. But you could define your schema and how it's applied as, you know, individual build steps, in JavaScript. So this is where it gets real heavy on the code part of configuration as code. Right? And not so much just the moving stuff around.
So in terms of big picture stuff, I really see this as the final step of whatever these changes are that we're discussing. We'll have to start with what is that new format in the middle, how is it generated, how is it used, and then see this as a way to sort of generate it into that format and then apply it automatically. Right? But, yeah, that JavaScript syntax is an interesting idea. So, yeah, I see some folk typing.
This is one of those very typical Directus projects where there are about 600,000 different opinions on the ideal way of doing this. And I think we saw it in the chat immediately. Right? Shout out to the person that was like, isn't it just a database template? Why bother?
Right? Which I can totally get behind, but then there's 180 or so votes or something like that on the discussion. So apparently, you know, that is not an opinion that's shared.
Speaker 1: Yeah. What's interesting is, right, because, technically, the most basic example would be something like a database migration, generally, in the beginning, you know, for the configuration as code, for example. So there's another, similar project to Directus, a CMS-like project, that I've checked out to see how they did it. And they handled this a little bit differently. They don't have, like, a DSL-type, you know, language that defines your infrastructure or whatever.
But they went the route of: as soon as a person or user, via the UI, creates some type of change in the tables or the collections or however you want to call it, the instance automatically generates a migration file locally for that specific change. And there's then a mode in the instance where you can disable any ability for other users to change the actual instance. So you can actually just rely on the migration files, which is an approach that you could take, you know, because a migration file then could technically do anything you'd like, you know, with regards to, you know, the collections, the fields, whatever, even inserting items. But then, you know, because we're Directus, we want to... that's a little too easy for us, because we would like to include some type of things like, alright, how about you, locally, develop some type of new feature, new table, new collection, new fields.
And then, in order for that to work in the way that you want it to work, you need an item. You know? You need to include a new data row or an asset. This is the thing now. Because, you know, assets are not inside of the database.
So we want to include assets, for example, or maybe not, of course. Nothing is, you know, set in stone, but, you know, including assets, for example. So you want to make some changes, and you need to include some assets for your changes to be even, you know, useful. So you would then have to, you know, do your changes, test it locally, include everything with the correct file name, with the correct row, whatever, or other metadata of an asset, for example. But then on production, you would have to replicate that again.
So you get back to step 1. Right? So this kinda sucks with the migration part. So even then, if we want to include this, then we get back to the issue at hand that we were talking about. Right?
We want to have a process that could export something, and you can recreate that between instances and so forth and so forth. I just wanted to mention that for the others in the chat, because it's not just about, you know, adding a field, because that's basically, you know, a basic thing, which we could solve. And I think the Directus schema builder kit is basically that. Right? You generate some type of syntax, which generates some type of migration.
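For reference, Directus already supports custom migration files that export Knex-based up and down functions, so an auto-generated migration for a single UI change, in the spirit of what's described here, could look roughly like the sketch below. The collection, field, and file naming are illustrative.

```ts
// Sketch of an auto-generated migration for "add a due_date field to tasks",
// following the Knex up/down shape that Directus custom migrations use.
import type { Knex } from 'knex';

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable('tasks', (table) => {
    table.timestamp('due_date').nullable();
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.alterTable('tasks', (table) => {
    table.dropColumn('due_date');
  });
}
```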
But you have to keep in mind then, of course, right, different database vendors, we have to abstract that. Because, for example, you know, in SQLite, if I remember correctly, please correct me, if I remember correctly, like, you can't alter a table and introduce, like, a foreign key. You're forced to drop the table, actually, and recreate it in order to add a foreign key, for example. Other databases can do that; SQLite can't.
So there's lots of different
Speaker 0: To be honest there, I'm pretty sure that the last minor release of SQLite didn't come out too long ago, and they finally do have that alter table sort of baked in. Although then, of course, you have the side effect that it depends on your native build of SQLite on your machine, which may or may not have it. So generally speaking, historically, you've been absolutely right. It's been a nightmare and a half to do that.
Speaker 1: Lots of fun. So I just wanted to make sure that people in chat know, you know, it's not about just adding a thing. It's a little more involved than that. And, you know, including assets and, like the others said. Right?
Maybe there's proprietary information or whatever, and you're not allowed to leave it on your hard drive. Maybe you want to zip it, encrypt it, compress it. There's lots and lots and lots of different steps that we have to kick off there. So alright. Oh, we got some chat interaction.
Cool. Cool. Cool.
Speaker 0: So alright. Top to bottom. First question. What speaks against using JSON or YAML files instead of JavaScript? This way the changes in the web app could also be synced back to the files easily.
So for what it's worth, the formats that we're talking about being generated from Directus would most likely be, you know, in some sort of structured format. Not quite sure if that's JSON or YAML yet or if we have to find some sort of optimized file format to do that. Because the risk with JSON and YAML exports, once you start including data, you know, is we no longer know how much data you wanna include. Like, if we're treating this as something you could use for backup restore, we could be talking about a very large amount of data, right, at which point we need to have a very optimized structured format, which those may or may not be usable for at that point. Right?
Connor, remind me, we found like an Apache file format that could be interesting for this. I think what was it called? Parquet or something. Right?
Speaker 2: It was called Parquet.
Speaker 0: Parquet. Yeah. That would be an interesting file format for something like that. Or potentially using a SQLite database as the exported file. Like, that's a completely different direction.
But you get the idea. We need to have some sort of optimized, compressed file format, because the export file could get really large. Now, it might be an option, for the way you save one of those, to just save it in a sort of raw mode. Right? Where it doesn't save it compressed, at which point it could just be human-readable YAML or JSON, including, you know, the ability to properly source control it.
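One hypothetical way to square "human readable and git friendly" with "potentially huge data exports" is a small manifest that points at per-collection files, each stored in whatever format fits its size. Nothing like this exists yet; the shape below is purely a sketch of the idea.

```ts
// Illustrative manifest for the distributable: configuration stays in text
// files (easy to source control), while large data payloads can be stored
// compressed or in a columnar format like Parquet.
interface DistributableManifest {
  directusVersion: string;        // version the export was made with
  hash: string;                   // ties a diff back to the snapshot it came from
  schema: string[];               // e.g. ["schema/articles.yaml", "schema/tasks.yaml"]
  configuration: string[];        // roles, permissions, flows, presets, ...
  data: Array<{
    collection: string;
    file: string;                 // "data/articles.json" or "data/articles.parquet"
    format: 'json' | 'csv' | 'parquet';
    compressed: boolean;
  }>;
  assets: Array<{ file: string; storage: 'bundled' | 'external' }>;
}
```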
On the migration note, I think you answered that before, Daniel, like, exactly right. If you're doing auto generated migrations, it's really only for the database schema part. Like, we can't really know on your behalf if you consider, insights dashboards part of configuration or flows or something. Right? So it's it's gonna be it's gonna be tricky because people different people have different export requirements.
And if you go from dev to prod, all bets are off. Right? You never quite know what the idea is. Creating internal libraries similar to schema works with native access to Directus rather than the API, using integrations, and help creating a lot of complex repetitive action. I can imagine.
Yeah. Because you can write a little JavaScript for loop and just, boom, you have 10,000 collections. Right? But, you know, the lack of two-way integration with the UI does cause issues, which is the unfortunate side effect of using, you know, a programming language rather than a declarative language like YAML or JSON for doing schema modifications like that.
You're gonna lose that two-way integration. That being said, you know, if Directus has a "don't allow me to change the schema" environment variable flag, whatever, you could do that on purpose. Right? For a production instance, for example, I can totally imagine that you disable any sort of schema modifications just for security reasons and availability reasons, and only allow those changes to happen through whatever system we're cooking up here. Right?
I think default value filtering could help make the YAML output more manageable. Fully agree. You know, we should only store the stuff that we need to know, and storing default values feels like a waste of space. Then: Azure is working on some sort of YAML-based metadata authoring, announcing Azure data delivery net auth sales. What marketing email?
Very curious. I haven't heard of that one before. If you wanna keep the GitOps thing, it should really be a text format. Good point from Tim. Which may or may not be answered by Dominic here.
What if you split it up between schema and content in different formats? Right? Maybe the configuration piece is all human-readable file formats. But if you have a data export, maybe there's a file size threshold. Right?
If you have a very large CSV export, maybe there's just a smart point where it's like, oh, you're trying to save 10,000 rows, we're gonna flip it automatically into a compressed, non-readable format, so you get the best of both worlds. Right? Potentially. Cool. Alright.
Cool. We tried.
Speaker 1: It's something similar. Yeah. Just to, you know, chime in with regards to what you just said. So, like, yeah, we have to split that up preferably, you know, or at least, you know, it's required to have this as text based so you can, you know, use it in version control, whatever. And for, like, including items, for example, you know, a ZIP is a nice thing that you could use, but then, of course, you know, maybe this includes items that are, like, an old version, for example, and you want to insert something where a field doesn't exist anymore, and lots and lots and lots of other, you know, stuff.
And then, of course, if we have all of the different points that we want to persist, right, like flows, permissions, the general config as it is, you could include this also in the export with the items, so you can do both at the same time, or see if it differs and then cancel the thing. But, yeah, it's a fun thing. You know? There's lots and lots of things that could go wrong. There's so many.
Speaker 0: No matter what, we wanna make sure that the output file is a single distributable. Right? Because on the one hand, we're saying we have to split it up into multiple files in order to make it efficient and easy to work with. But, at the same time, we also wanna make sure that you have a singular thing, a singular file, that you can send over to somebody else. Right? Either through the API, so you just have a single download or a single upload, or as a file, maybe packaged through, you know, the marketplace.
Shout out. Wink wink. Nudge nudge. Or just to email it to somebody for, like, an error. Right?
Put it in a GitHub issue as a zip. So there needs to be some sort of both. Right? But I could also imagine that, you know, the API, lets you download it as just a zip, right, that you can just double click to open if you're on macOS or do whatever else it takes on other platforms, to unzip it. Looking at you, Daniel, I'm sure there's a 2 step process.
For those who are out of the loop, he's the "this is the year of Linux on the desktop" evangelist within the team.
Speaker 1: Yes. This year. Here it is. This is the year. Mark my words.
And this year is the year of the Linux desktop. This year. Yes.
Speaker 0: But long story short, we're in that weird in-between where we need both. Right? We need both the single file and multiple files. So we'll most likely have to come out with some sort of zip, gzip, something like that in between. Cool.
Prisma migrations are an interesting way of doing things. They have a custom format, which is more concise than Directus' YAML, and then some CLI tools that create actual SQL migrations and sync the environment. Yeah. Great example. Right.
Good example. They basically do that with a shadow database, if I'm not mistaken; that's how they keep track of those migrations step by step by step. Yep. Right? And then with the CLI tool, it can compare your custom migration format with what they already have tracked so far.
And then, you know, apply the diff based on that. Still has a similar, I think a similar one-way issue, though. Right? Because the Prisma migrations that you write manually, I don't think they can sort of update those from the other side, so to speak; there's no two-way binding. Good point, though.
Good point. Alright.
Speaker 1: I can't remember. I had a problem with it in the past. So, sadly, I can't remember it right now, but it was very painful. So, it's not a 100%. Yeah.
It's not a 100% perfect thing. Nice. I've seen people, you know, there's lots of other, what is it called? Drizzle? Drizzle right now?
Speaker 0: Oh, there's a couple of ORMs like that. Yeah. Yeah.
Speaker 1: So there's lots of sources that we could, you know, yoink some code from, but be inspired. Let's say be inspired.
Speaker 0: Yeah. I don't think we've been yoinked. I don't think we've legitimately ever yoinked code before.
Speaker 1: Borrow. Borrow. See.
Speaker 0: The strategies are interesting there. But the main difference, and you touched on it perfectly before, the main difference there is just keeping track of the database schema versus having that ability to sort of manually template in between, moving stuff between dev and prod that is not just schema but also data, and how to deal with that, which at any point needs to be something you can do manually. Right? It doesn't have to be manual all the time, but it needs to be something you can do manually. Right?
Cool. Well, with all that being said, looking at the clock here, Connor, back to you. We've discussed quite a lot of the research up until this point, some of the requirements. You and I have also sort of been daydreaming about potential ways to implement this moving forwards. You wanna quickly touch on sort of the different phases and different sort of parts that we wanna touch as part of this bigger effort?
Speaker 2: Yeah. So the first thing we wanna do is we wanna figure out how this is gonna look, the distributable, the diffing. What is that file structure? What's the file format? You know, what are the requirements for that?
Do we have encryption? Do we have multiple files? Do we have one file? Do we have whatever? Like, we've already discussed here.
First thing is defining what that'll look like in totality, covering all use cases, so that as we progress through the phases of this project and making it happen, we can keep that spec that we have for those files in mind, and we can make sure that we cover everything as we work through the different phases. But that is phase 0: defining that spec, figuring out what it looks like. Phase 1 would be upgrading the schema service foundationally. So making sure we have all the bug fixes with the schema service taken care of, you know, adding the new different, you know, different strategies to it that might be needed. You know?
Do we, you know, add in those export filters, you know, only export this stuff, not that stuff? So getting that schema service, adding some more features to it. And then the next phase is working on data importing. And so adding in the different features that would be needed to make the data importing work. You know, dry running imported data, importing strategies. You know, do we have an import strategy where, right now, everything just gets upserted?
Do we wanna have an import strategy where, if you import the data, it drops all the current data in the table and re-imports everything fresh? You know, adding those different options. Some other ideas that have been thrown in the mix are, you know, if you're moving between one instance and another instance and you don't wanna bring any of your IDs, your primary keys, your foreign keys, you know, adding some type of way to anonymize that as it goes from one instance to another. So the new instance makes new IDs for everything. Or adding, you know, the ability for, like, templates, you know, if I wanna bring in a template for project management and have a due date in one of the items, but I made the template a year ago, then the due date's gonna be from a year ago.
So, you know, being able to have dynamic data that gets brought in on import. And so, you know, oh, I want you to import this as a date, but make it for two hours after the import. And then the next phase after that, after working on data importing, is getting into putting it all together. So we've worked on the spec for the thing. We worked on the schema service.
We worked on data importing. Now we need to bring it all together into this new overhauled configuration, multi-modal thing, and bring all those pieces together and make it work, basically. And then after that, once we've gotten the spec spec'd out, the schema service upgraded, the data importing upgraded, we brought it all together, and it now works foundationally, now we need to figure out how we use this to implement these different use cases. So for templating, you know, what do we build inside of the Data Studio admin app, from a user interface and API perspective, to make templating work?
What do we do for configuration as code to use the stuff that we've already built? So how do we implement the feature set that we want in the configuration as code use case? You know, are there any other use cases? You know, backing up and restoring an instance. Do we have the feature set that we wanna use for that?
And then that should, in theory, wrap up the project after that, after we figure out and implement those use cases. So there's quite a few steps here. Quite a lot of different items and it's going to take a long time. But at the end, we should have something pretty cool.
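Pulling the import-side ideas from this phase breakdown together, the options mentioned (strategies, dry runs, regenerated IDs, dynamic values) could hypothetically be expressed something like the sketch below. All names here are made up for illustration.

```ts
// Illustrative import options plus a template item with a dynamic placeholder
// that would be resolved at import time rather than stored as a literal value.
interface ImportOptions {
  strategy: 'upsert' | 'insert-only' | 'strict' | 'overwrite' | 'truncate-and-insert';
  dryRun: boolean;          // validate and report conflicts without applying anything
  regenerateIds: boolean;   // let the target instance mint new primary keys
}

const templateItem = {
  collection: 'tasks',
  data: {
    title: 'Kick-off meeting',
    due_date: '$NOW(+2 hours)', // hypothetical placeholder resolved on import
  },
};
```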
Speaker 1: Ideally. Yeah.
Speaker 0: And I think Daniel's facial expressions told the whole story. No. But I mean, it all makes sense to me. It sounds like a large, overwhelming amount of stuff, as it usually does. But by breaking it up, you know, step by step like this, we actually have a pretty solid idea, you know, start to finish, of what is involved in getting that across the finish line.
And, yeah, as per usual, you know, there's a lot of, what's the word? Directus magic going on under the hood to tie it all together. But I'm always just very excited and glad that we're able to sort of re-envision this as one underlying core foundational engine that can power all of those different use cases. Right? Rather than trying to tack on a new sort of templating piece and a new sort of code-first configuration piece and a new sort of other piece.
Right? That just increases the tech debt, that makes it hard to maintain. It makes those different flavors incompatible. You know, somebody will ask, how do we do a template as code? And we're like, well, you can't, because code is not for templating.
You know, that kind of stuff. It really reminds me of how we built Flows. Right? We had the hook extension first. And if you configure a new flow, you're effectively just building together a hook extension, and it's all the exact same underlying logic and event-based system, which is also why, you know, in flows you can do, say, the same stuff as you can in a hook.
Right? Well, same events. I mean, in a hook, you can of course code it yourself. So I'm very glad that we're making this, you know, a foundational upgrade to the schema snapshotting engine rather than trying to make it yet another new thing. Cool, cool, cool. Well, that all being said, I see we're at time here. Let me just quickly peek at the chat.
Did we miss anything? Pascal mentioned dynamic collection and field names would be cool, e.g., importing third party templates so you can choose your own target names. Good point. I think something similar goes for the conflict resolution piece that we talked about.
Right? If you're trying to import... won't be complex. Very, very, very true. But rest assured, it will be complex, I'm sure. Cool.
With all that being said, this will be live on directus.io/tv/requestreview in about a week's time. Also, the last episode you can find there too. If you're watching this on Directus TV, it's probably somewhere over there, there, there, or down there. Is this the point where we say, like and subscribe?
No. We don't have that yet. Exactly. We thank you all for joining.