Citi Bike publishes data on the millions of rides taken by its customers across New York City's thousands of stations. Kevin shows you how Directus can be used to further explore, understand, and remix this dataset, both in the Data Studio and via the API.
Speaker 0: In this show, we give new life to open datasets with the help of Directus. Join me as we explore, analyze, and generate APIs to improve access to and democratize data. Before we get started, if you have or know of an interesting open dataset and want us to use it, just reach out and it might just feature on a future episode. Today's dataset comes from Citi Bike. Across New York City, there are over 2,000 stations where rental bikes can be picked up and dropped off.
Every customer ride is represented in this dataset, detailing when and where a bike was picked up and when and where it was dropped off, as well as some additional metadata about the vehicle and the customer. Now in March 2024, there were over 2 million rides that took place, but for this show, I've just imported the first million into Directus. Now compared to other episodes, I've done a little more preprocessing on this data, and that's because I wanted to have two collections when this dataset only provides one set of data. I wanted to have stations and then rides separately. So I needed to split out the stations from the rides data and import those separately.
Additionally, like every previous episode, I converted station coordinates to GeoJSON objects to enable mapping features inside of Directus. And with that, I think we're ready to jump in and get started. So as you can see here, we have two separate collections, stations and rides. Each station has its internal identifier in the Citi Bike system and a friendly name. Now if we go into any of these, we additionally see the map rendered, because we converted that set of coordinates to GeoJSON, along with all of the rides in this dataset that started and ended at this station.
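The preprocessing described above (splitting stations out of the rides data and converting station coordinates to GeoJSON) could look roughly like the following. This is a minimal sketch, not the script used for the episode: the CSV column names, the sample rows, and the output field names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical two-row sample. The column names are assumed for illustration
# and are not confirmed by the episode.
SAMPLE_CSV = """ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
A1,classic_bike,2024-03-01 08:00:00,2024-03-01 08:10:00,Station One,1001.01,Station Two,2002.02,40.75,-73.99,40.76,-73.98,member
B2,electric_bike,2024-03-01 09:00:00,2024-03-01 09:20:00,Station Two,2002.02,Station One,1001.01,40.76,-73.98,40.75,-73.99,casual
"""


def to_point(lng, lat):
    """Build a GeoJSON Point. GeoJSON orders coordinates as [longitude, latitude]."""
    return {"type": "Point", "coordinates": [float(lng), float(lat)]}


def split_rides_and_stations(rows):
    """Return (stations, rides): unique stations plus rides that reference them by ID."""
    stations = {}
    rides = []
    for row in rows:
        # A station can appear as either the start or the end of a ride.
        for side in ("start", "end"):
            station_id = row[f"{side}_station_id"]
            if station_id and station_id not in stations:
                stations[station_id] = {
                    "id": station_id,
                    "name": row[f"{side}_station_name"],
                    "point": to_point(row[f"{side}_lng"], row[f"{side}_lat"]),
                }
        # Output field names here are hypothetical.
        rides.append({
            "id": row["ride_id"],
            "station_start": row["start_station_id"],
            "time_start": row["started_at"],
            "station_end": row["end_station_id"],
            "time_end": row["ended_at"],
            "member_type": row["member_casual"],
            "bike_type": row["rideable_type"],
        })
    return list(stations.values()), rides


stations, rides = split_rides_and_stations(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(stations), "stations,", len(rides), "rides")
```

Keeping only the station ID on each ride is what allows the start and end fields to become relational fields pointing back at the stations collection.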
This is one of the less popular stations, it seems, compared to others. Now because this is all inside of Directus, we can start using Directus' powerful filter and search interface in order to narrow this down if we want. But for now, I think we will just use everything. Now inside of layout options, we can change the layout. Right now, we are rendering this data inside of a table, which, of course, is a standard way to represent data in a database.
But what's really nice is that Directus, out of the box, provides a map interface, or rather a map layout. And so we can explore this data a little more visually and as it is represented in the real world. So this is just 824 of the items that are inside this area of the map. And we can click in to individual items like this if we want. So here we see perhaps a slightly more popular station in terms of rides started and rides ended.
Talking about rides, let's look at the rides collection. So this is a huge dataset. This is 1 million items. And once again, there were over 2 million in March, but I've just imported the first million, because that was the first CSV that is provided by Citi Bike. They chunk it up.
And each individual ride has a unique identifier, the station start and time, the station end and time, whether the rider is a casual user or a member, and whether the bike is a classic bike or an electric bike. Now this station start and end, these are relational fields. So these link back to the station items. So we can click here, and we can see the station item that we just opened up before. Now, again, we can explore this in many different ways inside of Directus.
But when it comes to analyzing data, we want to use Directus Insights. So let's go see something that I've set up earlier. This is a Directus Insights dashboard that I built before I started recording. And if I'm completely honest, this was one episode where I got a little bit of choice paralysis about what I could possibly represent because there is just so much rich data inside of this dataset. But for now, this is what I've built to demonstrate something that's possible.
So here I have a relational global variable field, which means it will show every station in the interface. So I clicked one, this one here, and all of the other panels relate to the item that I selected. So firstly, we have the internal identifier. We see how many rides were started at this station and how many ended, which perhaps is interesting in itself.
People seem to want to end at this location more than leave it. We get to see the number of rides started per day within this dataset, whether the riders are mostly casual users or members, and what kind of bikes are picked up. Now one thing that's really interesting about this one, and that I do think is telling, is that almost every station I have tried this with in the past has a majority of members and fewer casual users. But, of course, being Central Park, this is definitely a tourist location. So it's no surprise that perhaps there are more casual users picking up bikes or dropping off bikes at this location.
So that's really just an interesting note, I suppose. Maybe one way this can be used is if you spot areas where there are more casual users but still a high volume of users. Maybe this is an opportunity to run campaigns to convert people into members, for example. But I'm sure there's a lot more you can do with this data. Now this isn't all.
Everything you do inside of Directus can also be done using the automatically generated APIs. So let's go see how they work. So I've prepared a small number of requests just to demonstrate the capabilities of the API. But this is just a tiny sliver of what's available in terms of fetching data. So firstly, we have a stations collection.
And as a result, we automatically have this endpoint, /items/stations, to go fetch all of the stations. So here we see an array of objects is returned. Each object represents one item in the collection, and this looks very similar to the UI that we saw in the editor. We have an internal ID, a name, a point (a set of coordinates in GeoJSON format), the rides started, and the rides ended. Now given that they all have an ID, you can also further break this down and perhaps just get a single item back.
So this is now not an array anymore. It's just that single object. So that's stations. We also have individual rides. So once again, each one has an ID and all of that data we just saw.
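The list and single-item requests described above can be sketched as follows. This is only an illustration: the base URL is a placeholder, the `/items/rides` path is assumed from the collection name, and `fetch_json` is a hypothetical helper that is defined but not called here because it needs a live project.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder project URL (an assumption, not the project from the episode).
BASE_URL = "https://example.directus.app"


def items_url(collection, item_id=None):
    """Build the URL for a whole collection, or for one item when item_id is given."""
    url = f"{BASE_URL}/items/{quote(collection)}"
    if item_id is not None:
        url += f"/{quote(str(item_id), safe='')}"
    return url


def fetch_json(url):
    """Hypothetical helper: GET a URL and return the payload under the "data" key.

    Directus wraps results in {"data": ...}: a list for a collection request
    and a single object for a single-item request.
    """
    with urlopen(url) as response:
        return json.load(response)["data"]


print(items_url("stations"))             # every station (array of objects)
print(items_url("stations", "1001.01"))  # one station (single object); the ID is made up
print(items_url("rides"))                # every ride
```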
Station start and station end are IDs of the stations. Now you can further start to narrow this down by using Directus' filter language, or filter syntax. So here, we are applying two filters.
And all we're doing is saying, grab all of the rides that happened inside of the third month (in this case, that's all of them because they're all in March, but that may not always be the case) and on the seventh day. So this is everything from the 7th of March. You could further, of course, narrow this down to specific stations if you want, or specific member types, and so on and so forth. You can just add more filters on the end here.
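A query string for the two filters described above might be built like this. It is a hedged sketch: the date field name `started_at` is an assumption (the transcript does not name it), and the `month(...)` and `day(...)` function syntax follows my understanding of Directus' filter rules rather than the exact request shown on screen.

```python
from urllib.parse import urlencode


def month_day_filter(date_field, month, day):
    """Build a query string matching items whose date falls in a given month and on a given day."""
    params = {
        f"filter[month({date_field})][_eq]": month,
        f"filter[day({date_field})][_eq]": day,
    }
    # Keep brackets and parentheses readable instead of percent-encoding them.
    return urlencode(params, safe="[]()")


# "started_at" is a hypothetical field name for the ride's start time.
query = month_day_filter("started_at", 3, 7)
print("/items/rides?" + query)
```

Further filters, for example on a specific station or member type, would be added as extra `filter[...]` entries in the same dictionary.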
Now this final request is not so much about fetching data, but rather analyzing it using our aggregate and group by query parameters. And it's these which actually power the panels inside of Directus Insights. So here we're saying, count all of the items and then group them by the day on which that ride was started. And what we get as a result is, for every day, we get a count. So we get to say, hey.
On the 1st of March, there were 32,329 rides that took place inside of this dataset, and so on and so forth. For every single day inside of this dataset, there is a different count. So that's a little bit about how you can use Directus Connect and the automatically generated APIs in order to explore the data, either to actually query the data itself or run some analysis. So once again, I hope you found this interesting. And if you have an interesting dataset or know of an interesting dataset that we can use in a future episode, just reach out.
I'd love to see it. Until next time. See you later.
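For reference, the aggregate and group by request described earlier (count all rides, grouped by the day each ride started) could be expressed as the query string below. This is a sketch under assumptions: `started_at` is a hypothetical field name, and the exact shape of the response rows is not shown because the transcript only mentions a count per day.

```python
from urllib.parse import urlencode


def rides_per_day_query(date_field):
    """Build a query string that counts all items, grouped by day of the given date field."""
    params = [
        ("aggregate[count]", "*"),
        ("groupBy[]", f"day({date_field})"),
    ]
    # Keep brackets, parentheses, and the asterisk readable in the output.
    return urlencode(params, safe="[]()*")


# "started_at" is a hypothetical field name for the ride's start time.
per_day = rides_per_day_query("started_at")
print("/items/rides?" + per_day)
```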