Easy Access to ohsome full history OSM contributions using cloud hosted GeoParquet

Room: Amboseli Hall

Saturday, 14:30
Duration: 60 minutes (plus Q&A)


This event will not be recorded.


Workshop Slides

Workshop Book

GitHub Repository


  • Benjamin Herfort
  • Rafael Troilo
  • Michael Auer

This workshop teaches you how to accelerate OSM data analysis without running your own computing cluster. We will give a sneak preview of our new cloud-hosted ohsome full-history contributions data and show how you can use it to understand the dynamics in OSM.


Technological advances in the geospatial world are moving fast. In this workshop we want to explore how the OSM community, researchers and data scientists can make use of recent developments such as GeoParquet, DuckDB and Polars. If you have not heard of these before, don’t worry: we will show how they might transform your geospatial analysis workflows and provide new insights into mapping in OSM.

In this workshop we will give a sneak preview of our new cloud-hosted ohsome full-history contributions data. This dataset brings you something we have wanted for a long time: a single dataset that contains information about OSM objects together with the related changeset metadata, such as the OSM editing software used, changeset comments and hashtags. Because the dataset contains OSM’s full history (and not only the latest snapshot), it is well suited to understanding the dynamics in OSM. Based on this dataset we will show how you can run global-scale data analyses for various (research) questions and how you can visualize the results.

We want to explore how data from OSM can be combined with other data sources. We will start by looking at the added value of OSM changeset metadata. We will look into the temporal evolution of OSM (what changes in OSM over time?) and what these changes can tell us about the real world. For instance, this analysis can highlight regions with very high mapping activity for specific editors such as StreetComplete. We also want to take a closer look at the types of changes that happen in OSM. This can tell us more about the community itself and could reveal regional preferences in mapping style. The main advantage of using cloud-hosted contributions files over simply using the ohsome API is that you can take your analysis all the way down to the individual OSM object.
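As a rough illustration of the editor-activity question above, the following sketch builds a DuckDB SQL query that counts contributions per month for one editing software. The column names `changeset_editor` and `valid_from` and the file location are hypothetical placeholders chosen for this example, not the actual schema or URL of the ohsome contributions data; please check the workshop material for the real names.

```python
def monthly_editor_activity_sql(parquet_glob: str, editor_prefix: str) -> str:
    """Build a DuckDB query counting contributions per month for one editor.

    Assumption: the files expose a timestamp column `valid_from` and a text
    column `changeset_editor`. Both names are hypothetical.
    """
    # Double any single quotes so the values are safe inside SQL string literals.
    glob = parquet_glob.replace("'", "''")
    editor = editor_prefix.replace("'", "''")
    return (
        "SELECT date_trunc('month', valid_from) AS month, "
        "count(*) AS contributions "
        f"FROM read_parquet('{glob}') "
        f"WHERE starts_with(changeset_editor, '{editor}') "
        "GROUP BY 1 ORDER BY 1"
    )


def run_query(sql: str):
    """Execute the query with DuckDB and return the result rows."""
    import duckdb  # third-party package, imported lazily

    con = duckdb.connect()
    # httpfs lets DuckDB read Parquet files from S3 or HTTP(S) locations.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(sql).fetchall()


# Hypothetical location; replace with the path given in the workshop material.
EXAMPLE_GLOB = "s3://example-bucket/ohsome-contributions/*.parquet"
query = monthly_editor_activity_sql(EXAMPLE_GLOB, "StreetComplete")
```

Calling `run_query(query)` would then return one row per month. Because the query reads only the columns it names, DuckDB does not need to download the full files.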

We will touch on the topic of data quality (how good is OSM?) and want to compare the coverage of OSM building and road data with the coverage of other datasets from Microsoft, Google and other sources. This analysis can tell us something about the completeness of OSM, but also about the accuracy of those machine-learning-derived datasets. As the discussion about integrating AI datasets into OSM can sometimes be heated, we hope that this analysis will provide some common ground on which regions are most likely to be affected (positively or negatively) by AI-assisted mapping.

Before diving deeper into these analysis questions, we will give a short introduction to GeoParquet files and how they can be analyzed with DuckDB. Next, we want to work through three to four hands-on Jupyter Notebook examples that we have prepared for you. For these it would be good if participants could install QGIS in advance of the workshop. The workshop material will be made available via GitHub, and participants can use either a local Python environment or a cloud-hosted service such as Colab or similar tools.
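To give a flavour of how GeoParquet files can be queried with DuckDB, here is a minimal sketch that restricts a query to a bounding box. It assumes the files carry a `bbox` struct column with `xmin`, `ymin`, `xmax` and `ymax` fields, a convention that many GeoParquet files follow; whether the ohsome data uses exactly this layout is an assumption, and the file path and coordinates are illustrative only.

```python
def bbox_filter_sql(parquet_glob: str, west: float, south: float,
                    east: float, north: float, limit: int = 10) -> str:
    """Build a DuckDB query selecting rows whose bbox intersects a rectangle.

    Assumption: a struct column named `bbox` with xmin/ymin/xmax/ymax fields.
    """
    # Double any single quotes so the path is safe inside a SQL string literal.
    glob = parquet_glob.replace("'", "''")
    # Two rectangles intersect when they overlap on both axes.
    return (
        f"SELECT * FROM read_parquet('{glob}') "
        f"WHERE bbox.xmax >= {float(west)} AND bbox.xmin <= {float(east)} "
        f"AND bbox.ymax >= {float(south)} AND bbox.ymin <= {float(north)} "
        f"LIMIT {int(limit)}"
    )


# Illustrative rectangle roughly around Nairobi; the path is a placeholder.
query = bbox_filter_sql("data/contributions/*.parquet", 36.6, -1.5, 37.1, -1.1)
print(query)
```

The resulting string can be passed to `duckdb.connect().execute(query)`. Filtering on plain numeric bbox fields lets DuckDB use Parquet column statistics to skip row groups that cannot match, which is one reason such queries can stay fast on large remote files.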

We hope that this workshop can start a discussion about how to accelerate OSM data analysis without the need to run your own computing cluster. We believe that cloud-hosted GeoParquet files and DuckDB could open up OSM data analysis to a much broader audience. We invite OSM community members, researchers and data scientists to bring their own questions, as we are interested in learning from you what it would take to turn those ideas into reality.