
Scaling OpenRefine

Legacy EOSS

2019

Proposal Summary

To attract new contributors by improving OpenRefine’s documentation, and to implement a new data model that improves the scalability, transparency, and reproducibility of OpenRefine workflows.


OpenRefine

OpenRefine is a power tool for cleaning up messy data. Requiring no knowledge of a programming or query language, it lets users find and fix inconsistencies interactively, match their data to external databases, pull additional data from those databases, and perform many other useful operations. The resulting workflows can be extracted and applied to other projects, making them reusable and reproducible. OpenRefine was originally designed as an Extract-Transform-Load tool to populate Freebase, under the name “Freebase Gridworks.” It was then briefly a Google product, and became an open source project when Freebase was discontinued. Thanks to a grant from the Google News Initiative in 2018, integration with Wikidata was developed, making OpenRefine a tool of choice for importing data into Freebase’s successor.

Project Team

Antonin Delpeuch

Code for Science and Society

Owen Stephens

Owen Stephens Consulting