About the Project

IMLS Grant: LG-87-18-0062-18

New York University Libraries, in collaboration with the NYU Arthur L. Carter Journalism Institute, the NYU Center for Data Science, and ProPublica, will develop a prototype for a software tool to capture and preserve data journalism projects in a scholarly archive. The prototype will extend an existing open-source computational reproducibility tool, ReproZip. The proposed extension will be designed to quickly and easily pack and unpack dynamic websites, enabling the first large-scale collection, preservation, and discovery of complex interactive websites. Storytelling with data has revolutionized modern reporting, and the dramatic increase in the production and popularity of data journalism projects can be seen at both news startups and legacy news organizations. These stories are an important part of the historical record, yet because of their technological complexity they cannot currently be archived or preserved at libraries, newsrooms, or cultural institutions. As a result, they are disappearing. This project aims to address part of that problem.

Outcomes of this project include an open-source prototype for an extension of ReproZip and documentation of its ability to pack and unpack interactive data journalism projects. The new extension will be composed of two essential components:

  1. ReproZip & Webrecorder Integration: While ReproZip traces the back end of the web server, Webrecorder will run simultaneously to record the front-end files that come from external locations. The two sets of files will then be consolidated into one ReproZip package.
  2. ReproZip Unpacker: When a secondary user replays the ReproZip package, a proxy integrated with Webrecorder Player will sit in front of the unpacked application and direct each request either to the WARC contents or to the application's environment (in the ReproZip file).
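
The routing decision made by the replay proxy in the second component can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the host name, the set of recorded URLs, and the function route_request are all hypothetical, and the rule used here (requests for the site's own host go to the unpacked application, previously recorded external URLs go to the WARC) is one plausible way to split traffic.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for an index of external URLs captured in the WARC.
# A real tool would build this from the WARC records themselves.
WARC_INDEX = {
    "https://cdn.example.org/lib/d3.min.js",
    "https://fonts.example.org/css/news.css",
}

# Hypothetical host name of the original site, served by the unpacked application.
APP_HOST = "projects.example-news.org"


def route_request(url, warc_index=WARC_INDEX, app_host=APP_HOST):
    """Decide where the proxy should send a request.

    Returns "app" for the unpacked application, "warc" for a recorded
    external resource, or "missing" if the URL was never captured.
    """
    host = urlsplit(url).hostname
    if host == app_host:
        # The site's own pages and API calls are answered by the live,
        # unpacked back end rather than by a static recording.
        return "app"
    if url in warc_index:
        # External front-end files are replayed from the archive.
        return "warc"
    # Anything else was not recorded; the proxy should not reach the live web.
    return "missing"


if __name__ == "__main__":
    for u in (
        "https://projects.example-news.org/api/data?id=3",
        "https://cdn.example.org/lib/d3.min.js",
        "https://tracker.example.net/pixel.gif",
    ):
        print(u, "->", route_request(u))
```

Returning "missing" instead of passing unknown requests through to the live web keeps the replay self-contained, which is an assumption made in this sketch about how an archival replay should behave.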

We plan to build this tool in conversation with several prominent news organizations producing data journalism, with the goal that newsrooms become incentivized, willing, and able to adopt a self-archiving workflow using ReproZip.

Current Team Members

Katy Boss

Vicky Steeves

Rémi Rampin

Fernando Seabra Chirigati

Brian Hoffman

Bofei Zhang
