GitHub and then some


Photo CC-BY-SA Nick Quaranto on Flickr

It's a good time to be open source. Everyone is talking about GitHub:


Git beyond software

Surprisingly, I think the TED talk and the New York Times article did the best job of discussing how GitHub might work beyond computer code. The laws of Germany and Utah are on GitHub, but these repositories are run by volunteers, and the commits are not an official record of changes made by legislators. I thought that the New York legislature had this worked out, but it appears that what they released was only the bill tracker for the state senate.

I do a lot of work with maps, so I was excited to see OpenGeo working on what's essentially git for maps. We need this because when I change my location from 7.135, 171.192 to 37.775, -122.414, GitHub just sees one line erased and another added. It doesn't represent the change as a point moving across the globe. OpenGeo's work goes beyond this, but it's a fair example of something simple that a community needs yet GitHub is unlikely to develop on its own.
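As a rough illustration of the difference, here is a minimal Python sketch that treats the two coordinate pairs above as one moved point rather than as a deleted line and an added line. The function names (`parse_point`, `haversine_km`, `describe_move`) are hypothetical, and this is not how OpenGeo's tool works. It only assumes that coordinates are written as `latitude, longitude` in decimal degrees, and it uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def parse_point(line):
    """Parse a 'lat, lon' text line into a (lat, lon) tuple of floats."""
    lat_text, lon_text = line.split(",")
    return float(lat_text), float(lon_text)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def describe_move(removed_line, added_line):
    """Summarize a removed/added pair of coordinate lines as a single move."""
    old = parse_point(removed_line)
    new = parse_point(added_line)
    km = haversine_km(old, new)
    return "point moved about %.0f km, from %s to %s" % (km, old, new)


if __name__ == "__main__":
    # The two lines a plain text diff would show as "-" and "+".
    print(describe_move("7.135, 171.192", "37.775, -122.414"))
```

A map-aware tool could show this kind of summary, or draw the movement on a map, next to the raw line diff.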


How GitHub represents changes to an image... what about maps?

I don't know other fields as well, but it's possible your work includes kinds of changes that you would want represented in a way that fits your field. GitHub is unlikely to understand them on its own.



Octocat Cookie Cutter CC-BY-NC-SA Thomas Amberg on Flickr

GitHub and data

When the City of Chicago released several datasets on their GitHub account, the open government community rejoiced. It gets an excited mention in the Wired and O'Reilly Radar articles published this week.

Although this data was already available on the city's open data portal, this release comes with several significant changes: examples of R, Python, and Ruby code that work with the data; encouragement to fork and improve on the data; and a more open MIT license. Yet the Hacker News thread has a debate with comments like:

"Github is ridiculously unsuited for one-time publication of blobs. Seriously." - sokrates
"Merging user contributions comes with a number of problems. It'll be interesting to know how they'll manage it. Or, they won't, and this is entirely token...
I'm also dubious that GitHub is the right way to release data. There are a huge number of people interested in civic data, and GitHub is probably one of the least accessible ways for 99.9% of people to get it." - Hamish Campbell

There are already 11 forks of Chicago's building data, none with new commits, but it's a matter of time until someone gets sufficiently organized to add a new building or remove a recently demolished one. He or she copies all 130MB of Chicago's buildings to a repo, unzips /data/Buildings.zip, makes the change, zips it up again, and submits a pull request. The city then needs to open up the zip to see what's there. How do they - and how do we - view, verify, and track changes? What level of detail is required? Does the contributor pick the next available number as the ID for the building, or does Chicago need to add it to their internal database to assign an ID? In practice this repo may move forward independently, itself a fork of Chicago's internal data. Meanwhile OpenStreetMap will import this data and make their own changes.
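One way to make such a pull request reviewable is to compare the old and new datasets record by record instead of comparing two zip files. The sketch below is only an illustration: the `diff_records` function, the `building_id` key, and the `address` and `stories` fields are hypothetical and not taken from Chicago's actual schema, and the records are assumed to be already loaded into Python dictionaries.

```python
def diff_records(old_rows, new_rows, key="building_id"):
    """Compare two lists of record dicts and report which IDs differ.

    The key name is a placeholder; a real dataset would use its own ID field.
    """
    old_by_id = {row[key]: row for row in old_rows}
    new_by_id = {row[key]: row for row in new_rows}

    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    changed = sorted(
        record_id
        for record_id in old_by_id.keys() & new_by_id.keys()
        if old_by_id[record_id] != new_by_id[record_id]
    )
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Made-up records standing in for the city's version and a contributor's fork.
    city = [
        {"building_id": 1, "address": "100 Example St", "stories": 2},
        {"building_id": 2, "address": "200 Example St", "stories": 5},
        {"building_id": 3, "address": "300 Example St", "stories": 1},
    ]
    fork = [
        {"building_id": 1, "address": "100 Example St", "stories": 3},
        {"building_id": 3, "address": "300 Example St", "stories": 1},
        {"building_id": 4, "address": "400 Example St", "stories": 4},
    ]
    print(diff_records(city, fork))
```

A summary like this does not settle who assigns IDs, but it gives reviewers something smaller than a 130MB archive to look at.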

Of the two (GitHub and OpenStreetMap), I believe OpenStreetMap will have a more open, accessible, and visual edit process. That's where I published Macon's open data during my Code for America fellowship, and where it continues to grow today. There's a risk in OpenStreetMap because anyone can edit it, unlike GitHub, where changes go through a gatekeeper to become part of the official master branch. But honestly, does the government need to maintain control of how the community builds up their data? I think a government gatekeeper may prove too slow and cautious to keep up. As long as there's an official dataset somewhere, it'd be better to let the community take over modifying the data, and cities can pick and choose from what the community has to offer.



PHP community visualization CC-BY franckcuny on Flickr

Centralization has consequences

Another risk of using GitHub for everything: we all want our work to be in the big GitHub network, but when it's blocked in China or disabled by a DDoS attack, the whole system shuts down. Hacker News discussion tends to revolve around why this is happening or how GitHub might prevent it. But if I want a project to be available globally and free from the crossfire of anti-GitHub hackers, isn't there a benefit to hosting a custom git site, using Gitorious or Redmine?



Felt Octocat CC-BY-SA Yuko Honda on Flickr

GitHub plus

Over the next several years, I believe GitHub and OpenGeo's git for maps will be two of many competing systems for open-sourcing software and data. Ideally they will be cross-compatible, so that I can edit a map visually and users of plain git can accept the changes line by line. But there's no guarantee of that.

It's also possible that GitHub will consider an API that lets industries represent their own kinds of changes: 3D models, audio and video files, cartography, networks... something like Twitter's media pop-ups. This wouldn't solve the problem of centralization, and it might lead to nasty disputes (like Instagram's retraction of photo embedding on Twitter), but I think it's the only way we'll see true global use of GitHub.



- Nick Doiron - March 10, 2013