Wednesday, July 22, 2015

Pune's bus transit data : explanation and invitation to get involved

Getting around to describing an ongoing effort properly:

Today we received updated bus route data from PMPML, in the best form they can manage from their end. It's human-readable but not machine-readable.

We need to reformat it into a standardized, machine-readable form, which would then enable the creation of apps, trip planners, a GTFS feed, etc. I worked on such a format over the past few months, incorporating all the lessons we learnt in our initial encounters with the data (whose output you presently see on the website), and last month I proposed it and got it approved at a meeting with the new CMD. We even got some much-appreciated data that had been prepared by ITDP in 2011. I then obtained a depot-wise breakup of the routes from the PMPML office, and populated the 2011 data, split up depot-wise, into the standardized format. Below is the result of this exercise so far:

A snapshot of some of the data so you can get an idea of the structures:
..further down..

This spreadsheet uses data validation to ensure that every stop entered already exists in an adjoining "stops" master list, snapshot below:

And here is the crucial gem: to have a proper bus routes database, you first need a standardized stops database, and all the routes must be built up from that stops database. ITDP's 2011 data had the stops properly standardized. In the spreadsheet, when you enter a new stop for a route, it does a search-as-you-type lookup against this stops list, and the user chooses the stop from the dropdown list.

It works with both Marathi and English. (If you want to enter a stop in English, fill in the column next to the Marathi one. The Marathi cell takes precedence over the English one in case both are filled.) Being able to work with both languages is crucially important here.
You can try this out for yourself: go to the "sandbox" sheet and edit a route (just tell me first, so I can add you to the spreadsheet editors list).
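For anyone who wants to reproduce the lookup outside the spreadsheet, here is a minimal Python sketch of the two behaviours described above: searching the stops list as you type, and letting the Marathi cell win when both cells are filled. The stop names, the `STOPS` list and the function names `suggest` and `resolve` are all illustrative assumptions, not the actual contents or formulas of the spreadsheet.

```python
# Hypothetical stops master list as (Marathi name, English name) pairs.
# The real list lives in the "stops" sheet; these entries are only examples.
STOPS = [
    ("स्वारगेट", "Swargate"),
    ("शिवाजीनगर", "Shivajinagar"),
    ("डेक्कन जिमखाना", "Deccan Gymkhana"),
    ("कात्रज", "Katraj"),
]


def suggest(typed, stops=STOPS):
    """Search-as-you-type: return stops whose Marathi or English name contains the typed text."""
    needle = typed.strip().casefold()
    if not needle:
        return []
    return [s for s in stops
            if needle in s[0].casefold() or needle in s[1].casefold()]


def resolve(marathi_cell, english_cell, stops=STOPS):
    """Return the master-list stop for one row, or None if it is not in the list.

    The Marathi cell takes precedence when both cells are filled.
    """
    marathi_cell = marathi_cell.strip()
    english_cell = english_cell.strip()
    if marathi_cell:
        matches = [s for s in stops if s[0] == marathi_cell]
    elif english_cell:
        matches = [s for s in stops if s[1].casefold() == english_cell.casefold()]
    else:
        return None
    return matches[0] if matches else None
```

A row for which `resolve` returns `None` is one that data validation would reject, because the stop is not in the master list.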

So.. we're ready with the container, now we have to put in the updated data. I plan to show PMPML staff how to update this themselves, but right now there's a large quantity of data that needs filling in, and I was wondering if anybody would like to help out. We have received the latest data from PMPML (in the original human-readable but machine-unreadable form). This is what it looks like originally:

I'm converting the Word file to Excel and then the fonts to Unicode Marathi... here it is:

Of course, nothing would be better than a full-fledged high-tech routes management system for this.

I don't have the high-tech skills to make anything like that, but I was able to turn PMPML's verbally stated requirements into a technical design document with illustrations of what we need. You can see and download the design document here: (in the hopes that we might be able to find people who can help make this a reality).

So the Google spreadsheet is a low-tech substitute to at least get working with what we have till we get to such a system. Because it is standardized, follows a clear logical order and doesn't have random blank lines, formatting etc, a programmer could convert the data there to something they are comfortable working with, like XML, JSON, or if they get it right, even a GTFS feed. I was able to use some Excel formulae to generate a KML file of the routes of one depot. So this is essentially a middle step that would reformat PMPML's data into a machine-readable plus human-readable and editable form that an app developer can then make use of. (Just FYI, the GTFS format is NOT human-readable, so please don't expect anybody to go manually editing 6-lakh-line-long GTFS feeds, that's never going to happen. See an analysis here. GTFS would be the output of a good routes management system. But who is going to build that system, our uncle? :P)
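To give programmers an idea of how little it takes once the data is standardized, here is a rough Python sketch that turns route rows into JSON and into a KML line for one route. The row layout (route, sequence, stop name, latitude, longitude), the placeholder stops and coordinates, and the function names are all assumptions made for illustration; the real spreadsheet columns may differ, and this is not the Excel-formula method I used for the depot KML.

```python
import json
from xml.sax.saxutils import escape

# Hypothetical rows exported from one depot sheet:
# (route id, sequence number, stop name, latitude, longitude).
# The stops and coordinates are made-up placeholders.
ROWS = [
    ("R1", 2, "Stop B", 18.52, 73.85),
    ("R1", 1, "Stop A", 18.51, 73.86),
    ("R1", 3, "Stop C", 18.53, 73.84),
]


def routes_to_json(rows):
    """Group rows by route, ordered by sequence number, and serialize to JSON."""
    routes = {}
    for route, seq, name, lat, lon in sorted(rows, key=lambda r: (r[0], r[1])):
        routes.setdefault(route, []).append(
            {"seq": seq, "stop": name, "lat": lat, "lon": lon}
        )
    # ensure_ascii=False keeps Marathi stop names readable in the output.
    return json.dumps(routes, ensure_ascii=False, indent=2)


def route_to_kml(route, rows):
    """Build a minimal KML document with one LineString for the given route."""
    points = sorted((r for r in rows if r[0] == route), key=lambda r: r[1])
    # KML coordinates are written as longitude,latitude,altitude.
    coords = " ".join(f"{lon},{lat},0" for _, _, _, lat, lon in points)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>'
        f"<name>{escape(route)}</name>"
        f"<LineString><coordinates>{coords}</coordinates></LineString>"
        "</Placemark></Document></kml>"
    )


if __name__ == "__main__":
    print(routes_to_json(ROWS))
    print(route_to_kml("R1", ROWS))
```

A GTFS feed needs several linked files beyond this (stops, routes, trips, stop times), which is part of why it is better produced by a program than edited by hand.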

I also have a clear goal to prioritize openness of the data and PMPML's ownership of it (and by extension, that of the commuters of Pune and Pimpri-Chinchwad): they should be able to edit it when they need to, and the data should not get divorced from them as has happened the last few times. Keeping things as simple and usable as possible is a priority, so if you can suggest ways to do more of that, it would be great.

So I'm inviting your involvement with this, in whichever form is most comfortable for you. You could even do this as an internship (we'll figure out certification formalities, don't worry). You could edit existing routes, fill in missing routes or stops, point out apocalypse-causing errors, get to work on making a sexy routes management system, go to PMPML and train the staff there to populate this, etc. If you volunteer to edit information for a depot, I'll add you to the spreadsheet and selectively unlock that depot's sheet for you to edit. Or just share this email with groups who you think might be able to help. Please feel free to write back to me on nikhil.js [at] if you have any queries.

PS: Just to clarify, I'm not involved with the website of PMPML. I only volunteered to help with the bus routes data when there was a need to convert data in Shree-Dev font to Unicode Marathi.

