android html-parsing gtfs

HTML parser to create GTFS formatted data


A transit agency does not provide its schedule data in GTFS format. I would like to build an Android application that can search the schedule, so having the data in that format would be very useful. The schedule is published on a website, but it is hard to separate the useful parts from the rest of the HTML:

<td class="b stopPoint p0" background="nline.gif"><a href="line.cgi?id=1&dir=back&zero=15901&city=so&term=20141214"><img src="coming.gif" class="stopPoint" alt="A megállóhoz tartozó indulási időpontok megjelenítéséhez kérem, kattintson ide!" /></a></td>
<td class="b stopTime p0">2</td>
<td class="b stopPeakTime p0">2</td>
<td class="b stopName p0" colspan="1">Frankenburg úti aluljáró</td>
<td class="b stopTransfer p0"><img src="transfer.gif" class="iconTransfer" alt="Átszállási lehetőség a felsorolt autóbuszvonalakra" />&nbsp;&nbsp;<a href="line.cgi?id=10&dir=to&zero=1590&city=so&term=20141214">10</a>, <a href="line.cgi?id=10Y&dir=to&zero=1590&city=so&term=20141214">10Y</a></td>

An existing parser for this purpose would be helpful. Are there any that work?


Solution

  • Ask the transit agency whether they can provide the schedule data in a more structured format. They might have another data format that is easier to work with than the website.

    Otherwise, you'll probably have to write a custom scraper/parser for this. I like parsing HTML with Python's BeautifulSoup library, but there are any number of ways to do this.
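
As a rough sketch of what such a custom parser could look like, the snippet below pulls the fields out of one row of the table from the question. It uses only Python's standard-library `html.parser` as a stand-in so that it runs without extra packages; the same idea carries over to BeautifulSoup. The class `StopRowParser` and the helpers `parse_stop_row` and `to_stops_csv` are hypothetical names made up for this example. It also assumes that the `zero` query parameter in the link identifies the stop, which the page itself does not confirm.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# One row from the question (alt text shortened, transfer cell omitted).
SAMPLE_ROW = """
<td class="b stopPoint p0" background="nline.gif"><a href="line.cgi?id=1&dir=back&zero=15901&city=so&term=20141214"><img src="coming.gif" class="stopPoint" alt="" /></a></td>
<td class="b stopTime p0">2</td>
<td class="b stopPeakTime p0">2</td>
<td class="b stopName p0" colspan="1">Frankenburg úti aluljáró</td>
"""


class StopRowParser(HTMLParser):
    """Collects the text of each <td>, keyed by its distinguishing CSS class."""

    FIELDS = ("stopTime", "stopPeakTime", "stopName")

    def __init__(self):
        super().__init__()
        self.row = {}
        self._field = None           # field name of the <td> currently open
        self._in_stop_point = False  # True while inside the stopPoint cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td":
            self._field = next((f for f in self.FIELDS if f in classes), None)
            self._in_stop_point = "stopPoint" in classes
        elif tag == "a" and self._in_stop_point and "href" in attrs:
            query = parse_qs(urlparse(attrs["href"]).query)
            # ASSUMPTION: the "zero" parameter is the stop identifier.
            self.row["stop_id"] = query.get("zero", [""])[0]

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
            self._in_stop_point = False

    def handle_data(self, data):
        if self._field:
            self.row[self._field] = self.row.get(self._field, "") + data.strip()


def parse_stop_row(html):
    """Return a dict of the fields found in one table row."""
    parser = StopRowParser()
    parser.feed(html)
    parser.close()
    return parser.row


def to_stops_csv(rows):
    """Write the parsed rows as a partial GTFS-style stops file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["stop_id", "stop_name"])
    for r in rows:
        writer.writerow([r["stop_id"], r["stopName"]])
    return out.getvalue()


row = parse_stop_row(SAMPLE_ROW)
print(row)
print(to_stops_csv([row]))
```

This only covers the stop identifier and name. The snippet in the question contains no coordinates, so the `stop_lat` and `stop_lon` columns of a GTFS `stops.txt` would have to come from another source, and the meaning of the `stopTime` and `stopPeakTime` values would need to be checked against the site before turning them into `stop_times.txt` entries.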