Tags: export-to-csv, data-extraction, data-export, pdf-extraction

Best way to get a database-friendly list of Veterans Affairs hospitals


I sincerely apologize if this isn't the proper forum to discuss this, but I wasn't sure where to go or what would be the best option.

Basically, I'm trying to find a database-friendly list of Veterans Affairs (VA) hospitals. The closest thing I've been able to find is www.va.gov/ofcadmin/docs/CATB.pdf, which has all the information I'm looking for:

  • Region
  • Address
  • City in a separate column
  • ZIP code in a separate column
  • State
  • Facility # (also known as StationID)
  • VISN
  • Symbol
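Since the goal is a database-friendly list, one option is a single flat table with one column per field above. The sketch below uses Python's built-in sqlite3 module; the table name va_facility and all column names are hypothetical choices, not taken from CATB.pdf. ZIP codes and facility numbers are stored as TEXT so that leading zeros (and any non-numeric characters) survive, and no uniqueness constraint is assumed on the facility number.

```python
import sqlite3

# Hypothetical table and column names, one column per field in the list above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS va_facility (
    facility_number TEXT,  -- a.k.a. StationID; TEXT keeps leading zeros
    region          TEXT,
    address         TEXT,
    city            TEXT,
    state           TEXT,
    zip_code        TEXT,  -- TEXT, not INTEGER, so "01234" stays "01234"
    visn            TEXT,
    symbol          TEXT
)
"""

# Same order as the CREATE TABLE statement above.
COLUMNS = ("facility_number", "region", "address", "city",
           "state", "zip_code", "visn", "symbol")


def create_table(conn: sqlite3.Connection) -> None:
    """Create the flat facility table if it does not exist yet."""
    conn.execute(SCHEMA)


def insert_facilities(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert rows (dicts keyed by COLUMNS); missing keys become NULL.

    Returns the number of rows written.
    """
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO va_facility ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(row.get(c) for c in COLUMNS) for row in rows])
    conn.commit()
    return len(rows)
```

Usage: open a connection with sqlite3.connect("va.db"), call create_table(conn), then pass one dict per facility to insert_facilities.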

I've tried exporting that PDF to CSV, but it's a nightmare to get working. Does anyone have ideas or insights into how I could accomplish this?


Solution

  • First, here's a CSV file containing the data found in CATB.pdf. The first line contains the column headers, and the remaining lines contain the data.

    http://tmp.alexloney.com/CATB.csv

    Now, for a more detailed explanation: I took the PDF you linked, converted it to an HTML document using Adobe Acrobat, and then used a lot of regular expressions to parse the file and clean it up. Once the file was clean enough, I wrote a program to parse the remainder of the file, pick up the state and region, and write everything out as a nicely formatted CSV.

    Hope that helps you!
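The answer doesn't say which language the cleanup program used, so here is a minimal Python sketch of the same idea: strip leftover HTML from each exported line, remember the most recent region and state headers, attach them to every facility line that follows, and write the result as CSV. The line patterns (REGION_RE, STATE_RE, and a pipe-separated FACILITY_RE) are hypothetical placeholders, not the real layout of CATB.pdf or its HTML export, and would need to be adapted to the actual file.

```python
import csv
import html
import io
import re

# Hypothetical patterns: adapt these to the real layout of the exported file.
REGION_RE = re.compile(r"^REGION\s+(?P<region>.+)$")
STATE_RE = re.compile(r"^STATE:\s+(?P<state>[A-Z]{2})$")
FACILITY_RE = re.compile(
    r"^(?P<facility_number>\w+)\s*\|\s*(?P<visn>\w+)\s*\|\s*(?P<symbol>[^|\s]+)\s*\|\s*"
    r"(?P<address>[^|]+?)\s*\|\s*(?P<city>[^|]+?)\s*\|\s*(?P<zip_code>\d{5}(?:-\d{4})?)$"
)
TAG_RE = re.compile(r"<[^>]+>")  # leftover HTML tags from the PDF-to-HTML export

FIELDS = ["region", "state", "facility_number", "visn",
          "symbol", "address", "city", "zip_code"]


def clean(line: str) -> str:
    """Remove HTML tags, decode entities such as &amp;, collapse whitespace."""
    text = html.unescape(TAG_RE.sub("", line))
    return " ".join(text.split())


def parse_lines(lines):
    """Yield one dict per facility, carrying the current region/state forward."""
    region = state = None
    for raw in lines:
        line = clean(raw)
        if not line:
            continue
        if m := REGION_RE.match(line):
            region = m["region"]
        elif m := STATE_RE.match(line):
            state = m["state"]
        elif m := FACILITY_RE.match(line):
            yield {"region": region, "state": state, **m.groupdict()}
        # Anything else (page headers, footers, page numbers) is skipped.


def to_csv(rows) -> str:
    """Render rows as CSV text with a header line first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the region and state as running variables mirrors the "grab the state and region" step: in a listing like this they usually appear once as section headers rather than on every row, so each facility row inherits them from the last header seen.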