I sincerely apologize if this isn't the proper forum to discuss this, but I wasn't sure where to go or what would be the best option.
Basically, I'm trying to find a database-friendly list of Veterans Affairs hospitals. The closest thing I've been able to find is www.va.gov/ofcadmin/docs/CATB.pdf, as it has all the information I'm looking for.
I've tried exporting that PDF to CSV, but it's a complete nightmare to get working. So I was curious whether anyone had any ideas or insights into how I could accomplish this task.
First, here's a CSV file containing the data found in CATB.pdf. The very first line contains the column headers, and each remaining line is one record.
http://tmp.alexloney.com/CATB.csv
Now, for the more detailed explanation: I took the PDF you linked to, converted it to an HTML document using Adobe Acrobat, then used a lot of regular expressions to parse the file and clean it up. Once the file was clean enough, I wrote a program to parse the remainder of the file, grab the state and region, and write everything out as a nicely formatted CSV.
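For anyone who wants to try something similar, here is a rough Python sketch of that last step. It is only an illustration under assumptions: the layout of the converted file isn't shown in this thread, so the "REGION n" and "STATE: name" heading lines, the "name | city" record format, the column names, and the sample rows are all made up for the example. The idea is simply to strip leftover HTML tags with a regular expression, remember the most recent region and state headings, and attach them to every record line that follows.

```python
import csv
import io
import re

# Hypothetical patterns: the real converted file may look quite different.
TAG_RE = re.compile(r"<[^>]+>")               # leftover HTML tags
REGION_RE = re.compile(r"^REGION\s+(\d+)$")   # assumed region heading
STATE_RE = re.compile(r"^STATE:\s*(.+)$")     # assumed state heading


def rows_from_lines(lines):
    """Yield [region, state, name, city] rows from cleaned-up text lines."""
    region = state = None
    for raw in lines:
        line = TAG_RE.sub("", raw).strip()
        if not line:
            continue  # skip blank lines
        match = REGION_RE.match(line)
        if match:
            region = match.group(1)
            continue
        match = STATE_RE.match(line)
        if match:
            state = match.group(1).strip()
            continue
        # Anything else is treated as a record in the assumed "name | city" form.
        fields = [field.strip() for field in line.split("|")]
        yield [region, state, *fields]


def to_csv(lines, header=("region", "state", "name", "city")):
    """Return CSV text with a header row followed by one row per record."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows_from_lines(lines))
    return out.getvalue()


# Made-up sample input, not taken from CATB.pdf.
sample = [
    "<p>REGION 1</p>",
    "<p>STATE: Alabama</p>",
    "<p>Example Medical Center | Birmingham</p>",
    "",
    "<p>STATE: Alaska</p>",
    "<p>Example Clinic | Anchorage</p>",
]

print(to_csv(sample))
```

Using the csv module for the output, rather than joining fields with commas by hand, means any field that itself contains a comma or quote gets escaped correctly.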
Hope that helps you!