I am using Python and BeautifulSoup to extract financial statement information from cities' financial reports for my professor. So far I have parsed a page into the list of lists shown below. I need to write the values out as a single row in a CSV file so they can be analyzed in SAS. The program will run on over 750 financial statements, so the code needs to be fairly robust. Ideally, the output would look something like this:
Output example
(Row 1) '', 'Governmental ASSETS Current Assets Cash and cash equivalents'
(Row 2) ' June 30 2002', '405057'
"ASSETS" would ideally appear in every column heading. "Current Assets" would change to "Noncurrent Assets" when Python loops through to that section, and the last part of the heading (for example "Unrestricted") would change with each row heading. "Governmental" would be in the column heading for all values in the first column, "Business-Type" for the second, and "Total" for the third.
An example page would be http://www.osc.ct.gov/2002cafr/financial/basic/netassets.asp
I thought 'enumerate' would work, but it appears to be useful only for flat lists, not lists of lists. I know I could hard-code every possible scenario, such as if text == "ASSETS", but that doesn't seem very Pythonic. I'm thinking there has to be a way to tell the program to keep using one of the column headings until a new section comes along. Example:
'Cash and Cash Equivalents' should be used until the program reaches 'Receivables', then 'Receivables' should replace it. 'Governmental' should be used until the bottom of the first column of values, then 'Business-Type' should be used.
Any help you can give me on this matter would be greatly appreciated.
List of Lists
[[u'Statement of Net Assets']
[u'June 30 2002', '', '', '', '']
[u'-Expressed in Thousands', '', '', '', '']
[u'Primary Government']
[u'Governmental', u'Business-Type', '', u'Component']
[u'Activities', u'Activities', u'Total', u'Units']
[u'Assets']
[u'Current Assets:', u' ', u' ', u' ', u' ']
[u'Cash and Cash Equivalents', u'$405057', u'$486600', u'$891657', u'$536609']
[u'Deposits with U.S. Treasury', u'0', u'675562', u'675562', u'0']
[u'Investments', u'181405', u'250670', u'432075', u'120078']
[u'Receivables -Net of Allowances', u'1841932', u'450954', u'2292886', u'165888']
[u'Due From Component Units', u'0', u'99611', u'99611', u'0']
[u'Due From Primary Government', u'0', u'0', u'0', u'20346']
[u'Inventories', u'61130', u'10814', u'71944', u'3543']
[u'Restricted Assets', u'0', u'9420', u'9420', u'451057']
[u'Internal Balances', u'-145078', u'145078', u'0', u'0']
[u'Other Current Assets', u'13821', u'8910', u'22731', u'12353']
[u'Total Current Assets', u'2358267', u'2137619', u'4495886', u'1309874']
[u'Noncurrent Assets:', u' ', u' ', u' ', u' ']
[u'Cash and Cash Equivalents', u'0', u'63073', u'63073', u'0']
[u'Restricted Assets', u'590374', u'695704', u'1286078', u'425372']
[u'Investments', u'0', u'448063', u'448063', u'234383']
[u'Loans -Net of Allowances', u'406272', u'505043', u'911315', u'3068708']
[u'Capital Assets -Net of Accumulated Depreciation', u'9125804', u'2306065',
u'11431869', u'252286']
[u'Other Noncurrent Assets', u'14388', u'81532', u'95920', u'59925']
[u'Total Noncurrent Assets', u'10136838', u'4099480', u'14236318', u'4040674']
[u'Total Assets', u'12495105', u'6237099', u'18732204', u'5350548']
[u'Liabilities', u' ', u' ', u' ', u' ']
[u'Current Liabilities:', u' ', u' ', u' ', u' ']
[u'Accounts Payable and Accrued Liabilities', u'710270', u'194520', u'904790',
u'52986']
[u'Due To Component Units', u'20346', u'0', u'20346', u'0']
[u'Due To Primary Government', u'0', u'0', u'0', u'99611']
[u'Escrow Deposits', u'0', u'0', u'0', u'26347']
[u'Current Portion of Long-Term Obligations', u'976958', u'168936', u'1145894',
u'118451']
[u'Amount Held for Institutions', u'0', u'0', u'0', u'279817']
[u'Deferred Revenue', u'39159', u'59335', u'98494', u'680']
[u'Medicaid Liability', u'577150', u'0', u'577150', u'0']
[u'Liability for Escheated Property', u'51178', u'0', u'51178', u'0']
[u'Other Current Liabilities', u'160333', u'65563', u'225896', u'18271']
[u'Total Current Liabilities', u'2535394', u'488354', u'3023748', u'596163']
[u'Noncurrent Liabilities:', u' ', u' ', u' ', u' ']
[u'Non-Current Portion of Long-Term Obligations', u'14576670', u'1948712', u'16525382',
u'3667265']
[u'Total Noncurrent Liabilities', u'14576670', u'1948712', u'16525382', u'3667265']
[u'Total Liabilities', u'17112064', u'2437066', u'19549130', u'4263428']
[u'Net Assets', u' ', u' ', u' ', u' ']
[u'Invested in Capital Assets Net of Related Debt', u'2348364', u'1847526', u'4195890',
u'44126']
[u'Restricted For:', u' ', u' ', u' ', u' ']
[u'Transportation', u'169228', u'0', u'169228', u'0']
[u'Debt Service', u'553530', u'103933', u'657463', u'20229']
[u'Capital Projects', u'0', u'144982', u'144982', u'0']
[u'Unemployment Compensation', u'0', u'798703', u'798703', u'0']
[u'Clean Water Projects', u'0', u'402281', u'402281', u'0']
[u'Bond Indenture Requirements', u'0', u'22425', u'22425', u'609058']
[u'Other Purposes', u'419135', u'196465', u'615600', u'27817']
[u'Funds Held as Permanent Investments:', u' ', u' ', u' ', u' ']
[u'Expendable', u'5924', u'0', u'5924', u'0']
[u'Nonexpendable', u'83598', u'177343', u'260941', u'0']
[u'Unrestricted -Deficit', u'-8196738', u'106375', u'-8090363', u'385890']
[u'Total Net Assets -Deficit', u'$-4616959', u'$3800033', u'$-816926', u'$1087120']]
Hi, you can access the info as shown below:
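import csv
# table_timeline_inner and hashtags are assumed to be defined earlier in your script
description = table_timeline_inner['user']['description']
# 'a' opens the file in append mode; the with block closes it when done
with open('stocks.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerows([(description, hashtags)])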
For tables nested inside tables, index with [][] (for example table[row][column]), and you can write the result to a CSV file using csv.writer. The 'a' mode appends to the file, so nothing already in it is overwritten.
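For the "keep using a heading until a new section comes along" part of the question, one possible approach is to remember the current section and subsection while looping over the rows. The sketch below is only a rough illustration, written for Python 3. It assumes that a row whose value cells are all blank is a heading, that a heading ending in ':' is a subsection, and that any other heading starts a new section. The names flatten, clean_number, rows and columns are made up for this example, and the title rows at the top of the list are assumed to have been stripped off and turned into the column names by hand.

```python
import csv
import io


def clean_number(cell):
    # Drop the dollar sign and any thousands separators: '$405057' -> '405057'
    return cell.replace('$', '').replace(',', '').strip()


def flatten(rows, columns):
    """Return (header, values) as two flat lists, one entry per table cell.

    Assumption: a row whose value cells are all blank is a heading; a heading
    ending in ':' is a subsection, any other heading starts a new section.
    """
    section, subsection = '', ''
    records = []  # (row heading, cleaned value cells) for every data row
    for row in rows:
        label, cells = row[0].strip(), row[1:]
        if all(cell.strip() == '' for cell in cells):
            if label.endswith(':'):
                subsection = label.rstrip(':')
            else:
                section, subsection = label, ''
            continue
        heading = ' '.join(p for p in (section, subsection, label) if p)
        records.append((heading, [clean_number(c) for c in cells]))

    # Emit column by column, so every 'Governmental' value comes first,
    # then every 'Business-Type' value, and so on.
    header, values = [], []
    for i, column in enumerate(columns):
        for heading, cells in records:
            header.append(column + ' ' + heading)
            values.append(cells[i])
    return header, values


# A small excerpt of the list of lists from the question.
rows = [
    [u'Assets'],
    [u'Current Assets:', u' ', u' ', u' ', u' '],
    [u'Cash and Cash Equivalents', u'$405057', u'$486600', u'$891657', u'$536609'],
    [u'Noncurrent Assets:', u' ', u' ', u' ', u' '],
    [u'Restricted Assets', u'590374', u'695704', u'1286078', u'425372'],
]
columns = ['Governmental', 'Business-Type', 'Total', 'Component Units']

header, values = flatten(rows, columns)

# Written to an in-memory buffer here so the sketch has no side effects;
# for a real file, use open('statements.csv', 'a', newline='') instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([''] + header)
writer.writerow(['June 30 2002'] + values)
```

Note that this heuristic has limits: a row such as 'Unrestricted -Deficit' that follows a subsection without a new heading in between would still inherit that subsection, so the headings are worth spot-checking on a few statements before running all 750.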