
Saving datasets created in a for loop to multiple files


I have URLs (for web scraping) and municipality names stored in this list:

muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"), ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo","partanna"), ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate") ]

I am trying to create a different dataset for each municipality. For example, data scraped from the first URL would be saved as filettinodak.csv. I am using the following code right now:

import re
import json
import requests
import pandas as pd
import os
import random

os.chdir(r"/Users/aartimalik/Dropbox/data")

muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"), 
("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo","partanna"),
("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate")
]

for m in muni[1]:
    URL = m
    r = requests.get(URL)
    p = re.compile("var bilancio_tree = (.*?);")
    data = p.search(r.text).group(1)
    
    data = json.loads(data)
    
    all_data = []
    
    for d in data:
        for v in d["values"]:
            for kk, vv in v.items():
                all_data.append([d["label"], "-", kk, vv.get("abs"), vv.get("pc")])
                
        for c in d["children"]:
            for v in c["values"]:
                for kk, vv in v.items():
                    all_data.append(
                        [d["label"], c["label"], kk, vv.get("abs"), vv.get("pc")]
                        )
                        
    df = pd.DataFrame(all_data, columns=["label 1", "label 2", "year", "abs", "pc"])
    
    df.to_csv(muni[2]+"dak.csv", index=False)

The error I am getting is:

    Traceback (most recent call last):
      File "<stdin>", line 19, in <module>
    TypeError: can only concatenate tuple (not "str") to tuple

I think I am doing something wrong with the muni indexing: muni[i]. Any suggestions? Thank you so much!


Solution

  • If you adjust your for loop a bit, it should solve your problem. The change below loops through every entry in muni and, on each iteration, unpacks the first value of each tuple into URL and the second into label.

    for URL, label in muni:
    

    And with that change, the final line in your code can become:

    df.to_csv(label+"dak.csv", index=False)
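
    Putting both changes together, a minimal sketch of the full loop might look like the following. The regex, the JSON parsing, the column names, and the "dak.csv" filename pattern are taken from the question; the os.chdir call is omitted, so files land in the current working directory.

        import re
        import json
        import requests
        import pandas as pd

        muni = [
            ("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"),
            ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo", "partanna"),
            ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate"),
        ]

        # Unpack each (URL, name) tuple directly in the loop header
        for URL, label in muni:
            r = requests.get(URL)
            data = json.loads(re.search("var bilancio_tree = (.*?);", r.text).group(1))

            all_data = []
            for d in data:
                for v in d["values"]:
                    for kk, vv in v.items():
                        all_data.append([d["label"], "-", kk, vv.get("abs"), vv.get("pc")])
                for c in d["children"]:
                    for v in c["values"]:
                        for kk, vv in v.items():
                            all_data.append([d["label"], c["label"], kk, vv.get("abs"), vv.get("pc")])

            df = pd.DataFrame(all_data, columns=["label 1", "label 2", "year", "abs", "pc"])

            # One CSV per municipality, e.g. filettinodak.csv
            df.to_csv(label + "dak.csv", index=False)

    This avoids the TypeError entirely: label is always the municipality string, never a tuple, so string concatenation in the filename works as intended.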