
Python Scrapy: change CSV column names


Scrapy is indeed powerful, but it lacks a basic feature: when exporting to CSV, there is no way to change the column names. By default, it outputs the field names defined in the Item, and those names must be valid Python identifiers. However, at times we need human-readable column names, for example Person Name instead of person_name.

Does any solution or setting exist for this? I tried the FEED settings, but they only control which fields are output.

Current Output:

id,person_name,uniq_code
D32,John Smith,8923
D89,Sleim,2343

Required Output:

ID,Person Name,Person Code
D32,John Smith,8923
D89,Sleim,2343
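The two outputs differ only in the header row. As a stdlib-only sketch (not a Scrapy feature), the rename can also be applied to an already exported file. COLUMN_NAMES and rename_csv_header are hypothetical names chosen for illustration:

```python
import csv
import io

# Hypothetical mapping from Item field names to the desired CSV headers.
COLUMN_NAMES = {
    "id": "ID",
    "person_name": "Person Name",
    "uniq_code": "Person Code",
}


def rename_csv_header(csv_text: str, mapping: dict[str, str]) -> str:
    """Return csv_text with its header row renamed via mapping.

    Columns missing from mapping keep their original name; data rows
    are copied unchanged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to rename
        return ""
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


current = "id,person_name,uniq_code\nD32,John Smith,8923\nD89,Sleim,2343\n"
print(rename_csv_header(current, COLUMN_NAMES), end="")
```

This is only a post-processing fallback; it leaves the Scrapy export itself unchanged.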

Solution

  • Item fields declared as class attributes must have names that are valid Python identifiers, but you can instead add fields to the item's fields dict in the constructor of the item class, as below. This lets you use field names containing characters such as spaces or full stops, which are not valid in Python identifiers.

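    import scrapy
    
    # Define the item class. Names with spaces cannot be declared as class
    # attributes, so they are added to the fields dict instead.
    class MyItem(scrapy.Item):
        def __init__(self, *args, **kwargs):
            # Register the fields before the base constructor runs, so that
            # MyItem({...}) and item.copy() can set these keys too.
            self.fields["ID"] = scrapy.Field()
            self.fields["Person Name"] = scrapy.Field()
            self.fields["Person Code"] = scrapy.Field()
            super().__init__(*args, **kwargs)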
    

    In your spider callback, populate the item using these names, as below; the CSV export then uses them as the column headers.

    def parse(self, response):
        item = MyItem()
        item['ID'] = 'D32'
        item['Person Name'] = 'John Doe'
        item['Person Code'] = 8923
        yield item
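An alternative, sketched under assumptions: keep ordinary Python-identifier fields on the Item and rename them in an item pipeline that returns a plain dict. Scrapy pipelines may return dicts, and when no export field list is configured, the CSV exporter takes its header from the field names of the first exported item. RenameFieldsPipeline and FIELD_RENAMES are hypothetical names, not part of Scrapy:

```python
# Hypothetical mapping from Item field names to human-readable column names.
FIELD_RENAMES = {
    "id": "ID",
    "person_name": "Person Name",
    "uniq_code": "Person Code",
}


class RenameFieldsPipeline:
    """Hypothetical pipeline: re-key each item with the display names.

    Works for scrapy.Item instances and plain dicts alike, since both are
    mappings. Keys without an entry in FIELD_RENAMES are kept unchanged,
    and the original key order is preserved.
    """

    def process_item(self, item, spider):
        return {FIELD_RENAMES.get(key, key): value for key, value in dict(item).items()}
```

To try it, the pipeline would be enabled in settings.py with something like ITEM_PIPELINES = {"myproject.pipelines.RenameFieldsPipeline": 300}, where the module path is a placeholder for your own project. If the FEED settings list the fields to export, they would then need to use the new names.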