How to avoid repeating values in lists while building a dataset


I am trying to create the following dataset:

multiple_newbooks = {"Books'Tiltle":["American Tabloid", 'Libri che mi hanno rovinato la vita ed Altri amori malinconici', '1984' ],
                         'Authors':['James Ellroy', 'Daria Bignardi', 'George Orwell'],
                         'Publisher': [('Mondadori' for i in range(0,2)), 'Feltrinelli'], 
                         'Publishing Year':[1995, 2022, 1994], 
                         'Start': ['?', '?', '?'], 
                         'Finish': ['?', '?', '?']}

As you can see, some of the data is repeated. I would like to avoid calling .append outside the data frame I am creating to fill the 'Publisher' entry (the code you see here does not work), and to avoid writing out sequences of identical values such as:

'Start': ['?', '?', '?'], 
'Finish': ['?', '?', '?']

Do you know of a more elegant way to write this? Thanks for your suggestions.


Solution

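  • If I understand you correctly, you don't want to write the same string repeatedly. Note that ('Mondadori' for i in range(0,2)) creates a single generator object inside the list, not two strings, which is why your 'Publisher' entry does not work. You can, for example, use the * operator to repeat a list: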

    multiple_newbooks = {
        "Books'Tiltle": [
            "American Tabloid",
            "Libri che mi hanno rovinato la vita ed Altri amori malinconici",
            "1984",
        ],
        "Authors": ["James Ellroy", "Daria Bignardi", "George Orwell"],
        "Publisher": ["Mondadori"] * 2 + ["Feltrinelli"],
        "Publishing Year": [1995, 2022, 1994],
        "Start": ["?"] * 3,
        "Finish": ["?"] * 3,
    }
    
    print(multiple_newbooks)
    

    Prints (reformatted here for readability):

    {
        "Books'Tiltle": [
            "American Tabloid",
            "Libri che mi hanno rovinato la vita ed Altri amori malinconici",
            "1984",
        ],
        "Authors": ["James Ellroy", "Daria Bignardi", "George Orwell"],
        "Publisher": ["Mondadori", "Mondadori", "Feltrinelli"],
        "Publishing Year": [1995, 2022, 1994],
        "Start": ["?", "?", "?"],
        "Finish": ["?", "?", "?"],
    }
    

    Or, with list comprehensions:

    multiple_newbooks = {
        "Books'Tiltle": [
            "American Tabloid",
            "Libri che mi hanno rovinato la vita ed Altri amori malinconici",
            "1984",
        ],
        "Authors": ["James Ellroy", "Daria Bignardi", "George Orwell"],
        "Publisher": ["Mondadori" for _ in range(2)] + ["Feltrinelli"],
        "Publishing Year": [1995, 2022, 1994],
        "Start": ["?" for _ in range(3)],
        "Finish": ["?" for _ in range(3)],
    }
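
For immutable values such as strings, the two approaches above give the same result. They differ only when the repeated element is mutable. The following is a minimal sketch of that difference. The variable names (shared, separate, start_mul, start_comp) are hypothetical and not part of the original answer.

```python
# Hypothetical illustration: list repetition vs. a list comprehension
# when the repeated element is mutable.

# Repetition with * copies references, so all three slots hold the same list.
shared = [[]] * 3
shared[0].append("x")

# A comprehension evaluates [] once per iteration, giving three separate lists.
separate = [[] for _ in range(3)]
separate[0].append("x")

# With immutable values such as strings, both forms produce equal lists.
start_mul = ["?"] * 3
start_comp = ["?" for _ in range(3)]

print(shared)                   # [['x'], ['x'], ['x']]
print(separate)                 # [['x'], [], []]
print(start_mul == start_comp)  # True
```

So for the '?' placeholders and publisher names in the question, the shorter * form is safe. If the dictionary is later passed to pandas.DataFrame, a constant column such as 'Start' can also be given as a single scalar ('?'), which pandas broadcasts to the length of the other columns.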