Tags: python, tuples, list-comprehension, gzip

How do I get the nth string in a list of tuples through a list comprehension?


I am trying to get the 1st (index 0) and 2nd (index 1) strings from each tuple in a DataFrame column.

import pandas as pd

df = {'col': [ "[('affect', 'the risks')]", "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"]}
df = pd.DataFrame(df)

The expected output for item0 and item1 is:

df={'item0': [ "'affect'",  "'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve'"]}

df={'item1': [ "'the risks'",  "'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us'"]}
df = pd.DataFrame(df)

I think I should use the zip() function, but I could not figure out how, because my data is in a DataFrame.

Resources I went through: 1) https://docs.python.org/3/tutorial/datastructures.html#nested-list-comprehensions 2) "Python - List comprehension list of tuple list to tuple list"


Solution

  • If you actually have a list of tuples, like this:

    data = {
        "col": [
            ("affect", "the risks"),
            ("have", "we"),
            ("breached", "our systems"),
            ("cease", "our computer"),
            ("suffer", ""),
            ("allow", ""),
            ("damage", "misappropriation"),
            ("require", "proprietary"),
            ("incur", "us"),
            ("remediate", "us"),
            ("resolve", "us"),
        ],
    }
    

    Then extracting things the way you want is relatively easy:

    item0 = [x[0] for x in data["col"]]
    item1 = [x[1] for x in data["col"]]
    
    print("item0:", item0)
    print("item1:", item1)
    

    That gets us:

    item0: ['affect', 'have', 'breached', 'cease', 'suffer', 'allow', 'damage', 'require', 'incur', 'remediate', 'resolve']
    item1: ['the risks', 'we', 'our systems', 'our computer', '', '', 'misappropriation', 'proprietary', 'us', 'us', 'us']
    

    Unfortunately, you don't have a list of tuples, and it's not clear from your question if that's just a typo or if you have simply mis-described your data. When you write:

    df = {
        "col": [
            "[('affect', 'the risks')]",
            "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]",
        ]
    }
    

    You have a list of two strings, each of which only looks like a list of tuples. The first is:

    >>> df['col'][0]
    "[('affect', 'the risks')]"
    

    And the second is:

    >>> df['col'][1]
    "[('have', 'we'), ('breached', 'our systems'), ('cease', 'our computer'), ('suffer', ''), ('allow', ''), ('damage', 'misappropriation'), ('require', 'proprietary'), ('incur', 'us'), ('remediate', 'us'), ('resolve', 'us')]"
    

    Processing these is going to be a little tricky, because each string has to be parsed back into Python objects before you can index into it. Things will be much easier if you can arrange for your data to be formatted as a list of tuples instead.
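
If changing the data at the source is not an option, one possible approach is to parse each string with the standard library's ast.literal_eval and then apply the same list comprehensions per row. This is only a sketch. It assumes every cell is a valid Python literal shaped like a list of 2-tuples. The helper name split_pairs and the shortened sample data are made up for illustration.

```python
import ast

# Same shape as the question's data (second row shortened for readability):
# each cell is a *string* that merely looks like a list of 2-tuples.
col = [
    "[('affect', 'the risks')]",
    "[('have', 'we'), ('breached', 'our systems'), ('suffer', '')]",
]


def split_pairs(cell):
    """Parse one cell and return (first items, second items) as two lists."""
    # literal_eval only accepts Python literals, so it does not run arbitrary code.
    pairs = ast.literal_eval(cell)
    first = [pair[0] for pair in pairs]
    second = [pair[1] for pair in pairs]
    return first, second


results = [split_pairs(cell) for cell in col]
item0 = [first for first, _ in results]    # one list of first items per row
item1 = [second for _, second in results]  # one list of second items per row

print("item0:", item0)
print("item1:", item1)
```

This yields one list per row rather than one flat list. If the strings live in a DataFrame column, the same helper could probably be applied cell by cell (for example with Series.apply), but that part is not shown or tested here. For non-empty rows, zip(*pairs) is an alternative to the two comprehensions, though it returns tuples and fails to unpack when a row is empty.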