
How to select elements from each list in a list of lists, if the element (a string) starts with a given letter/number?


I want to select the elements in each list that start with '6', but I couldn't find a way to do it.

The lists are converted from a dataframe:

import pandas as pd

d = {'c1': ['64774', '60240', '60500', '19303', '38724', '11402'],
     'c2': ['', '95868', '95867', '60271', '60502', '19125'],
     'c3': ['', '', '', '', '95867', '60500']}
df = pd.DataFrame(data=d)
df
  c1     c2     c3
64774   
60240   95868
60500   95867
19303   60271
38724   60502   95867
11402   19125   60500
list = df.values.tolist()
list = str(list)
list

[['64774', '', ''],
 ['60240', '95868', ''],
 ['60500', '95867', ''],
 ['19303', '60271', ''],
 ['38724', '60502', '95867'],
 ['11402', '19125', '60500']]

I tried the code like:

[x for x in list if x.startswith('6')]

However, it only returned '6' for each element that meets the condition:

['6', '6', '6', '6', '6', '6', '6', '6', '6']

What I'm looking for is a group of lists like:

"[['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]"

Solution

  • When you do list = str(list), you convert your list to its string representation, i.e. list becomes

    "[['64774', '', ''], ['60240', '95868', ''], ['60500', '95867', ''], ['19303', '60271', ''], ['38724', '60502', '95867'], ['11402', '19125', '60500']]"
    

    You then loop through the string with the list comprehension

    [x for x in list if x.startswith('6')]
    

    Iterating over a string yields its individual characters, so the comprehension just collects every occurrence of 6 in the string, hence your result of

    ['6', '6', '6', '6', '6', '6', '6', '6', '6']
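
To make the effect concrete, here is a minimal sketch on a shortened, two-row version of the data. The names rows, as_text, first_chars and sixes are made up for this illustration and do not appear in the question.

```python
# Hypothetical two-row sample of the question's data.
rows = [['64774', '', ''], ['60240', '95868', '']]

# str() turns the nested list into one long string.
as_text = str(rows)
print(type(as_text).__name__)  # str

# Iterating over a string yields one character at a time.
first_chars = [ch for ch in as_text][:4]
print(first_chars)  # ['[', '[', "'", '6']

# So startswith('6') is tested against single characters only.
sixes = [ch for ch in as_text if ch.startswith('6')]
print(sixes)  # ['6', '6', '6']
```

Each of the three 5-digit strings that contain a 6 contributes exactly one '6' here, which is the same effect as in the full data.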
    

    Sidenote: Don't use variable names that shadow built-in names such as list or dict. Once the name is rebound, the built-in is no longer reachable under that name in that scope, which will almost certainly cause issues down the line.
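
As a small illustration of the sidenote, the sketch below rebinds list inside a function (shadow_demo is a hypothetical name used only here) and then tries to call it as if it were still the built-in.

```python
def shadow_demo():
    # Deliberately bad: this local name hides the built-in list type.
    list = ['a', 'b']
    try:
        # This now tries to call the list object, not the built-in.
        list('abc')
    except TypeError as exc:
        return str(exc)
    return None


error_message = shadow_demo()
print(error_message)  # 'list' object is not callable

# Outside the function the built-in is untouched.
print(list('abc'))  # ['a', 'b', 'c']
```

Picking a descriptive name such as rows avoids the problem entirely.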

    I'm not sure if there is any specific reason to use a dataframe/pandas for your question. If not, you could simply use a list comprehension on the original dictionary:

    d = {
      'c1': ['64774', '60240', '60500', '19303', '38724', '11402'], 
      'c2': ['', '95868', '95867', '60271', '60502', '19125'],
      'c3':['','','','','95867','60500']
    }
    
    d2 = [[x] for v in d.values() for x in v if x.startswith('6')]
    # d2: [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]
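
If you do want to start from the row-wise lists that df.values.tolist() returns, a similar comprehension works as long as the str() conversion is skipped. This is a sketch under that assumption: rows is written out by hand here to match the output shown in the question, and flat_matches and per_row_matches are names chosen for this example.

```python
# The nested list shown in the question (what df.values.tolist() returned).
rows = [
    ['64774', '', ''],
    ['60240', '95868', ''],
    ['60500', '95867', ''],
    ['19303', '60271', ''],
    ['38724', '60502', '95867'],
    ['11402', '19125', '60500'],
]

# One single-element list per matching string, scanning row by row.
flat_matches = [[item] for row in rows for item in row if item.startswith('6')]
print(flat_matches)

# One list per row, holding every match found in that row.
per_row_matches = [[item for item in row if item.startswith('6')] for row in rows]
print(per_row_matches)
```

With this data both variants give [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']], because every row happens to contain exactly one match. They would differ for a row with no match or with several: the second variant keeps one (possibly empty) list per row.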