python pandas string numpy series

Slicing strings in a Series by a different Series of ints


Say we have a DataFrame with two columns, built from this dict:

import pandas as pd

data = {
  "slice_by" : [2, 2, 1],
  "string_to_slice" : ["one", "two", "three"]
}
df = pd.DataFrame(data)

The first line works just fine, the second one doesn't:

print(df["string_to_slice"].str[1:])
print(df["string_to_slice"].str[:df["slice_by"]])

Output:

0        ne
1        wo
2        hree
Name: string_to_slice, Length: 3, dtype: object
0       NaN
1       NaN
2       NaN
Name: string_to_slice, Length: 3, dtype: float64

What would be the appropriate way to do this? I'm sure I could put something together with df.iterrows(), but that's probably not an efficient way.


Solution

  • I am assuming you want str[slice_by:] and not str[:slice_by]. With that assumption you can do:

    import numpy as np

    # np.vectorize calls the lambda once per (string, offset) pair
    np_slice_string = np.vectorize(lambda x, y: x[y:])
    out = np_slice_string(df['string_to_slice'], df['slice_by'])

    print(out)

    ['e' 'o' 'hree']
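
For comparison, here is a minimal sketch of the same per-row slicing done with a plain list comprehension over zip, which is not part of the original answer. It rests on the point that np.vectorize is a convenience wrapper that still loops in Python, so an explicit loop is a reasonable alternative. It also shows the str[:slice_by] variant in case that is what was wanted. The column names "suffix" and "prefix" are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "slice_by": [2, 2, 1],
    "string_to_slice": ["one", "two", "three"],
})

# Pair each string with its own offset and slice in plain Python.
pairs = list(zip(df["string_to_slice"], df["slice_by"]))
suffixes = [s[n:] for s, n in pairs]   # everything from position n onward
prefixes = [s[:n] for s, n in pairs]   # the first n characters

# Hypothetical column names; assigning a list keeps row order intact.
df["suffix"] = suffixes
df["prefix"] = prefixes

print(suffixes)  # ['e', 'o', 'hree']
print(prefixes)  # ['on', 'tw', 't']
```

Unlike the np.vectorize call, this gives back ordinary Python lists, so the results can be assigned straight to new columns without converting a NumPy string array.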