I have a table whose key column A contains duplicate values. I would like to drop the duplicates but keep the first row for each key.
open System.IO
open Deedle

let data = "A;B\na;1\nb;\nb;2\nc;3"
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new MemoryStream(bytes)
let df =
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )
df.Print()
     A B
0 -> a 1
1 -> b <missing>
2 -> b 2
3 -> c 3
The result should be:
     A B
0 -> a 1
1 -> b <missing>
2 -> c 3
I have tried applyLevel, but for key b I get the first available value (2) rather than the first row:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.applyLevel fst (fun s -> s |> Series.firstValue)
df1.Print()
     A B
a -> a 1
b -> b 2 <- wrong
c -> c 3
This is essentially a duplicate of a previous SO question. The short answer is:
let df1 =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.nest                        // convert to a series of frames
    |> Series.mapValues (Frame.take 1)   // take the first row from each frame
    |> Frame.unnest                      // convert back to a single frame
    |> Frame.mapRowKeys snd              // drop the group key, keep the original row index
df1.Print()
The output is:
     A B
0 -> a 1
1 -> b <missing>
3 -> c 3
I've added a call to Frame.mapRowKeys at the end to match your desired output as closely as possible. The actual output still differs slightly from what you expected: the row for c keeps its original index 3 instead of being renumbered to 2. I think this is more correct, but you can renumber the rows if necessary.
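If you do want consecutive row keys, one option is to append Frame.indexRowsOrdinally to the end of the pipeline. The following is a minimal sketch, assuming it is run as an F# script with dotnet fsi. The #r line and the name renumbered are my additions for illustration and are not part of the question.

```fsharp
#r "nuget: Deedle"   // assumption: F# script; in a compiled project, reference the Deedle package instead
open System.IO
open Deedle

// Same sample data as in the question
let data = "A;B\na;1\nb;\nb;2\nc;3"
let stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes data)
let df = Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)

// "renumbered" is a hypothetical name chosen for this example
let renumbered =
    df
    |> Frame.groupRowsByString "A"
    |> Frame.nest                        // series of frames, one per key
    |> Series.mapValues (Frame.take 1)   // keep only the first row of each group
    |> Frame.unnest                      // back to a single frame
    |> Frame.mapRowKeys snd              // keep the original row index as the key
    |> Frame.indexRowsOrdinally          // replace the row keys with 0, 1, 2, ...

renumbered.Print()
```

This discards the original row indices, so only do it if you no longer need to trace rows back to the source data.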
The referenced question has more details.