Finding Duplicated Values in Pandas Groupby Object

I have a Pandas DataFrame:

msg_id	identifier
001	Stackoverflow
001	Stackoverflow
002	Stackoverflow
002	Cross-Validated

I want to drop the duplicated values in identifier for each unique value of msg_id.

This is my current approach, which is super slow:

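import pandas as pd

# Collect the duplicated rows of each msg_id group, one group at a time
acc_df = pd.DataFrame(columns=df.columns)
for _, group in df.groupby("msg_id"):
    # Use a separate name so the original df is not overwritten
    dups = group[group.duplicated("identifier")]
    if len(dups) > 0:
        acc_df = pd.concat([dups, acc_df], axis=0, ignore_index=False)
acc_df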

I have a very large dataset with 500 million rows. Even after filtering for the msg_id values that have more than one identifier, a very large number of rows remains.

I am looking for any vectorized or faster approach, NOT INCLUDING multiprocessing and threading.

Solution

Code

The problem reduces to finding rows where the combination of msg_id and identifier is duplicated, so no groupby is needed. This can be done as follows:

df[df.duplicated(['msg_id', 'identifier'])]
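
For illustration, here is a minimal sketch of the same idea on the sample data from the question. The names mask, duplicates and deduplicated are made up for this example, and msg_id is assumed to be stored as strings. With the default keep='first', duplicated marks every repeat of a (msg_id, identifier) pair after its first occurrence, so inverting the mask should give the de-duplicated frame the question asks for.

```python
import pandas as pd

# Sample data from the question; msg_id is assumed to be a string column
df = pd.DataFrame({
    "msg_id": ["001", "001", "002", "002"],
    "identifier": ["Stackoverflow", "Stackoverflow", "Stackoverflow", "Cross-Validated"],
})

# True for every repeat of a (msg_id, identifier) pair after its first occurrence
mask = df.duplicated(["msg_id", "identifier"])

duplicates = df[mask]      # rows that repeat an earlier pair
deduplicated = df[~mask]   # equivalent to df.drop_duplicates(["msg_id", "identifier"])
```

Both results come from a single pass over the two columns, with no Python-level loop over the groups.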
