I am trying to create a table of the objects in a particular region by combining IRAC, 2MASS, and WISE data. I have merged all of the data into one table and am now trying to eliminate duplicates based on their RA and Dec coordinates. My end goal is a complete table without any duplicates, so that if an object is already in the IRAC data, its 2MASS or WISE counterpart does not appear again. I am new to using Python. It would also be nice to get the row numbers of the duplicates.
import pandas as pd

df = pd.read_csv('filename.csv')

duplicate = []
# Flag rows whose (ra, dec) pair already appeared earlier in the table
dup_mask = df.duplicated(subset=['ra', 'dec'])
for row_num in df[dup_mask].index:
    duplicate.append(row_num)
This has been running for a while, and I am not sure whether it is correct or working efficiently. The ra and dec values are in two separate columns.
The task you are trying to do is catalog cross-matching of sky coordinates. Pandas methods like drop_duplicates are not appropriate here because they rely on exact numerical equality, while in general the RA and Dec values reported for the same object will differ from catalog to catalog by amounts set by each catalog's astrometric accuracy.
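As a starting point, here is a minimal sketch of a tolerance-based self-match using astropy's SkyCoord.match_to_catalog_sky. It assumes your combined table is in filename.csv with 'ra' and 'dec' columns in degrees, and the 1 arcsec match radius is only a placeholder that you would tune to the accuracies of your catalogs:

import pandas as pd
from astropy import units as u
from astropy.coordinates import SkyCoord

df = pd.read_csv('filename.csv')  # combined IRAC + 2MASS + WISE table

# Build sky coordinates from the RA/Dec columns (assumed to be in degrees)
coords = SkyCoord(ra=df['ra'].values * u.deg, dec=df['dec'].values * u.deg)

# Match the catalog against itself; nthneighbor=2 skips the trivial self-match
idx, sep2d, _ = coords.match_to_catalog_sky(coords, nthneighbor=2)

# Treat sources closer than this (placeholder) radius as the same object
match_radius = 1.0 * u.arcsec
close = sep2d < match_radius

# Drop the higher-numbered row of each close pair so the first entry survives
drop_rows = [i for i, (j, is_close) in enumerate(zip(idx, close)) if is_close and i > j]

print("row numbers flagged as duplicates:", drop_rows)
clean = df.drop(df.index[drop_rows])

Keeping the lower-indexed row of each close pair means the first entry (for example the IRAC one, if those rows come first in the table) survives. Note that this simple rule assumes duplicates come in pairs; groups of three or more coincident sources (an object detected in all three surveys) may need a second pass or a proper grouping step.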
Efficient catalog cross-matching is a big subject by itself, but you can get started with these references: