I am trying to determine whether two pages describe the same movie, and I would like to compare the actors as one of the criteria. However, actors are often listed differently on different pages. For example:
On this page, https://play.google.com/store/movies/details?id=cSdcb2KOH74, the actors are listed as "Mikhail Galustyan, Danny Trejo, Guillermo Díaz, Oleg Taktarov, Kym Whitley, Christopher Robin Miller, Robert Bear, Vladimir Yaglych, Josh McLerran"
On this page, http://www.imdb.com/title/tt2167970/, the actors are listed as "Ivan Stebunov, Ingrid Olerinskaya, Vladimir Yaglych"
Previously, I was doing a very rough match on:
if actors_from_site_1[0] == actors_from_site_2[0]:
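But, as the case above shows, this isn't a good technique: the first-listed actors differ (Mikhail Galustyan vs. Ivan Stebunov), even though both pages list Vladimir Yaglych. What would be a better technique to check whether the actors of one film match those of the other?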
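To make the failure concrete, here is a small sketch using the two cast lists exactly as quoted from the pages above (the variable names are the ones from the question). It shows the first-element check returning False even though the lists share an actor.

```python
# Cast lists as quoted from the two pages in the question.
actors_from_site_1 = [
    "Mikhail Galustyan", "Danny Trejo", "Guillermo Díaz", "Oleg Taktarov",
    "Kym Whitley", "Christopher Robin Miller", "Robert Bear",
    "Vladimir Yaglych", "Josh McLerran",
]
actors_from_site_2 = ["Ivan Stebunov", "Ingrid Olerinskaya", "Vladimir Yaglych"]

# The rough check only compares the first-billed actor on each page.
first_billed_match = actors_from_site_1[0] == actors_from_site_2[0]
print(first_billed_match)  # False

# Yet both pages list Vladimir Yaglych somewhere in the cast.
print("Vladimir Yaglych" in actors_from_site_1 and "Vladimir Yaglych" in actors_from_site_2)  # True
```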
You could check the length of the intersection of the two sets of actors:
if len(set(actors_from_site_1).intersection(set(actors_from_site_2))):
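Because the same name can be spelled slightly differently across sites, an exact set intersection may miss real matches. The following is a minimal sketch, not a definitive implementation: the helpers `normalize_name`, `shared_actors` and `actors_match`, and the `min_shared` threshold, are hypothetical names introduced here for illustration. It assumes that stripping accents, collapsing whitespace and case-folding is an acceptable normalization for your data; the double-spaced "Vladimir  Yaglych" in the usage lines is a made-up variant to show that effect.

```python
import unicodedata


def normalize_name(name):
    # Hypothetical normalization (an assumption, tune for your data):
    # decompose accented characters, drop the combining marks,
    # collapse runs of whitespace, and casefold.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split()).casefold()


def shared_actors(actors_a, actors_b):
    # Set intersection of the normalized names from both pages.
    return {normalize_name(n) for n in actors_a} & {normalize_name(n) for n in actors_b}


def actors_match(actors_a, actors_b, min_shared=1):
    # min_shared is an assumed tuning knob: how many common actors count as a match.
    return len(shared_actors(actors_a, actors_b)) >= min_shared


site_1 = ["Mikhail Galustyan", "Danny Trejo", "Guillermo Díaz", "Vladimir Yaglych"]
site_2 = ["Ivan Stebunov", "Ingrid Olerinskaya", "Vladimir  Yaglych"]  # hypothetical spacing variant
print(shared_actors(site_1, site_2))               # {'vladimir yaglych'}
print(actors_match(site_1, site_2))                # True
print(actors_match(site_1, site_2, min_shared=2))  # False
```

Requiring more than one shared actor (a higher `min_shared`) reduces false positives when a prolific actor appears in many unrelated films, at the cost of missing pages that list only a short cast.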
Or you could do something like:
if any(actor in actors_from_site_1 for actor in actors_from_site_2):
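Both checks above answer the same yes/no question: do the two cast lists share at least one actor? The sketch below runs them on the lists quoted in the question and adds a third equivalent form using `set.isdisjoint`, a standard set method that was not part of the original answer.

```python
actors_from_site_1 = [
    "Mikhail Galustyan", "Danny Trejo", "Guillermo Díaz", "Oleg Taktarov",
    "Kym Whitley", "Christopher Robin Miller", "Robert Bear",
    "Vladimir Yaglych", "Josh McLerran",
]
actors_from_site_2 = ["Ivan Stebunov", "Ingrid Olerinskaya", "Vladimir Yaglych"]

# 1. Intersection length: a non-zero length means at least one shared actor.
by_intersection = bool(len(set(actors_from_site_1).intersection(set(actors_from_site_2))))

# 2. any() stops as soon as it finds one actor from site 2 in site 1's list.
by_any = any(actor in actors_from_site_1 for actor in actors_from_site_2)

# 3. isdisjoint() is True when there is NO overlap, so negate it.
by_isdisjoint = not set(actors_from_site_1).isdisjoint(actors_from_site_2)

print(by_intersection, by_any, by_isdisjoint)  # True True True
```

In the `any()` version each `in` test scans a list, so for long casts it can help to convert `actors_from_site_1` to a set first; the set-based forms already do that.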