I am working on a GeoDataFrame (geopandas) with information on social media users and their home municipality. The home municipality column contains 524 municipalities; 453 of these entries hold two municipalities in a single string of the form 'City1 or City2'.
two_cities = [s for s in gdf['home_municipality'] if " or " in s]
print(two_cities)
So far, the code above gives me a list containing only the values that include " or ". My question is: how can I apply a 50/50 probability to each list item, randomly choosing one of the two municipalities to assign to the respective user?
Here is a snippet of the list items:
['Vaasa or Mustasaari', 'Helsinki or Espoo', 'Vantaa or Turku', 'Helsinki or Espoo', 'Paimio or Turku', 'Turku or Helsinki', 'Helsinki or Espoo']
Taking just one of your strings, 'Vaasa or Mustasaari', as an example: you can split it into a list of the two cities, then use random.randint to pick an integer that is either 0 or 1 (both endpoints are included) and use it as the index of the city to take from the list. Since only two integers are possible and each is equally likely, this gives a 50/50 chance.
import random
city_string = 'Vaasa or Mustasaari'
cities = city_string.split(' or ')
user_city = cities[random.randint(0, 1)]
print(user_city)
Outputs:
>>> user_city = cities[random.randint(0, 1)]
>>> print(user_city)
Mustasaari
>>> user_city = cities[random.randint(0, 1)]
>>> print(user_city)
Vaasa
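To extend this from a single string to every value, the same split-and-pick step can be wrapped in a small function. The following is a minimal sketch, not a tested solution for your data: the names pick_municipality, home_municipalities and assigned are made up for illustration, the sample values are copied from your list snippet plus one plain name, and random.choice is swapped in for the random.randint index because it picks uniformly from the list either way.

```python
import random

# Sample values copied from the question's list snippet, plus one plain
# name (hypothetical) to show that single-municipality values are unchanged.
home_municipalities = [
    'Vaasa or Mustasaari',
    'Helsinki or Espoo',
    'Vantaa or Turku',
    'Paimio or Turku',
    'Turku',
]

def pick_municipality(value):
    """Return one municipality from 'A or B', each with equal probability."""
    # str.split returns a one-item list when ' or ' is absent,
    # so a plain name such as 'Turku' is returned as-is.
    return random.choice(value.split(' or '))

# One independent random pick per value, i.e. per user.
assigned = [pick_municipality(s) for s in home_municipalities]
print(assigned)
```

On the GeoDataFrame itself, the same function could be passed to the column, for example gdf['home_municipality'] = gdf['home_municipality'].apply(pick_municipality), assuming every value in that column is a string. Each row gets its own random draw, so two users with 'Helsinki or Espoo' may end up in different municipalities.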