In the data file I'm working from, I am given pairs of lists in which each element represents an age interval, written as a string. For example,
List1 = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-']
List2 = ['0-19', '20-39', '40-']
List1 is used as a template to represent the age intervals for the corresponding data:
A1 = [30, 40, 50, 60, 70, 80]
B1 = [33, 20, 40, 76, 777, 844]
So, for example, the second element of A1 means the value is 40 for the age interval '10-19', and the fifth element of B1 means the value is 777 for the interval '40-49'.
Because the age-interval boundaries in List1 line up with those in List2, it is possible to sum the elements of A1 and B1 so that they represent the age intervals of List2:
A2 = [70, 110, 150]
B2 = [53, 116, 1621]
So now, for example, the second element of A2 (previously A1) represents the value 110 for the age interval '20-39', and the first element of B2 (previously B1) represents 53 for the interval '0-19'.
The data for List1 has been rebinned to match List2's age intervals. This is possible because the age intervals overlap cleanly: each interval in List2 is made up of whole intervals from List1. This cannot be done for data represented by the following two sets of age intervals:
List3 = ['0-14', '15-29', '30-44', '45-']
List4 = ['0-19', '20-39', '40-']
Because of the format of the data, I don't know how to check whether two lists have overlapping age intervals that allow the data to be rebinned to a new set of age intervals. If anyone could point me to a method or library available in Python that can do this, specifically one that deals with number intervals represented as strings, it would be much appreciated. Thank you.
You can collect all the younger ages (the lower bounds) in one set and all the older ages (the upper bounds) in another. Then check that every older age from the shorter list also exists among the older ages of the longer list, and likewise for the younger ages. This way it will match not only pairs but any combination of singles, pairs, triplets, etc.
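def can_represent(short_list, long_list):
    # Lower bounds ("younger" ages) of every interval, kept as strings.
    youngs_1 = {s.split('-')[0] for s in short_list}
    youngs_2 = {s.split('-')[0] for s in long_list}
    # Upper bounds ("older" ages); an open-ended '50-' yields ''.
    olds_1 = {s.split('-')[1] for s in short_list}
    olds_2 = {s.split('-')[1] for s in long_list}
    # True only if no bound of short_list is missing from long_list.
    return not youngs_1 - youngs_2 and not olds_1 - olds_2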
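Once you know the lists are compatible, the summing itself could look roughly like the sketch below. This is only an illustration and not part of the check above: the names parse_interval and rebin are made up for this example, and it assumes both lists are sorted, contiguous and cover the same overall age range, with an open-ended label such as '50-' treated as having no upper limit.

```python
def parse_interval(label):
    """Turn '10-19' into (10, 19); an open-ended '50-' becomes (50, inf)."""
    low, _, high = label.partition('-')
    return int(low), (int(high) if high else float('inf'))


def rebin(values, fine_labels, coarse_labels):
    """Sum values (one per fine interval) into the coarse intervals.

    Raises ValueError if a fine interval straddles a coarse boundary,
    i.e. if the two sets of intervals are not compatible.
    """
    fine = [parse_interval(label) for label in fine_labels]
    result = []
    for coarse_label in coarse_labels:
        c_low, c_high = parse_interval(coarse_label)
        total = 0
        for (f_low, f_high), value in zip(fine, values):
            if c_low <= f_low and f_high <= c_high:
                # Fine interval lies entirely inside the coarse one.
                total += value
            elif f_high < c_low or f_low > c_high:
                # Fine interval lies entirely outside; skip it.
                continue
            else:
                raise ValueError(
                    f"interval starting at {f_low} straddles {coarse_label!r}"
                )
        result.append(total)
    return result


List1 = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-']
List2 = ['0-19', '20-39', '40-']
A1 = [30, 40, 50, 60, 70, 80]
B1 = [33, 20, 40, 76, 777, 844]

A2 = rebin(A1, List1, List2)
B2 = rebin(B1, List1, List2)
```

With the lists from the question, A2 and B2 come out as [70, 110, 150] and [53, 116, 1621]. Calling rebin with List3 and List4 raises ValueError, because '15-29' crosses the boundary between '0-19' and '20-39'.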