To make things easier but also more complicated, I tried to implement a concept of "combined / concise tags" that expand further on into multiple basic tag forms.
In this case the tags consist of (one or more) "sub-tag(s)", delimited by semicolons:
food:fruit:apple:sour/sweet
drink:coffee/tea:hot/cold
wall/bike:painted:red/blue
Slashes indicate "sub-tag" interchangeability. Therefore the interpreter translates them to this:
food:fruit:apple:sour
food:fruit:apple:sweet
drink:coffee:hot
drink:coffee:cold
drink:tea:hot
drink:tea:cold
wall:painted:red
wall:painted:blue
bike:painted:red
bike:painted:blue
The code used (not perfect, but works):
import itertools
def slash_split_tag(tag):
if not '/' in tag:
return tag
subtags = tag.split(':')
pattern, v_pattern = (), ()
for subtag in subtags:
if '/' in subtag:
pattern += (None,)
v_pattern += (tuple(subtag.split('/')),)
else:
pattern += (subtag,)
def merge_pattern_and_product(pattern, product):
ret = list(pattern)
for e in product:
ret[ret.index(None)] = e
return ret
CartesianProduct = tuple(itertools.product(*v_pattern)) # http://stackoverflow.com/a/170248
return [ ':'.join(merge_pattern_and_product(pattern, product)) for product in CartesianProduct ]
#===============================================================================
# T E S T
#===============================================================================
for tag in slash_split_tag('drink:coffee/tea:hot/cold'):
print tag
print
for tag in slash_split_tag('A1/A2:B1/B2/B3:C1/C2:D1/D2/D3/D4/EE'):
print tag
Question: How can I possibly revert this process? I need this for readability reasons.
Here's a simple, first-pass attempt at such a function:
def compress_list(alist):
"""Compress a list of colon-separated strings into a more compact
representation.
"""
components = [ss.split(':') for ss in alist]
# Check that every string in the supplied list has the same number of tags
tag_counts = [len(cc) for cc in components]
if len(set(tag_counts)) != 1:
raise ValueError("Not all of the strings have the same number of tags")
# For each component, gather a list of all the applicable tags. The set
# at index k of tag_possibilities is all the possibilities for the
# kth tag
tag_possibilities = list()
for tag_idx in range(tag_counts[0]):
tag_possibilities.append(set(cc[tag_idx] for cc in components))
# Now take the list of tags, and turn them into slash-separated strings
tag_possibilities_strs = ['/'.join(tt) for tt in tag_possibilities]
# Finally, stitch this together with colons
return ':'.join(tag_possibilities_strs)
Hopefully the comments are sufficient in explaining how it works. A few caveats, however:
It doesn't do anything sensible such as escaping backslashes if it finds them in the list of tags.
This doesn't recognise if there's a more subtle division going on, or if it gets an incomplete list of tags. Consider this example:
fish:cheese:red
chips:cheese:red
fish:chalk:red
It won't realise that only cheese
has both fish
and chips
, and will instead collapse this to fish/chips:cheese/chalk:red
.
The order of the tags in the finished string is random (or at least, I don't think it has anything to do with the order of the strings in the given list). You could sort tt
before you join it with slashes if that's important.
Testing with the three lists given in the question seems to work, although as I said, the order may be different to the initial strings:
food:fruit:apple:sweet/sour
drink:tea/coffee:hot/cold
wall/bike:painted:blue/red