Given n
strings S1, S2, ..., Sn
, and an alphabet set A={a_1,a_2,....,a_m}
. Assume that the alphabets in each string are all distinct. Now I want to create an inverted-index for each a_i (i=1,2...,m)
. My inverted-index has also something special: The alphabets in A
are in some sequential order, if in the inverted-index a_i has included one string (say S_2
), then a_j (j=i+1,i+2,...,m)
don't need to include S_2
any more. In short, every string just appears in the inverted list only once. My question is how to build such list in a fast and efficient way? Any time complexity is bounded?
For example, A={a,b,e,g}, S1={abg}, S2={bg}, S3={gae}, S4={g}
. Then my inverted-list should be:
a: S1,S3
b: S2 (since S1 has appeared previously, so we don't need to include it here)
e:
g: S4
If I understand your question correctly, a straightforward solution is:
for each string in n strings
find the "smallest" character in the string
put the string in the list for the character
The complexity is proportional to the total length of the strings, multiplying by a constant for the order testing.
If there is a simple way for testing, (e.g. the characters are in alphabetical order and all lower-case, a < will be enough), simply compare them; otherwise, I suggest using a hash table, each pair of which is a character and its order, later simply compare them.