I'm familiar with comparing 2 lists of integers and string; however, when comparing 2 lists of strings with extra characters involved can be a little challenging.
Assume the output contains the following where I break it into a list of string. I called it diff in my code.
Output
164c164
< Apples =
---
> Apples = 0
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Lemons = 2
< Strawberries = 4
---
> Lemons = 4
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
The second set of string contains the ignore variable where I wanted the first list to be compared against.
>>> ignore
['Apples', 'Lemons']
My code:
>>> def str_compare (ignore, output):
... flag = 0
... diff = output.strip ().split ('\n')
... if ignore:
... for line in diff:
... for i in ignore:
... if i in line:
... flag = 1
... if flag:
... flag = 0
... else:
... print (line)
...
>>>
The code works with Apple and Lemons omitted.
>>> str_compare(ignore, output)
164c164
---
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
>>>
There must be a better way to compare 2 strings that it's not O(n^2). Had my diff list doesn't contain extra characters like "Apples =" then comparing the two lists can be achieved with O(n). Any suggestions or ideas to compare without looping through the "ignore" variable on every diff element?
Update #1 To avoid confusion and using the suggested comment, I've updated the code.
>>> def str_compare (ignore, output):
... diff = output.strip ().split ('\n')
... if ignore:
... for line in diff:
... if not any ([i in line for i in ignore]):
... print (line)
... print ("---")
>>>
Regardless, it still loop through ignore twice for every diff element.
for efficiency use ignore sets not list. Use split to get the key word fromline.
>>> def str_compare (ignore, output):
... ignore = set (ignore)
... diff = output.strip ().split ('\n')
... for line in diff:
... if line.startswith('<') or line.startswith('>'):
... var = line.split () [1]
... if var not in ignore:
... print (line)
... else:
... print (line)
...
Output
>>> str_compare (ignore, output)
164c164
---
168c168
< Berries =
---
> Berries = false
218c218
< Cherries =
---
> Cherries = 20
223c223
< Bananas =
---
> Bananas = 10
233,234c233,234
< Strawberries = 4
---
> Strawberries = 2
264c264
< Watermelons =
---
> Watermelons = 524288
You can eliminate need for flag by splitting and joing over "---\n" (slightly more general solution than flags or typin ----)
Note that string inclusion s1 in s2 worst case should be about len(s1) * len(2), while equality about max(len(s1),len(s2). While python implementation is pretty decent (for average case), linear complexity algos seem to exist http://monge.univ-mlv.fr/~mac/Articles-PDF/CP-1991-jacm.pdf See also Algorithm to find multiple string matches