I have used a semi-complex regex to retrieve data from a website. The issue is that I have to do some post-processing of the matched dataset.
I have gotten the data processed to probably 95+% of where I want it; however, I am getting a simple error message that I cannot reason about.
I can bypass it, but that is beside the point. I am trying to figure out whether this is a bug or something I am fundamentally overlooking in my tuple unpacking.
One thing I have to overcome is that I get 4 matches for every "true match". That means that my data for 1 single item is spread out over 4 matches.
In simple graphical form (slightly oversimplified):
index | a b c d e f g h i j
--------------------------------------------------------
1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)
5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)
9: | ...
...
615: | ...
I can get all the data, but I want to compact it, like so...
index | a b c d e f g h i j
--------------------------------------------------------
1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)
3: | ...
...
154: | ...
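To make the compaction in the two diagrams concrete, here is a minimal sketch. The helper name `compact` and the sample rows are hypothetical stand-ins rather than the real scraped data, and the sketch assumes that within each group of 4 rows every column is filled in at most one row.

```python
def compact(rows, group_size=4):
    """Merge each run of `group_size` sparse rows into one dense row."""
    merged = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) walks the group column by column; for each column keep
        # the first non-empty value, or '' if the column is empty in all rows.
        merged.append(tuple(
            next((value for value in column if value), "")
            for column in zip(*group)
        ))
    return merged


# Hypothetical sample with the same shape as the diagram: 10 columns, 4 rows per item.
sample = [
    ("", "", "", "", "", "", "", "", "", ""),
    ("Studio 3.6 Beta 1", "3.6", "Beta", "1", "", "", "", "", "", ""),
    ("", "", "", "", "October 10, 2019", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-1", "3.6.0", "13", "192"),
    ("", "", "", "", "", "stable", "", "", "", ""),
    ("Studio 3.5.1", "3.5.1", "", "", "", "", "", "", "", ""),
    ("", "", "", "", "October 2, 2019", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-2", "3.5.1", "0", "191"),
]

compacted = compact(sample)
for row in compacted:
    print(row)
```

This merges whole groups at once instead of building one list per column, so it only fits if the rows really do arrive in fixed groups of 4.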
Take note of the variables abcd, e, f, and ghij, and how I have to unpack them in the for-loop at the bottom:
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]
abcdefghij = zip(abcd, e, f, ghij)
for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
#
Take note that here I am trying to unpack the same tuples right away into the variables a, b, c, d, e, f, g, h, i, and j:
matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]
abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)
for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
#
With this code, I get the following error message...
... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)
I would have expected these two methods to follow the same logic and produce exactly the same end result.
They do not! Why?
@PaulPanzer That appears to work. I will have to verify that everything lines up correctly. But why do I need that?
Say q is an iterable for which your comprehension produces a list of 26 tuples, each with 4 items:
z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]
In [6]: len(z)
Out[6]: 26
In [7]: len(z[0])
Out[7]: 4
In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]
When you try to unpack z, you are trying to stuff 26 items (the tuples) into four names/variables:
In [8]: a,b,c,d = z
Traceback (most recent call last):
File "<ipython-input-8-64277b78f273>", line 1, in <module>
a,b,c,d = z
ValueError: too many values to unpack (expected 4)
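A small sketch may make the failure easier to see. The `rows` list below is made-up data, not your regex output; the point it illustrates is that the assignment unpacks the outer list, so it only succeeds when that list holds exactly four tuples, and even then each name receives a whole row rather than a column.

```python
# Hypothetical data: four rows of four items each.
rows = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]

# This succeeds only because the outer list happens to hold exactly 4 tuples.
first, second, third, fourth = rows
print(first)  # a whole row, not the first column

# With a fifth row the same statement fails, just like in the question.
try:
    a, b, c, d = rows + [(17, 18, 19, 20)]
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)
```

The exact wording of the message can differ slightly between Python versions, but it begins with "too many values to unpack".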
zip(*list_of_4_item_tuples) will transpose the list_of_4_item_tuples into 4 tuples with 26 items each:
In [9]: a,b,c,d = zip(*z) # z is the result of the list comprehension shown above
In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)
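Applied to the code in the question, the same transposition could look roughly like the sketch below. The `matches` list here is a shortened, hypothetical stand-in for the real data (placeholder names and URLs), and `records` is a name introduced only for this example.

```python
# Hypothetical stand-in for the question's `matches`: two items, four rows each.
matches = [
    ("", "", "", "", "", "", "", "", "", ""),
    ("Studio 3.6 Beta 1", "3.6", "Beta", "1", "", "", "", "", "", ""),
    ("", "", "", "", "October 10, 2019", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-1", "3.6.0", "13", "192"),
    ("", "", "", "", "", "stable", "", "", "", ""),
    ("Studio 3.5.1", "3.5.1", "", "", "", "", "", "", "", ""),
    ("", "", "", "", "October 2, 2019", "", "", "", "", ""),
    ("", "", "", "", "", "", "url-2", "3.5.1", "0", "191"),
]

f = [
    "stable" if row[5] == "stable" else "preview"
    for index, row in enumerate(matches)
    if index % 4 == 0
]
# zip(*...) turns the list of 4-item rows into 4 columns, one per name.
a, b, c, d = zip(*[
    row[:4] for index, row in enumerate(matches) if index % 4 == 1
])
e = [row[4] for index, row in enumerate(matches) if index % 4 == 2]
g, h, i, j = zip(*[
    row[-4:] for index, row in enumerate(matches) if index % 4 == 3
])

# Now every name is a sequence with one entry per item, so zip lines them up.
records = list(zip(a, b, c, d, e, f, g, h, i, j))
for record in records:
    print(record)
```

One caveat: if a comprehension yields no rows at all, zip(*[]) produces nothing and the four-name unpacking raises a ValueError for too few values, so an empty result needs its own check.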
Setup used for the examples above:
import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip(a, b, c, d, e, f)