I am trying to get all the values from rows into columns. I don't have an index, so I find it hard to get them all into one list.
Code for getting the values:
traceFilters = sqlContext.read.format("csv").options(header='true', delimiter = ',').load("/data/*.txt")
traceFilters.take(5)
fields = [
StructField("City", StringType(), False),
StructField("Country", StringType(), False)
]
traceFilters.take(5)
for row in traceFilters.rdd.collect():
    a = row.City
    print(a)
This is the data that I am getting from the above code:
New York
London
Vienna
And this is the result that I want:
[ New York, London, Vienna ]
I tried using transpose, but it's not working, and I also tried zip.
Code that I tried:
print a.transpose()
or
val1 = a.set_index('City').T
Any help appreciated.
Thanks
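It looks like you are printing each value as you go, when what you really want is a list. Inside your loop, a is a single string, so it has no transpose() or set_index() method. The code below appends each value to a list, then prints the list: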
traceFilters = sqlContext.read.format("csv").options(header='true', delimiter = ',').load("/data/*.txt")
traceFilters.take(5)
fields = [
StructField("City", StringType(), False),
StructField("Country", StringType(), False)
]
traceFilters.take(5)
a = []
for row in traceFilters.rdd.collect():
    a.append(row.City)
print(a)
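The same thing can be written as a list comprehension. This is only a minimal sketch: column_values is a hypothetical helper name, and the namedtuple rows are stand-ins (holding the three cities from the question) so that it runs without a Spark session. It assumes each row exposes its fields as attributes, which is how row.City is used above.

```python
from collections import namedtuple


def column_values(rows, field):
    """Return the value of `field` from every row as a plain Python list."""
    return [getattr(row, field) for row in rows]


# Stand-in rows (an assumption for illustration only); in the question these
# would come from the DataFrame instead of being built by hand.
Row = namedtuple("Row", ["City"])
rows = [Row("New York"), Row("London"), Row("Vienna")]

cities = column_values(rows, "City")
print(cities)  # ['New York', 'London', 'Vienna']
```

With the DataFrame from the question, you would presumably pass traceFilters.select("City").collect() as rows. Keep in mind that collect() pulls every row to the driver, so this only suits results small enough to fit in memory there.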