I would like to save my dataframe in a format that matches an existing txt file. I have a model trained on this txt file and now want to predict on new data, which needs to be in the same format.
The target txt file looks like this (first 3 rows):
2 qid:0 0:0.4967141530112327 1:-0.1382643011711847 2:0.6476885381006925 3:1.523029856408025 4:-0.234153374723336
1 qid:2 0:1.465648768921554 1:-0.2257763004865357 2:0.06752820468792384 3:-1.424748186213457 4:-0.5443827245251827
2 qid:0 0:0.7384665799954104 1:0.1713682811899705 2:-0.1156482823882405 3:-0.3011036955892888 4:-1.478521990367427
The first column is just an arbitrary integer (here the 2 and the 1). The qid is always joined to an integer by a colon. Each of the remaining columns is an integer index followed by a colon and a float.
My dataframe looks like this:
import pandas as pd

data = {'label': [2, 3, 2],
        'qid': ['qid:0', 'qid:1', 'qid:0'],
        '0': [0.4967, 0.4967, 0.4967],
        '1': [0.4967, 0.4967, 0.4967],
        '2': [0.4967, 0.4967, 0.4967],
        '3': [0.4967, 0.4967, 0.4967],
        '4': [0.4967, 0.4967, 0.4967]}
df = pd.DataFrame(data)
Try this and let us know if it works for your case:
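import pandas as pd
data = pd.read_csv('output_list.txt', sep=" ", header=None)
# Name all seven fields; the list length must match the number of columns read
data.columns = ["label", "qid", "0", "1", "2", "3", "4"]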
Updated: the code below is quite messy. If it solves your problem, it can be reworked to handle large amounts of data using numpy array methods.
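# Prefix every feature value with its column name, e.g. 0.4967 -> "0:0.4967"
for i in list(data.keys()):
    if i == "label" or i == "qid":
        continue  # these two columns are written unchanged
    data[i] = [str(i) + ":" + str(j) for j in list(data[i])]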
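The loop above only prefixes the values; the rows still have to be joined with single spaces and written out. Here is a minimal sketch of one way to do both steps on the dataframe from the question. The helper names `to_ranking_lines` and `save_ranking_file` are made up for illustration, and it assumes every column other than `label` and `qid` is a feature column whose name is the index to print.

```python
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({
    "label": [2, 3, 2],
    "qid": ["qid:0", "qid:1", "qid:0"],
    "0": [0.4967, 0.4967, 0.4967],
    "1": [0.4967, 0.4967, 0.4967],
    "2": [0.4967, 0.4967, 0.4967],
    "3": [0.4967, 0.4967, 0.4967],
    "4": [0.4967, 0.4967, 0.4967],
})


def to_ranking_lines(frame, plain_cols=("label", "qid")):
    """Return one 'label qid:x 0:v 1:v ...' string per row (hypothetical helper)."""
    out = frame.astype(str)  # astype returns a copy, so the input is not modified
    for col in out.columns:
        if col not in plain_cols:
            # Prefix each feature value with its column name, e.g. "0:0.4967"
            out[col] = str(col) + ":" + out[col]
    # Join the fields of every row with single spaces
    return [" ".join(row) for row in out.to_numpy()]


def save_ranking_file(frame, path):
    """Write the formatted rows to a text file, one row per line (hypothetical helper)."""
    with open(path, "w") as handle:
        handle.write("\n".join(to_ranking_lines(frame)) + "\n")


lines = to_ranking_lines(df)
print(lines[0])
```

Calling `save_ranking_file(df, "new_data.txt")` (the file name is just an example) would then write the rows to disk. Floats are written with Python's default string conversion, so the number of digits may differ from the target file.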