I have many .mat files that contain the radial part of several different wavefunctions, along with some other information about an atom. I have successfully extracted the wavefunction part and used numpy.savetxt() to save it to a .txt file, but the file size increases considerably. Running du on one file before and one after the conversion gives:
du -ch wfkt_X_rb87_n=40_L=11_J=0_step=0.001.mat
440K wfkt_X_rb87_n=40_L=11_J=0_step=0.001.mat
du -ch wfkt_X_rb87_n=40_L=12_J=0_step=0.001.txt
2,9M wfkt_X_rb87_n=40_L=12_J=0_step=0.001.txt
Ignore the difference between L=11 and L=12; the wavefunctions are almost the same size, yet the file is more than 6 times larger. I would like to know why this happens and, if possible, a way to reduce the size of the .txt files. Here is the code I use to convert the files:
import scipy.io as sio
import os
import pickle
import numpy as np
import glob as gb

files = gb.glob('wfkt_X_rb*.mat')
for filet in files:
    print(filet)
    mat = sio.loadmat(filet)
    wave = mat['wavefunction'][0]
    J = mat['J']
    L = mat['L']
    n = mat['n']
    xmax = mat['xmax'][0][0]
    xmin = mat['xmin'][0][0]
    xstep = mat['xstep'][0][0]
    energy = mat['energy'][0][0]
    name = filet.replace('.mat', '.txt')
    name = name.replace('rb', 'Rb')
    # Rebuild the radial grid and write it next to the wavefunction as two columns.
    x = np.linspace(xmin, xmax, num=len(wave), endpoint=False)
    Data = np.transpose([x, wave])
    np.savetxt(name, Data)
    os.remove(filet)
    # Append the scalar metadata after the two-column data.
    with open(name, "a") as f:
        f.write(str(energy) + " " + str(xstep) + "\n")
        f.write(str(xmin) + " " + str(xmax))
The required format of the data file is:
2.700000000000000000e+01 6.226655250941872093e-04
2.700099997457605738e+01 6.232789496263042460e-04
2.700199994915211121e+01 6.238928333406641843e-04
2.700299992372816860e+01 6.245071764542571872e-04
2.700399989830422243e+01 6.251219791839867897e-04
2.700499987288027981e+01 6.257372417466700075e-04
2.700599984745633364e+01 6.263529643590372287e-04
If you need more information, feel free to ask! Thanks in advance.
.mat is a binary format, whereas numpy.savetxt() writes a plain text file. The binary representation of a double precision number (IEEE 754) takes 8 bytes. By default, numpy writes each number as plain text in the format 0.000000000000000000e+00, which takes 24 bytes, plus one more byte for the separator or line break.
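To make the arithmetic concrete, here is a minimal sketch that measures both representations for the first row of your sample data. It assumes (this is a guess based on your script, which rebuilds x with linspace) that the .mat file stores only the wavefunction samples and not the x column.

```python
import struct

# First row of the sample output shown in the question.
x = 27.0
value = 6.226655250941872093e-04

# Binary: an IEEE 754 double always occupies 8 bytes.
binary = struct.pack("<d", value)

# Text: numpy.savetxt uses the format '%.18e' by default.
text = "%.18e" % value

# One row of the two-column text file: x, a space, the value, a newline.
row = "%.18e %.18e\n" % (x, value)

print(len(binary))  # bytes per sample in binary
print(len(text))    # characters per number in text
print(len(row))     # bytes per row in the text file
```

Under that assumption, each sample costs 8 bytes in binary and 50 bytes as a text row, a factor of about 6, which is in the same range as the growth you observe.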
A number of additional effects influence the resulting file size, e.g. the structural overhead of the file format, compression, and the format you use for writing the plain text (number of decimal digits). However, in your case I suspect that the main effect is simply the difference between a binary and a plain text representation of the numbers.
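If the text format has to stay, the number of decimal digits is the easiest knob to turn. The sketch below compares the default format with a shorter one via the fmt parameter of numpy.savetxt(). The array is placeholder data, not your wavefunction, and '%.8e' is only an illustrative choice: fewer digits means the values no longer round-trip exactly.

```python
import io
import numpy as np

# Placeholder two-column data (all values positive, so no minus signs).
x = np.linspace(27.0, 28.0, num=1000, endpoint=False)
wave = np.exp(-x)
data = np.transpose([x, wave])

def text_size(fmt):
    """Return the number of bytes numpy.savetxt writes for `data` with `fmt`."""
    buf = io.BytesIO()
    np.savetxt(buf, data, fmt=fmt)
    return len(buf.getvalue())

full = text_size("%.18e")   # numpy's default format
short = text_size("%.8e")   # 9 significant digits

print(full, short)
```

Whether 9 significant digits are enough depends on what the downstream program does with the wavefunction, so treat the shorter format as a trade-off rather than a free saving.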
If you want to decrease the file size, you should use a different output format. Possible options are:
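Write a gzip-compressed text file:
import gzip
import numpy
# gzip.open (not the built-in open) is what actually compresses the output.
with gzip.open('data.txt.gz', 'wb') as f:
    numpy.savetxt(f, myarray)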
Save as .mat again. See scipy.io.savemat().
Save as a NumPy binary file (.npy). See numpy.save().
Save as a compressed NumPy archive (.npz). See numpy.savez_compressed().

Which option to choose depends on your situation: Who will have to read the data afterwards? How important is the compression factor? Is your data just one single array or is the structure more complex?
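As an illustration of the compressed NumPy option, here is a minimal sketch that stores the wavefunction together with the scalar metadata in one .npz archive and reads it back. The array, the scalar values, and the file name are all placeholders chosen for the example, not values from your files.

```python
import os
import tempfile
import numpy as np

# Placeholder data standing in for mat['wavefunction'][0] and the scalars.
xmin, xmax, xstep = 27.0, 28.0, 0.001
wave = np.exp(-np.linspace(xmin, xmax, num=1000, endpoint=False))
energy = -0.5  # hypothetical value

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "wfkt_example.npz")

# Each keyword argument becomes a named array inside the archive.
np.savez_compressed(path, wave=wave, xmin=xmin, xmax=xmax,
                    xstep=xstep, energy=energy)

# Reading it back: scalars come out as zero-dimensional arrays.
with np.load(path) as archive:
    restored = archive["wave"]
    restored_xmin = float(archive["xmin"])

print(np.array_equal(restored, wave), restored_xmin)
```

Since x can be rebuilt from xmin, xmax, and the number of samples, it does not need to be stored at all, which halves the amount of data before any compression is applied. The catch is that the file is no longer human-readable, so this only works if whatever consumes the data can read .npz.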