I have the sample data as below:
# df
VAR1 SEQ VAR2 VAR3 DATE VAR4 VAR5 VAR6 VAR7
AAA 1 YYY 01 20000630 AL 11111 ABCD PA
BBB 1 YYY 01 20100701 GA 12345 EDED NY
BBB 2 YYY 01 20150815 GA 12345 NY
BBB 3 YYY 01 19950105 GA 12345 YTRU NY
BBB 4 YYY 01 20000701 GA 12345 IIII NY
BBB 5 YYY 01 20210701 GA 12345 NY
CCC 1 NNN 01 20210630 CA 33333 SSSS NJ
CCC 2 NNN 01 20210629 CA 33333 NJ
In SAS
, we can export fixed width format file as below:
BLANK_VAR1 = " "
%MACRO FRIST;
PUT @ 1 "00FIRST"
@ 8 VAR1 $CHAR5.
@ 13 BLANK_VAR1 $CHAR2.
@ 15 VAR2 $CHAR3.
;
%MEND FRIST;
%MACRO SECOND;
PUT @ 1 "00SECOND"
@ 9 VAR3 $CHAR2.
@ 11 BLANK_VAR1 $CHAR2.
@ 13 VAR4 $CHAR2.
@ 15 VAR5 $CHAR5.
;
%MEND SECOND;
%MACRO THIRD(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "THIRD" $CHAR5.
@ 8 DATE $CHAR8.
;
%MEND THIRD;
%MACRO FOURTH(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "FOURTH" $CHAR5.
@ 9 VAR6 $CHAR25.
@ 34 BLANK_VAR1 $CHAR2.
@ 36 VAR7 $CHAR2.
;
%MEND FOURTH;
filename outtmp "/home/folder/outfile_tmp";
DATA _NULL_;
SET df;
BY VAR1 SEQ;
FILE outtmp;
IF FIRST.VAR1 THEN DO;
%FRIST;
%SECOND;
REC_CNT = 0;
END;
REC_CNT + 1;
IF REC_CNT LE 3 THEN DO;
%THIRD(REC_CNT);
IF VAR6 NE ' ' THEN DO;
%FOURTH(COUNTN);
END;
END;
RUN;
filename output "/home/folder/output";
%MACRO INREC;
PUT 001 RECIN $CHAR150.;
%MEND INREC;
%MACRO FILE_FIRST;
DATE = TODAY();
PUT @ 1 "###FIRSTLINE###"
@ 16 DATE JULIAN5.
@ 21 BLANK_VAR1 $CHAR2.
@ 23 "###FIRSTLINEEND###"
;
%MEND FILE_FIRST;
%MACRO FILE_LAST;
DATE = TODAY();
PUT @ 1 "###LASTLINE###"
@ 15 DATE JULIAN5.
@ 20 BLANK_VAR1 $CHAR2.
@ 22 "###LASTLINEEND###"
;
%MEND FILE_LAST;
DATA output;
INFILE outtmp truncover;
INPUT
@ 001 RECIN $CHAR150.;
RUN;
DATA _NULL_;
SET output end=last;
file output lrecl=256 ;
IF _N_ = 1 THEN DO;
%FILE_FIRST;
END;
%INREC;
IF last THEN DO;
%FILE_LAST;
END;
RUN;
###FIRSTLINE###21182 ###LASTLINEEND###
00FIRSTAAA YYY
00SECOND01 AL11111
01THIRD20000630
01FOURTHABCD PA
00FIRSTBBB YYY
00SECOND01 GA12345
01THIRD20100701
01FOURTHEDED NY
02THIRD20150815
03THIRD19950105
03FOURTHYTRU NY
00FIRSTCCC NNN
00SECOND01 CA33333
01THIRD20210630
01FOURTHSSSS NJ
###LASTLINE###21182 ###LASTLINEEND###
The logic for above program is:
VAR1
, then only output FIRST
and SECOND
once.THIRD
part for SEQ
is less than 3. If the SEQ
is larger than 3, do not output. Ignore.FOURTH
part following the third logic and also if VAR6
is not missing.THIRD
and FOURTH
part, the first two string should change from 01
to 03
depends on the records.How can I replicate this format in Python
?
I found that np.savetxt()
with fmt
argument might be a way link; however, the file should be the same order as original dataframe.
pandas
has function read_fwf()
to read fixed width format file; however, no to_fwf()
function to export.
I have been stuck for several days, so any idea should be helpful!
This isn't exactly a good way to do it, but maybe it gives you an idea how to do the logic. I'm just writing to a list, you can then write the list out - but probably you should do it the way JonSG did in his (deleted) answer where you use a file writer instead. There's probably a better approach using a data class, but that's not my expertise.
import pandas as pd
df = pd.read_csv(r"h:\temp\df_text.csv")
outlist = []
for index,row in df.iterrows():
if(row['SEQ']==1):
tempstr = '00FIRST'+row.VAR1+' '+row.VAR2
outlist.append(tempstr)
tempstr = '00SECOND'+str(row.VAR3)+' '+str(row.VAR4)+str(row.VAR5)
outlist.append(tempstr)
if(row['SEQ'] <= 3):
seqval ='0'+str(row.SEQ) if row.SEQ < 10 else str(row.SEQ)
tempstr = str(row['SEQ'])+'THIRD'+str(row.DATE)
outlist.append(tempstr)
if (row.VAR6 != ' '):
tempstr = str(row['SEQ'])+'FOURTH'+row.VAR6+' '+row.VAR7
outlist.append(tempstr)