
Is there some kind of limit to the amount of output Python 3.4 allows using the write() method at one time?


I put print() calls right next to my write() calls at the end of my code to test why my output files were incomplete. The print() output is everything I expect, but the write() output falls short by a confusing amount (only 150 out of 200 items). Reference image of output: IDLE versus external output file.

FYI: Win 7 64 // Python 3.4.2

My module takes an SRT captions file ('test.srt') and returns a list I build from it; in particular, one with 220 entries of the form: [[(index), [time], string]]

times = open('times.txt', 'w')

### A portion of Riobard's SRT Parser: srt.py
import re

def tc2ms(tc):
    ''' convert timecode to millisecond '''

    sign    = 1
    if tc[0] in "+-":
        sign    = -1 if tc[0] == "-" else 1
        tc  = tc[1:]

    TIMECODE_RE     = re.compile('(?:(?:(?:(\d?\d):)?(\d?\d):)?(\d?\d))?(?:[,.](\d?\d?\d))?')
    match   = TIMECODE_RE.match(tc)
    try: 
        assert match is not None
    except AssertionError:
        print(tc)
    hh,mm,ss,ms = map(lambda x: 0 if x==None else int(x), match.groups())
    return ((hh*3600 + mm*60 + ss) * 1000 + ms) * sign

# my code
with open('test.srt') as f:
    file = f.read()

srt = []

for line in file:
    splitter = file.split("\n\n")

# SRT splitter
i = 0
j = len(splitter)
for items in splitter:
    while i <= j - 2:
        split_point_1 = splitter[i].index("\n")
        split_point_2 = splitter[i].index("\n", split_point_1 + 1)
        index = splitter[i][:split_point_1]
        time = [splitter[i][split_point_1:split_point_2]]
        time = time[0][1:]
        string = splitter[i][split_point_2:]
        string = string[1:]
        list = [[(index), [time], string]]
        srt += list
        i += 1

# time info outputter
i = 0
j = 1
for line in srt:
    if i != len(srt) - 1:
        indexer = srt[i][1][0].index(" --> ")
        timein = srt[i][1][0][:indexer]
        timeout = srt[i][1][0][-indexer:]
        line_time = (tc2ms(timeout) - tc2ms(timein))/1000
        space_time = ((tc2ms((srt[j][1][0][:indexer]))) - (tc2ms(srt[i][1][0][-indexer:])))/1000
        out1 = "The space between Line " + str(i) + " and Line " + str(j) + " lasts " + str(space_time) + " seconds." + "\n"
        out2 = "Line " + str(i) + ": " + str(srt[i][2]) + "\n\n"
        times.write(out1)
        times.write(out2)
        print(out1, end="")
        print(out2)
        i += 1
        j += 1
    else:
        indexer = srt[i][1][0].index(" --> ")
        timein = srt[i][1][0][:indexer]
        timeout = srt[i][1][0][-indexer:]
        line_time = (tc2ms(timeout) - tc2ms(timein))/1000
        outend = "Line " + str(i) + ": " + str(srt[i][2]) + "\n<End of File>"
        times.write(outend)
        print(outend)

My two write() output files contain only about 150 and about 200 items, respectively, of the 220 items that the script otherwise prints correctly to the screen.


Solution

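  • Close your times file when you are done writing. File writes are buffered, both by Python's file object and by the operating system, which speeds up file I/O by collecting larger blocks of data and writing them to disk in one go. Until that buffer is flushed, the last part of your output is not in the file; closing the file flushes the buffer: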

    times.close()
    

    Consider opening the file in a with block, which closes it automatically when the block exits, even if an exception is raised:

    with open('times.txt', 'w') as times:
        # all code that needs to write to times
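
To see the effect described above, here is a minimal sketch. It assumes nothing about the SRT script: the file names, the helper write_lines() and the function demo() are made up for illustration. It writes 220 short lines to a file that is closed by hand and compares the on-disk size before and after close(), then writes the same lines inside a with block and reads them back.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each line to path; the with block flushes and closes the file on exit."""
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")


def demo():
    # Hypothetical data standing in for the 220 caption entries.
    lines = ["Line {}".format(n) for n in range(220)]

    with tempfile.TemporaryDirectory() as tmp_dir:
        manual_path = os.path.join(tmp_dir, "manual.txt")
        with_path = os.path.join(tmp_dir, "with_block.txt")

        # Variant 1: open and write without a with block.
        out = open(manual_path, "w")
        for line in lines:
            out.write(line + "\n")
        # Some or all of the data may still be sitting in a buffer here,
        # so the size on disk can be smaller than the amount written.
        size_before_close = os.path.getsize(manual_path)
        out.close()  # flushes the buffer and releases the file
        size_after_close = os.path.getsize(manual_path)

        # Variant 2: the with block takes care of closing.
        write_lines(with_path, lines)
        with open(with_path) as f:
            read_back = f.read().splitlines()

    return size_before_close, size_after_close, read_back


if __name__ == "__main__":
    before, after, read_back = demo()
    print("bytes on disk before close():", before)
    print("bytes on disk after close(): ", after)
    print("lines read back from with-block file:", len(read_back))
```

How much data is on disk before close() depends on the buffer size and the platform, so the first number will vary; the point is only that it is not guaranteed to match the second. If the file has to stay open for a long time, calling flush() on it after a batch of writes pushes the buffered data out without closing it.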