I'm trying to split a 100 MB text file (with unique rows) into 10 files of equal size using Python and pysftp, but I can't find a proper approach.
Please let me know how I can read and split a file from an SFTP directory and place all the resulting files back into that directory.
with pysftp.Connection(host=sftphostname, username=sftpusername, port=sftpport, private_key=sftpkeypath) as sftp:
    with sftp.open(source_filedir+source_filename) as file:
        for line in file:
            <....................Unable to decide logic------------------>
The logic you probably need is as follows:
As you are in a read-only environment, you will need to download the whole file into memory.
Use Python's io.StringIO() to handle the data in memory as if it were a file.
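A minimal sketch of that download step, assuming the file is UTF-8 text. The helper name `load_remote_text` is made up for illustration; `getfo()` is pysftp's method for copying a remote file into a file-like object. The bytes are collected in an `io.BytesIO` first and decoded afterwards, because the transfer itself is binary.

```python
import io


def load_remote_text(sftp, remote_path, encoding="utf-8"):
    """Download remote_path into memory and return it as a text buffer.

    `sftp` is expected to be an open pysftp.Connection (anything with a
    compatible getfo() method works). The encoding is an assumption.
    """
    raw = io.BytesIO()
    sftp.getfo(remote_path, raw)  # copy the remote file into the buffer
    # newline="" leaves line endings untouched, which the csv module expects
    return io.StringIO(raw.getvalue().decode(encoding), newline="")


# Possible usage with the connection from the question:
# with pysftp.Connection(host=sftphostname, username=sftpusername,
#                        port=sftpport, private_key=sftpkeypath) as sftp:
#     buffer = load_remote_text(sftp, source_filedir + source_filename)
```

Note that this holds the whole 100 MB file in memory twice for a moment (raw bytes plus decoded text), which is usually acceptable at this size.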
As you are talking about rows, I assume you mean the file is in CSV format? You can make use of Python's csv library to parse the file.
First do a quick scan of the file using a csv.reader() and use it to count the number of rows in the file. The count can then be used to split the file into parts with an equal number of rows, rather than just splitting the file at set byte counts.
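The counting pass could look something like the sketch below. It assumes the first row is a header that should not be counted as data, and the helper names `count_data_rows` and `part_sizes` are only illustrative. When the row count does not divide evenly, the leftover rows are spread over the first few parts so the sizes differ by at most one.

```python
import csv
import io


def count_data_rows(buffer):
    """Count the rows in an in-memory CSV, excluding an assumed header row."""
    buffer.seek(0)
    total = sum(1 for _ in csv.reader(buffer))
    buffer.seek(0)  # rewind so the buffer can be read again
    return max(total - 1, 0)


def part_sizes(data_rows, parts=10):
    """Return how many data rows each of the `parts` files should receive."""
    base, extra = divmod(data_rows, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


sample = io.StringIO("id,name\r\n" + "".join(f"{i},row{i}\r\n" for i in range(25)), newline="")
print(count_data_rows(sample))  # 25
print(part_sizes(25, 10))       # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```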
Once you know the number of rows, rewind the data (treating it as a file again) and read in just the header row. This can then be written as the first row of each split file you create.
Now read n rows in (based on your total row count). Use a csv.writer() and another io.StringIO() to first write the header row and then write the split rows into memory. The result can then be uploaded using pysftp to a new file on the server, all without requiring access to an actual file system.
The result will be that each file will also have a valid header row.
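Putting the steps together, here is a rough sketch rather than a tested production solution. It assumes a UTF-8 CSV with a header row; `split_csv`, `upload_parts` and the `_partNN.csv` naming scheme are invented for illustration. `putfo()` is pysftp's method for uploading from a file-like object.

```python
import csv
import io
from itertools import islice


def split_csv(buffer, parts=10):
    """Yield the text of each split file, every one starting with the header."""
    buffer.seek(0)
    data_rows = max(sum(1 for _ in csv.reader(buffer)) - 1, 0)  # first pass: count
    base, extra = divmod(data_rows, parts)

    buffer.seek(0)
    reader = csv.reader(buffer)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to split
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        if size == 0:
            break  # fewer data rows than requested parts
        out = io.StringIO(newline="")
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(islice(reader, size))  # next `size` data rows
        yield out.getvalue()


def upload_parts(sftp, texts, remote_dir, base_name, encoding="utf-8"):
    """Upload each part via sftp.putfo() and return the remote paths used."""
    paths = []
    for number, text in enumerate(texts, start=1):
        remote_path = f"{remote_dir}/{base_name}_part{number:02d}.csv"
        sftp.putfo(io.BytesIO(text.encode(encoding)), remote_path)
        paths.append(remote_path)
    return paths
```

Here `buffer` is the in-memory text copy of the downloaded file, and `sftp` is the open pysftp connection. Because the split is by row count, the files have equal numbers of rows (give or take one), so their byte sizes will only be approximately equal.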