python, python-3.x, file-processing

Use only a certain portion of file in every iteration


I am using an external API in Python (specifically 3.x) to get search results based on keywords stored in a .txt file. However, because of a limit on how many keywords I can search for per time interval (assume I have to wait an hour between runs of the script), I can only use a portion of the keywords (say 50) on each run. How can I, Pythonically, use only a portion of the keywords in every iteration?

Let's assume I have the following list of keywords in the .txt file myWords.txt:

Lorem #0
ipsum #1
dolor #2
sit   #3
amet  #4
...
vitae #167

I want to use only the keywords found on lines 0-49 (i.e. the first 50 lines) on the first iteration, 50-99 on the second, 100-149 on the third, and 150-167 on the fourth and last iteration.

This is, of course, possible by reading the whole file, reading an iteration counter saved elsewhere, and then choosing the keyword range that corresponds to that part of the complete list. However, I do not want an external counter; I would rather have only my Python script and myWords.txt, with the counter handled in the Python code itself.

I want to take only the keywords that I should be taking in the current run of the script (depending on (total number of keywords)/50). At the same time, if I add new keywords at the end of myWords.txt, the script should adjust the iterations accordingly and, if needed, add new ones.
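
For reference, the counter-based version I would like to avoid would look roughly like this (counter.txt and the chunk size of 50 are just placeholders):

CHUNK = 50

# Read which run this is from a separate counter file.
with open('counter.txt') as fh:
    iteration = int(fh.read().strip())

# Read all keywords and slice out this run's portion.
with open('myWords.txt') as fh:
    keywords = [line.strip() for line in fh if line.strip()]

chunk = keywords[iteration * CHUNK:(iteration + 1) * CHUNK]

# Persist the counter for the next run.
with open('counter.txt', 'w') as fh:
    fh.write(str(iteration + 1))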


Solution

  • As far as I know, there is no way to persist which keywords you have already used between different invocations of your script without storing that information somewhere. However, you do have a couple of choices in how you implement that "persistent storage" of the information you need across invocations.

    1. Instead of having a single input file named myWords.txt, keep two files: one containing the keywords you still want to search for and one containing the keywords you have already searched for. As you search for keywords, remove them from the first file and append them to the second (see the sketch just after this list).
    2. Use a proper persistent store (for example a small database, or a pickle/shelve file) that records which words have already been searched.
    3. (The easiest thing, and what I would do) is just to keep a small file such as next_pos.txt that stores where the next run should pick up reading.
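
    Here is a minimal sketch of option 1, assuming two files named pending.txt and done.txt (both file names and the chunk size are placeholders, not something from the question):

    CHUNK = 50  # keywords per run; 50 in your case

    # Read everything that still needs to be searched.
    with open('pending.txt') as fh:
        pending = [line.strip() for line in fh if line.strip()]

    batch, remaining = pending[:CHUNK], pending[CHUNK:]

    # ... make your API call with the keywords in `batch` here ...

    # Write back only the keywords that have not been searched yet.
    with open('pending.txt', 'w') as fh:
        fh.writelines(word + '\n' for word in remaining)

    # Append the searched keywords to the "done" file.
    with open('done.txt', 'a') as fh:
        fh.writelines(word + '\n' for word in batch)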

    Here is an implementation of option 3, which is what I would do:

    Create a next position file

    echo 0 > next_pos.txt
    

    Now do your work

    with open('next_pos.txt') as fh:
        next_pos = int(fh.read().strip())

    rows_to_search = 2  # This would be 50 in your case
    keywords = []
    with open('myWords.txt') as fh:
        fh.seek(next_pos)              # jump to where the previous run stopped
        for _ in range(rows_to_search):
            line = fh.readline()
            if not line:               # reached the end of the file
                break
            keywords.append(line.strip())
            next_pos = fh.tell()       # offset just past the line we read

    # Store cursor location in file.
    with open('next_pos.txt', 'w') as fh:
        fh.write(str(next_pos))

    # Make your API call
    # Rinse, Wash, Repeat
    
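    If the concern is only that newly appended keywords get picked up, the offset approach above already handles it, because next_pos always points just past the last line that was read. If you also want to start over once every keyword has been used, a small addition like the following would do it (this is my assumption about the desired behaviour, not something from the question):

    # If nothing was read, every keyword has been processed; reset the
    # cursor so a future run starts again from the top of myWords.txt.
    if not keywords:
        with open('next_pos.txt', 'w') as fh:
            fh.write('0')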

    As I've stated, you have lots of options, and I don't know that any one of them is more Pythonic than another, but whatever you do, try to keep it simple.