python, stream

Replace certain characters in stream


I have a method (a .yml parser) that takes a stream as input. The problem is that it throws errors when it encounters certain characters in certain places, e.g. %.

What I would like to do is take the stream, replace all of the % with a placeholder, and then pass it to the parser.

This is what I have (which doesn't work with the current input):

    stream = open('file.yml', 'r')
    dict = yaml.safe_load(stream)

But what I think I need is something like:

    stream = open('file.yml', 'r')
    temp_string = stringFromStream(stream)                    # convert stream to string
    temp_string = temp_string.replace('%', '_PLACEHOLDER_')   # replace with placeholder
    stream = streamFromString(temp_string)                    # convert back to stream
    dict = yaml.safe_load(stream)

Solution

  • Edit: The original answer here apparently no longer works, and the library now requires a file-like object.

    Given that, things become a little more awkward. You could write your own file-like wrapper (probably based on io.TextIOBase) that does the replacement in a buffer. If you are willing to sacrifice laziness, though, the easiest solution is roughly what the question originally suggested: read the whole file and do the replacement in memory.
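
    If laziness matters, the wrapper idea could look roughly like the sketch below. The class name ReplacingReader is made up for illustration, and the sketch assumes the parser only ever calls read(size) on the object. It replaces text one line at a time, so a search string that spans a line break would not be matched.

```python
import io


class ReplacingReader(io.TextIOBase):
    """Hypothetical file-like wrapper that replaces text lazily, line by line."""

    def __init__(self, lines, search, replacement):
        super().__init__()
        self._lines = iter(lines)   # e.g. an open text file
        self._search = search
        self._replacement = replacement
        self._buffer = ""           # replaced text not yet handed out

    def readable(self):
        return True

    def read(self, size=-1):
        # Pull lines until the buffer can satisfy the request, or until
        # the input is exhausted when everything is requested.
        read_all = size is None or size < 0
        while read_all or len(self._buffer) < size:
            line = next(self._lines, None)
            if line is None:
                break
            self._buffer += line.replace(self._search, self._replacement)
        if read_all:
            chunk, self._buffer = self._buffer, ""
        else:
            chunk, self._buffer = self._buffer[:size], self._buffer[size:]
        return chunk
```

    Only read() is implemented here; other file methods such as readline() would need their own overrides before being relied on.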

    The solution for turning a string into a file-like object is io.StringIO.
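
    A minimal sketch of the in-memory approach follows. The helper name replaced_stream is hypothetical, and the commented usage assumes PyYAML is installed and imported as yaml.

```python
import io


def replaced_stream(stream, search, replacement):
    """Read the whole stream, replace text, and return a new file-like object."""
    text = stream.read()  # not lazy: the entire input is loaded into memory
    return io.StringIO(text.replace(search, replacement))


# Usage sketch, assuming PyYAML is installed and imported as `yaml`:
# with open("file.yml", "r") as file:
#     data = yaml.safe_load(replaced_stream(file, "%", "_PLACEHOLDER_"))
```

    Because io.StringIO supports read(), the result can be passed anywhere a text file opened for reading is expected.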


    Old answer:

    A good way of doing this would be to write a generator, so that it remains lazy (the whole file doesn't need to be read in at once):

    import yaml

    def replace_iter(iterable, search, replace):
        for value in iterable:
            # str.replace returns a new string, so yield its result
            yield value.replace(search, replace)

    with open("file.yml", "r") as file:
        iterable = replace_iter(file, "%", "_PLACEHOLDER_")
        dictionary = yaml.safe_load(iterable)
    

    Note the use of the with statement to open the file - this is the best way to open files in Python, as it ensures files get closed properly, even when exceptions occur.

    Also note that dict is a poor variable name, as it shadows the built-in dict() and stops you from using it in that scope.
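
    To illustrate the naming problem, here is a small sketch (the function name shadowing_demo is made up) showing what happens once the name dict refers to an ordinary dictionary:

```python
def shadowing_demo():
    dict = {"a": 1}  # the local name now hides the built-in dict type
    try:
        dict(b=2)    # tries to call the dictionary instance, not the built-in
    except TypeError as exc:
        return str(exc)
```

    Calling shadowing_demo() returns the TypeError message, because a dictionary instance is not callable.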

    Do note that your stringFromStream() function is essentially file.read(), and streamFromString() is data.splitlines() (with keepends=True if the line endings should be preserved). What you are calling a 'stream' is actually just an iterator over strings (the lines of the file).
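
    As a rough illustration of that equivalence, the snippet below uses io.StringIO as a stand-in for the opened file (the sample content is invented). Note that splitlines() drops line endings unless keepends=True is passed.

```python
import io

file = io.StringIO("rate: 5%\nname: x\n")  # stands in for open("file.yml")

data = file.read()                          # the question's stringFromStream()
data = data.replace("%", "_PLACEHOLDER_")   # replace returns a new string
lines = data.splitlines(keepends=True)      # the question's streamFromString()
```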