I have a really large file that should consist of JSON strings. However, when I use the following code, I get a "stream did not contain valid UTF-8" error:
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let file = File::open("foo.txt")?;
    let reader = BufReader::new(file);
    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}
Now the answer to this is to use Vec<u8> rather than String. But all the code I've seen has file.read_to_end(buf) as the answer, which won't work for the file sizes I have to work with.
What I'm looking for is to read the file line by line, use lossy UTF-8 conversion, then do some calculations and push the output to another file.
You can use the read_until function from the BufRead trait, which BufReader implements. It is similar to read_to_end, but also takes a byte delimiter argument. The delimiter can be any byte, and a newline \n byte is the right one here. Afterwards, you can just lossily convert the buffer from UTF-8. It would look something like this:
use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("foo.txt")?;
    let mut reader = BufReader::new(file);
    let mut buf = Vec::new();
    // read_until returns the number of bytes read; 0 means EOF.
    // Using ? propagates I/O errors instead of silently swallowing them.
    while reader.read_until(b'\n', &mut buf)? != 0 {
        // Invalid UTF-8 sequences are replaced with U+FFFD (�).
        let line = String::from_utf8_lossy(&buf);
        println!("{}", line);
        buf.clear();
    }
    Ok(())
}
Of course, this could be abstracted away into an iterator, just like Lines is, but the basic logic is the same as above.
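For illustration, a minimal sketch of such an iterator might look like this (the LossyLines name and its shape are my own, not a standard library API):

use std::io::{self, BufRead};

// Hypothetical iterator over lossily-decoded lines.
struct LossyLines<B> {
    reader: B,
}

impl<B: BufRead> Iterator for LossyLines<B> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = Vec::new();
        match self.reader.read_until(b'\n', &mut buf) {
            Ok(0) => None, // EOF: stop iteration
            Ok(_) => Some(Ok(String::from_utf8_lossy(&buf).into_owned())),
            Err(e) => Some(Err(e)),
        }
    }
}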
NOTE: unlike the lines function, the resulting strings will include the newline character, and the carriage return (\r) if there is one. You will need to strip those characters away if the behaviour of this solution has to match the lines function, as in the sketch below.
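For example, a small helper (the name lossy_line is my own invention) could do the trimming before the lossy conversion:

use std::borrow::Cow;

// Strip a trailing '\n' (and a preceding '\r', if any) before the
// lossy conversion, so the result matches what lines() produces.
fn lossy_line(mut buf: &[u8]) -> Cow<'_, str> {
    if buf.ends_with(b"\n") {
        buf = &buf[..buf.len() - 1];
        if buf.ends_with(b"\r") {
            buf = &buf[..buf.len() - 1];
        }
    }
    String::from_utf8_lossy(buf)
}

In the loop above, you would then call lossy_line(&buf) instead of String::from_utf8_lossy(&buf).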