
D: decoding ubyte[] to string, redux


This question is a modified redux of this previous question:

how to decode ubyte[] to a specified encoding?

I'm looking for an idiomatic way to convert the ubyte[] array returned from a std.zip.ArchiveMember.expandedData attribute into a string or other range-able collection of strings... either the whole contents akin to calling File.open("file"), or something iterable in similar fashion to File.open("file").byLine().

So far everything I've found from the standard documentation that deals with character arrays or strings does not appreciate a ubyte[] argument, and the examples around D's zip file handling are very rudimentary, dealing only with getting raw data out of zip archives and their member files... with no obvious file/stream/io interface capable of being easily layered between the raw bytestream and text-oriented file/string manipulation.

I think I can find something in std.utf or std.uni to decode individual code points, and while/for-loop my way through the bytestream, but surely there might be a better way?

Code sample:

std.zip.ZipArchive zipFile;
// just humor me, this is what I've been given.
zipFile = new std.zip.ZipArchive("dataSet.csv.zip");
foreach(memberFile; zipFile.directory)
{
    zipFile.expand(memberFile);
    ubyte[] uByteArray = memberFile.expandedData;

    // ok, now what?
    // is there a relatively simplistic way to get this
    // decoded/translated byteStream into a string
    // or collection of strings(for example, one string per line
    // of the compressed file) ?

    string completeCsvContents = uByteArray.PQR();
    string[] csvRows = uByteArray.XYZ();
}

Is there anything that I could easily fill in for PQR or XYZ?

Or, if it's a matter of making an API call in the style of

string csvData = std.ABC.PQR(uByteArray)

What would ABC/PQR be?


Solution

  • If you know that the data is UTF-8 encoded, you can use std.string.assumeUTF to convert it to a string/char array (for a ubyte[] argument the result is a char[]). All this does is a cast, as Nested type mentions, but it's more self-documenting.

    If you need to make sure that the resulting string is actually valid UTF-8 (several operations have undefined behavior on invalid strings), you can use std.utf.validate, which throws a UTFException on malformed input. assumeUTF also performs this check in debug builds.
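
To fill in the PQR and XYZ placeholders from the question, the approach above could be sketched roughly as follows. This is only a sketch: decodeUtf8, splitRows and rowsByMember are hypothetical helper names, not Phobos functions, and it assumes the archive members really are UTF-8 text. It also assumes the archive is loaded with std.file.read, since the ZipArchive constructor takes the file's bytes rather than a file name.

```d
import std.array : array;
import std.file : read;
import std.string : assumeUTF, lineSplitter;
import std.utf : validate;
import std.zip : ZipArchive;

/// Hypothetical helper (the "PQR" slot): view the bytes as UTF-8 and check them.
string decodeUtf8(ubyte[] raw)
{
    char[] text = raw.assumeUTF; // reinterprets the bytes; no copy, no transcoding
    validate(text);              // throws UTFException if the bytes are not valid UTF-8
    return text.idup;            // immutable copy, independent of the archive buffer
}

/// Hypothetical helper (the "XYZ" slot): one string per line, terminators dropped.
string[] splitRows(string text)
{
    // lineSplitter is lazy; .array collects the slices into a string[].
    return text.lineSplitter.array;
}

/// Hypothetical helper: rows of every member in the archive at `path`,
/// keyed by member name. Assumes every member is UTF-8 text.
string[][string] rowsByMember(string path)
{
    // The constructor expects the archive's contents, so read the file first.
    auto zip = new ZipArchive(read(path));
    string[][string] result;
    foreach (name, member; zip.directory)
    {
        zip.expand(member); // fills in member.expandedData
        result[name] = splitRows(decodeUtf8(member.expandedData));
    }
    return result;
}
```

If the rows only need to be visited once, iterating over text.lineSplitter directly avoids building the array, which is the closest match to File.byLine. If the copy made by idup matters for large members, std.exception.assumeUnique can turn the char[] into a string without copying, provided nothing else keeps a mutable reference to the buffer.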