How can I write a Zig function that can accept and return strings?


I am very new to Zig (working through Advent Of Code in it), and I am very confused by its handling of strings (or, I should say, []u8s) as function arguments and return types.

TL;DR what is the correct implementation of the following function?

fn doIt(string: []u8) []u8 {
    return "prefix" ++ string;
}

First attempts

I would expect the following test to pass:

fn doIt(string: []u8) []u8 {
    return "prefix" ++ string;
}

const std = @import("std");
const expect = std.testing.expect;

test {
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
}

but instead zig test gives:

scratch.zig:27:33: error: expected type '[]u8', found '*const [3:0]u8'
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
                                    ^~~~~
scratch.zig:27:33: note: cast discards const qualifier
scratch.zig:20:17: note: parameter type declared here
fn doIt(string: []u8) []u8 {

OK, so the error message seems clear - I need to change the signature of the function to accept a pointer. I don't know why I'm not allowed to pass a ~~string~~ []u8-literal directly, but let's trust the compiler and try it:

// Let's not worry, for the moment, about the fact that we're writing a function which
// can only accept strings of length 3...
fn doIt(string: *const [3:0]u8) []u8 {
    return "prefix" ++ string;
}
...

giving

scratch.zig:23:21: error: expected type '[]u8', found '*const [9:0]u8'
    return "prefix" ++ string;
           ~~~~~~~~~^~~~~~~~~
scratch.zig:23:21: note: cast discards const qualifier
scratch.zig:20:33: note: function return type declared here
fn doIt(string: *const [3:0]u8) []u8 {
                                ^~~~

OK, the addition of two pointers-to-arrays results in a pointer to the result. That makes sense. I didn't want to be dealing with pointers in the first place - but since I was forced into "pointer-land", I can understand that the output of an operation there would also be a pointer. So, presumably, we just use .*, a.k.a. pointer dereferencing, to return the actual value (a []u8), then?

fn doIt(string: *const [3:0]u8) []u8 {
    return ("prefix" ++ string).*;
}

giving

scratch.zig:21:32: error: array literal requires address-of operator (&) to coerce to slice type '[]u8'
    return ("prefix" ++ string).*;

...how can the address-of operator coerce a pointer into an object? Isn't that the inverse of what that operator does? But ok, let's try it...

fn doIt(string: *const [3:0]u8) []u8 {
    return &("prefix" ++ string).*;
}

scratch.zig:21:12: error: expected type '[]u8', found '*const [9:0]u8'
    return &("prefix" ++ string).*;
           ^~~~~~~~~~~~~~~~~~~~~~~
scratch.zig:21:12: note: cast discards const qualifier
scratch.zig:20:33: note: function return type declared here
fn doIt(string: *const [3:0]u8) []u8 {

...I give up, I must be misunderstanding something. Can anyone point (a-ha) me in the right direction?

Embrace the pointers

Taking a different tack, if we change the function's return type to be a pointer, there are still problems ahead:

fn doIt(string: *const [3:0]u8) *[]u8 {
    return "prefix" ++ string;
}

const std = @import("std");
const expect = std.testing.expect;

test {
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
}

scratch.zig:21:21: error: expected type '*[]u8', found '*const [9:0]u8'
    return "prefix" ++ string;
           ~~~~~~~~~^~~~~~~~~
scratch.zig:21:21: note: cast discards const qualifier
scratch.zig:20:33: note: function return type declared here
fn doIt(string: *const [3:0]u8) *[]u8 {
                                ^~~~~
scratch.zig:27:32: error: expected type '[]const u8', found '*[]u8'
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
                               ~~~~^~~~~~~
/Users/scubbo/zig/zig-macos-x86_64-0.14.0-dev.2362+a47aa9dd9/lib/std/mem.zig:658:33: note: parameter type declared here
pub fn eql(comptime T: type, a: []const T, b: []const T) bool {

The compiler suggests that I should make the return type of my function *const [9:0]u8. Which, with a little tweaking...still fails, in an even more surprising way:

fn doIt(string: *const [3:0]u8) *const [9:0]u8 {
    return "prefix" ++ string;
}

const std = @import("std");
const expect = std.testing.expect;
const print = std.debug.print;

test {
    for (doIt("foo")) |char| {print("{c}", .{char});}
    print("\n", .{});
    for ("prefixfoo") |char| {print("{c}", .{char});}
    print("\n", .{});
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
}

pefixfoo
prefixfoo
1/1 scratch.test_0...FAIL (TestUnexpectedResult)
/Users/scubbo/zig/zig-macos-x86_64-0.14.0-dev.2362+a47aa9dd9/lib/std/testing.zig:546:14: 0x10846a78f in expect (test)
    if (!ok) return error.TestUnexpectedResult;
             ^
/Users/scubbo/Code/advent-of-code-2024/scratch.zig:31:5: 0x10846a936 in test_0 (test)
    try expect(std.mem.eql(u8, doIt("foo"), "prefixfoo"));
    ^
0 passed; 0 skipped; 1 failed.
error: the following test command failed with exit code 1:
/Users/scubbo/.cache/zig/o/1bb299b096246ee4dc2c6057c3d21f46/test --seed=0xc38b771a

That is not a typo or copy-paste mistake. The character-by-character printing of the output of return "prefix" ++ string; is pefixfoo. I could maybe understand the final character getting dropped somehow if I'd sized the array wrongly (though, see the next section), or the first character getting dropped for...some reason...but what could make the second character get dropped?

Flexibility of function inputs

And that's leaving aside the fact that a function signature of (string: *const [3:0]u8) *const [9:0]u8 would not, presumably, be able to accept a string of length 4. Hardly a multipurpose function!

Solution

  • The ++ operator only works on arrays with comptime-known sizes. But you clearly want the function to be fully usable at runtime.

    This means that you need to be able to answer the question: where does your function get the memory for the new string? Idiomatically, the function would take an allocator, for example:

    fn doIt(allocator: std.mem.Allocator, string: []const u8) ![]u8 {
        const prefix = "prefix";
        const new_string = try allocator.alloc(u8, prefix.len + string.len);
        @memcpy(new_string[0..prefix.len], prefix);
        @memcpy(new_string[prefix.len..], string);
        return new_string;
    }
    

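    Note that the parameter is declared as []const u8 rather than []u8. A string literal such as "foo" has type *const [3:0]u8, which coerces to []const u8 but never to []u8; that is what the "cast discards const qualifier" note in your first error was pointing at.

    The caller owns the returned slice and must free it with the same allocator (allocator.free) once it is no longer needed.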
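To make the ownership rule concrete, here is a minimal sketch of how the function might be called and freed, with a comptime-only variant alongside it for contrast. The name doItComptime and the test names are hypothetical, chosen only for illustration; std.testing.allocator is used because it reports leaks when a test forgets to free.

```zig
const std = @import("std");

// Runtime version from above: the caller supplies the memory via an allocator.
fn doIt(allocator: std.mem.Allocator, string: []const u8) ![]u8 {
    const prefix = "prefix";
    const new_string = try allocator.alloc(u8, prefix.len + string.len);
    @memcpy(new_string[0..prefix.len], prefix);
    @memcpy(new_string[prefix.len..], string);
    return new_string;
}

// Hypothetical comptime-only variant: `++` is allowed here because the
// length of `string` is known at compile time. No allocator is needed,
// but the function cannot be called with a runtime value.
fn doItComptime(comptime string: []const u8) []const u8 {
    return "prefix" ++ string;
}

test "runtime version: caller frees the result" {
    const allocator = std.testing.allocator;
    const result = try doIt(allocator, "foo");
    // Without this line the testing allocator would report a leak.
    defer allocator.free(result);
    try std.testing.expectEqualStrings("prefixfoo", result);
}

test "comptime version: nothing to free" {
    try std.testing.expectEqualStrings("prefixfoo", doItComptime("foo"));
}
```

Run it with zig test. For Advent of Code style input, which is only known at runtime, the allocator version is the one you want; the comptime variant is only there to show when ++ is usable.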