I have a folder that contains several thousand images in 3 different formats: png, jpg and webp. For instance: boat.png, boat.webp, plane.jpg, plane.png, plane.webp.
I decided to learn Zig and write a program that goes through this directory and deletes all but the smallest file for each "basename" (e.g. boat or plane).
Keep in mind: yes, I tried the AIs. They don't know enough Zig.
I have most of the grouping done, but I am worried by the output:
Lets open the ../tests directory
File DSC_0031.webp (fs.File.Kind.file)
Filename DSC_0031.webp
Basename: DSC_0031
File DSC_0031.png (fs.File.Kind.file)
Filename DSC_0031.png
Basename: DSC_0031
File DSC_0025.png (fs.File.Kind.file)
Filename DSC_0025.png
Basename: DSC_0025
File DSC_0025.JPG (fs.File.Kind.file)
Filename DSC_0025.JPG
Basename: DSC_0025
File DSC_0031.JPG (fs.File.Kind.file)
Filename DSC_0031.JPG
Basename: DSC_0031
File DSC_0027.webp (fs.File.Kind.file)
Filename DSC_0027.webp
Basename: DSC_0027
File DSC_0050.webp (fs.File.Kind.file)
Filename DSC_0050.webp
Basename: DSC_0050
File DSC_0007.webp (fs.File.Kind.file)
Filename DSC_0007.webp
Basename: DSC_0007
File DSC_0027.JPG (fs.File.Kind.file)
Filename DSC_0027.JPG
Basename: DSC_0027
File DSC_0033.JPG (fs.File.Kind.file)
Filename DSC_0033.JPG
Basename: DSC_0033
File DSC_0033.png (fs.File.Kind.file)
Filename DSC_0033.png
Basename: DSC_0033
File DSC_0027.png (fs.File.Kind.file)
Filename DSC_0027.png
Basename: DSC_0027
File DSC_0046.webp (fs.File.Kind.file)
Filename DSC_0046.webp
Basename: DSC_0046
File DSC_0026.png (fs.File.Kind.file)
Filename DSC_0026.png
Basename: DSC_0026
File DSC_0032.png (fs.File.Kind.file)
Filename DSC_0032.png
Basename: DSC_0032
File DSC_0032.JPG (fs.File.Kind.file)
Filename DSC_0032.JPG
Basename: DSC_0032
File DSC_0026.JPG (fs.File.Kind.file)
Filename DSC_0026.JPG
Basename: DSC_0026
File DSC_0036.JPG (fs.File.Kind.file)
Filename DSC_0036.JPG
Basename: DSC_0036
File .DS_Store (fs.File.Kind.file)
Filename .DS_Store
Basename: .DS_Store
File DSC_0036.png (fs.File.Kind.file)
Filename DSC_0036.png
Basename: DSC_0036
File DSC_0037.png (fs.File.Kind.file)
Filename DSC_0037.png
Basename: DSC_0037
File DSC_0047.webp (fs.File.Kind.file)
Filename DSC_0047.webp
Basename: DSC_0047
File DSC_0037.JPG (fs.File.Kind.file)
Filename DSC_0037.JPG
Basename: DSC_0037
File DSC_0006.webp (fs.File.Kind.file)
Filename DSC_0006.webp
Basename: DSC_0006
File DSC_0051.webp (fs.File.Kind.file)
Filename DSC_0051.webp
Basename: DSC_0051
File DSC_0026.webp (fs.File.Kind.file)
Filename DSC_0026.webp
Basename: DSC_0026
File DSC_0035.JPG (fs.File.Kind.file)
Filename DSC_0035.JPG
Basename: DSC_0035
File DSC_0035.png (fs.File.Kind.file)
Filename DSC_0035.png
Basename: DSC_0035
File DSC_0034.png (fs.File.Kind.file)
Filename DSC_0034.png
Basename: DSC_0034
File DSC_0034.JPG (fs.File.Kind.file)
Filename DSC_0034.JPG
Basename: DSC_0034
File DSC_0056.webp (fs.File.Kind.file)
Filename DSC_0056.webp
Basename: DSC_0056
File DSC_0053.JPG (fs.File.Kind.file)
Filename DSC_0053.JPG
Basename: DSC_0053
File DSC_0047.JPG (fs.File.Kind.file)
Filename DSC_0047.JPG
Basename: DSC_0047
File DSC_0047.png (fs.File.Kind.file)
Filename DSC_0047.png
Basename: DSC_0047
File DSC_0053.png (fs.File.Kind.file)
Filename DSC_0053.png
Basename: DSC_0053
File DSC_0040.webp (fs.File.Kind.file)
Filename DSC_0040.webp
Basename: DSC_0040
File DSC_0052.png (fs.File.Kind.file)
Filename DSC_0052.png
Basename: DSC_0052
File DSC_0046.png (fs.File.Kind.file)
Filename DSC_0046.png
Basename: DSC_0046
File DSC_0046.JPG (fs.File.Kind.file)
Filename DSC_0046.JPG
Basename: DSC_0046
File DSC_0052.JPG (fs.File.Kind.file)
Filename DSC_0052.JPG
Basename: DSC_0052
File DSC_0044.JPG (fs.File.Kind.file)
Filename DSC_0044.JPG
Basename: DSC_0044
File DSC_0050.JPG (fs.File.Kind.file)
Filename DSC_0050.JPG
Basename: DSC_0050
File DSC_0050.png (fs.File.Kind.file)
Filename DSC_0050.png
Basename: DSC_0050
File DSC_0044.png (fs.File.Kind.file)
Filename DSC_0044.png
Basename: DSC_0044
File DSC_0037.webp (fs.File.Kind.file)
Filename DSC_0037.webp
Basename: DSC_0037
File DSC_0045.png (fs.File.Kind.file)
Filename DSC_0045.png
Basename: DSC_0045
File DSC_0051.png (fs.File.Kind.file)
Filename DSC_0051.png
Basename: DSC_0051
File DSC_0051.JPG (fs.File.Kind.file)
Filename DSC_0051.JPG
Basename: DSC_0051
File DSC_0045.JPG (fs.File.Kind.file)
Filename DSC_0045.JPG
Basename: DSC_0045
File DSC_0041.JPG (fs.File.Kind.file)
Filename DSC_0041.JPG
Basename: DSC_0041
File DSC_0055.JPG (fs.File.Kind.file)
Filename DSC_0055.JPG
Basename: DSC_0055
File DSC_0036.webp (fs.File.Kind.file)
Filename DSC_0036.webp
Basename: DSC_0036
File DSC_0055.png (fs.File.Kind.file)
Filename DSC_0055.png
Basename: DSC_0055
File DSC_0041.png (fs.File.Kind.file)
Filename DSC_0041.png
Basename: DSC_0041
File DSC_0040.png (fs.File.Kind.file)
Filename DSC_0040.png
Basename: DSC_0040
File DSC_0054.png (fs.File.Kind.file)
Filename DSC_0054.png
Basename: DSC_0054
File DSC_0054.JPG (fs.File.Kind.file)
Filename DSC_0054.JPG
Basename: DSC_0054
File DSC_0040.JPG (fs.File.Kind.file)
Filename DSC_0040.JPG
Basename: DSC_0040
File DSC_0056.JPG (fs.File.Kind.file)
Filename DSC_0056.JPG
Basename: DSC_0056
File DSC_0042.JPG (fs.File.Kind.file)
Filename DSC_0042.JPG
Basename: DSC_0042
File DSC_0042.png (fs.File.Kind.file)
Filename DSC_0042.png
Basename: DSC_0042
File DSC_0056.png (fs.File.Kind.file)
Filename DSC_0056.png
Basename: DSC_0056
File DSC_0057.png (fs.File.Kind.file)
Filename DSC_0057.png
Basename: DSC_0057
File DSC_0043.png (fs.File.Kind.file)
Filename DSC_0043.png
Basename: DSC_0043
File DSC_0041.webp (fs.File.Kind.file)
Filename DSC_0041.webp
Basename: DSC_0041
File DSC_0043.JPG (fs.File.Kind.file)
Filename DSC_0043.JPG
Basename: DSC_0043
File DSC_0057.JPG (fs.File.Kind.file)
Filename DSC_0057.JPG
Basename: DSC_0057
File DSC_0057.webp (fs.File.Kind.file)
Filename DSC_0057.webp
Basename: DSC_0057
File DSC_0039.webp (fs.File.Kind.file)
Filename DSC_0039.webp
Basename: DSC_0039
File DSC_0042.webp (fs.File.Kind.file)
Filename DSC_0042.webp
Basename: DSC_0042
File DSC_0054.webp (fs.File.Kind.file)
Filename DSC_0054.webp
Basename: DSC_0054
File DSC_0059.JPG (fs.File.Kind.file)
Filename DSC_0059.JPG
Basename: DSC_0059
File DSC_0035.webp (fs.File.Kind.file)
Filename DSC_0035.webp
Basename: DSC_0035
File DSC_0059.png (fs.File.Kind.file)
Filename DSC_0059.png
Basename: DSC_0059
File DSC_0058.png (fs.File.Kind.file)
Filename DSC_0058.png
Basename: DSC_0058
File DSC_0058.JPG (fs.File.Kind.file)
Filename DSC_0058.JPG
Basename: DSC_0058
File DSC_0058.webp (fs.File.Kind.file)
Filename DSC_0058.webp
Basename: DSC_0058
File DSC_0059.webp (fs.File.Kind.file)
Filename DSC_0059.webp
Basename: DSC_0059
File DSC_0048.JPG (fs.File.Kind.file)
Filename DSC_0048.JPG
Basename: DSC_0048
File DSC_0048.png (fs.File.Kind.file)
Filename DSC_0048.png
Basename: DSC_0048
File DSC_0034.webp (fs.File.Kind.file)
Filename DSC_0034.webp
Basename: DSC_0034
File DSC_0049.png (fs.File.Kind.file)
Filename DSC_0049.png
Basename: DSC_0049
File DSC_0049.JPG (fs.File.Kind.file)
Filename DSC_0049.JPG
Basename: DSC_0049
File DSC_0055.webp (fs.File.Kind.file)
Filename DSC_0055.webp
Basename: DSC_0055
File DSC_0014.webp (fs.File.Kind.file)
Filename DSC_0014.webp
Basename: DSC_0014
File DSC_0043.webp (fs.File.Kind.file)
Filename DSC_0043.webp
Basename: DSC_0043
File DSC_0038.webp (fs.File.Kind.file)
Filename DSC_0038.webp
Basename: DSC_0038
File DSC_0025.webp (fs.File.Kind.file)
Filename DSC_0025.webp
Basename: DSC_0025
File DSC_0005.JPG (fs.File.Kind.file)
Filename DSC_0005.JPG
Basename: DSC_0005
File DSC_0039.JPG (fs.File.Kind.file)
Filename DSC_0039.JPG
Basename: DSC_0039
File DSC_0005.png (fs.File.Kind.file)
Filename DSC_0005.png
Basename: DSC_0005
File DSC_0039.png (fs.File.Kind.file)
Filename DSC_0039.png
Basename: DSC_0039
File DSC_0033.webp (fs.File.Kind.file)
Filename DSC_0033.webp
Basename: DSC_0033
File DSC_0048.webp (fs.File.Kind.file)
Filename DSC_0048.webp
Basename: DSC_0048
File DSC_0038.png (fs.File.Kind.file)
Filename DSC_0038.png
Basename: DSC_0038
File DSC_0038.JPG (fs.File.Kind.file)
Filename DSC_0038.JPG
Basename: DSC_0038
File DSC_0006.JPG (fs.File.Kind.file)
Filename DSC_0006.JPG
Basename: DSC_0006
File DSC_0006.png (fs.File.Kind.file)
Filename DSC_0006.png
Basename: DSC_0006
File DSC_0044.webp (fs.File.Kind.file)
Filename DSC_0044.webp
Basename: DSC_0044
File DSC_0007.png (fs.File.Kind.file)
Filename DSC_0007.png
Basename: DSC_0007
File DSC_0007.JPG (fs.File.Kind.file)
Filename DSC_0007.JPG
Basename: DSC_0007
File DSC_0005.webp (fs.File.Kind.file)
Filename DSC_0005.webp
Basename: DSC_0005
File DSC_0052.webp (fs.File.Kind.file)
Filename DSC_0052.webp
Basename: DSC_0052
File DSC_0053.webp (fs.File.Kind.file)
Filename DSC_0053.webp
Basename: DSC_0053
File DSC_0045.webp (fs.File.Kind.file)
Filename DSC_0045.webp
Basename: DSC_0045
File DSC_0014.JPG (fs.File.Kind.file)
Filename DSC_0014.JPG
Basename: DSC_0014
File DSC_0014.png (fs.File.Kind.file)
Filename DSC_0014.png
Basename: DSC_0014
File DSC_0049.webp (fs.File.Kind.file)
Filename DSC_0049.webp
Basename: DSC_0049
File DSC_0032.webp (fs.File.Kind.file)
Filename DSC_0032.webp
Basename: DSC_0032
(
DSC_0031.png
DSC_0031.JPG
(
DSC_0025.JPG
(
DSC_0027.JPG
DSC_0027.png
(
(
(
DSC_0033.png
(
(
DSC_0026.JPG
(
DSC_0032.JPG
(
DSC_0036.png
(
DSC_0037.JPG
(
DSC_0044
DSC_0044.webp
DSC_0045.JPG
DSC_0059.png
DSC_0059.webp
DSC_0044.webp
DSC_0007
DSC_0007.png
DSC_0051.png
DSC_0051.JPG
DSC_0041.JPG
DSC_0041.png
DSC_0041.webp
DSC_0058.png
DSC_0058.JPG
DSC_0058.webp
DSC_0007.png
DSC_0007.JPG
DSC_0007
DSC_0007.JPG
DSC_0055.JPG
DSC_0055.png
DSC_0005
DSC_0005.web
DSC_0035.png
DSC_0036.webp
DSC_0005.webp
DSC_0053
DSC_0053.web
DSC_0034.JPG
DSC_0048.JPG
DSC_0048.png
DSC_0048.webp
DSC_0053.webp
DSC_0014
DSC_0014.JPG
DSC_0054.png
DSC_0054.JPG
DSC_0054.webp
DSC_0034.webp
DSC_0014.JPG
DSC_0014.png
DSC_0014
DSC_0014.png
DSC_0053.png
DSC_0049.png
DSC_0049.JPG
DSC_0049
DSC_0049.web
DSC_0047.png
DSC_0040.png
DSC_0040.JPG
DSC_0049.webp
DSC_0043
DSC_0043.webp
DSC_0042.JPG
DSC_0042.png
DSC_0042.webp
DSC_0043.webp
DSC_0038
DSC_0038.web
DSC_0052.JPG
DSC_0056.JPG
DSC_0056.png
DSC_0038.webp
DSC_0038.png
DSC_0038.JPG
DSC_0025
DSC_0025.web
DSC_0046.JPG
DSC_0057.png
DSC_0057.JPG
DSC_0057.webp
DSC_0025.webp
DSC_0005
DSC_0005.png
DSC_0044.png
DSC_0043.png
DSC_0043.JPG
DSC_0005.JPG
DSC_0005.png
DSC_0039
DSC_0039.png
DSC_0050.png
DSC_0039.JPG
DSC_0039.png
DSC_0038
DSC_0038.png
DSC_0038
DSC_0038.JPG
DSC_0048
DSC_0048.webp
DSC_0006
DSC_0006.JPG
DSC_0006.JPG
DSC_0006.png
DSC_0006
DSC_0006.png
DSC_0032
DSC_0032.webp
DSC_0032.webp
DSC_0014
DSC_0014.webp
DSC_0033
DSC_0033.webp
DSC_0052
DSC_0052.webp
DSC_0045
DSC_0045.webp
Here is the code:
const std = @import("std");
const fs = std.fs;
const path = std.fs.path;
const mem = std.mem;
const fileStruct = struct { baseName: []const u8, files: [][]const u8 };
pub fn main() !void {
    const args = try std.process.argsAlloc(std.heap.page_allocator);
    defer std.process.argsFree(std.heap.page_allocator, args);
    if (args.len < 2) {
        std.debug.print("Usage: {s} <directory>\n", .{args[0]});
        return;
    }
    const dir_path: []u8 = args[1];
    std.debug.print("Lets open the {s} directory\n", .{dir_path});
    // const allocator = std.heap.page_allocator;
    var target_dir = try fs.cwd().openDir(dir_path, .{});
    defer target_dir.close();
    var iter = target_dir.iterate();
    var groupedFiles = std.ArrayList(fileStruct).init(std.heap.page_allocator);
    while (try iter.next()) |entry| {
        // Ensure we only process files
        if (entry.kind != fs.Dir.Entry.Kind.file) continue;
        std.debug.print("File {s} ({})\n", .{ entry.name, entry.kind });
        var baseName = path.basename(entry.name);
        std.debug.print("Filename {s}\n", .{baseName});
        const extension = path.extension(baseName);
        if (extension.len != 0) {
            baseName = baseName[0 .. baseName.len - extension.len];
        }
        std.debug.print("Basename: {s}\n", .{baseName});
        var foundGroup: bool = false;
        for (groupedFiles.items) |*group| {
            if (mem.eql(u8, group.baseName, baseName)) {
                const new_files = try std.heap.page_allocator.alloc([]const u8, group.files.len + 1);
                for (group.files, 0..) |file, i| {
                    new_files[i] = file;
                }
                new_files[group.files.len] = try std.heap.page_allocator.dupe(u8, entry.name);
                group.files = new_files;
                foundGroup = true;
                break;
            }
        }
        if (!foundGroup) {
            const new_files = try std.heap.page_allocator.alloc([]const u8, 1);
            new_files[0] = entry.name;
            const newGroup = fileStruct{ .baseName = baseName, .files = new_files }; // Use & to coerce to slice
            try groupedFiles.append(newGroup);
        }
    }
    for (groupedFiles.items) |item| {
        std.debug.print("{s}\n", .{item.baseName});
        for (item.files) |file| {
            std.debug.print(" {s}\n", .{file});
        }
    }
}
Can someone please explain the output? If you can point me in the right direction for deletion of big files, that'd be helpful too, but first, what is going on?
The variable entry is only valid for a single iteration, because the iterator reuses the same memory on every call to next. But you keep pointers to entry.name (and baseName is itself a slice of entry.name) in the if (!foundGroup) section:
new_files[0] = entry.name;
const newGroup = fileStruct{ .baseName = baseName, .files = new_files };
You can fix this by copying the strings into dedicated memory with allocator.dupe:
new_files[0] = try std.heap.page_allocator.dupe(u8, entry.name);
const newGroup = fileStruct{
    .baseName = try std.heap.page_allocator.dupe(u8, baseName),
    .files = new_files,
};
This fixes the output of your program.
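The effect can be reproduced without touching the file system. The following is a minimal sketch, not the real iterator: ReusingSource is a hypothetical stand-in that, like the directory iterator described above, hands out slices into one buffer it overwrites on every call. It assumes a Zig version with the two-argument @memcpy builtin (0.11 or later).

```zig
const std = @import("std");

// Hypothetical stand-in for the directory iterator: every call to `next`
// overwrites the same buffer and returns a slice into it.
const ReusingSource = struct {
    buf: [32]u8 = undefined,

    fn next(self: *ReusingSource, name: []const u8) []const u8 {
        @memcpy(self.buf[0..name.len], name);
        return self.buf[0..name.len];
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    var src = ReusingSource{};

    const borrowed = src.next("DSC_0044.png"); // points into src.buf
    const owned = try allocator.dupe(u8, borrowed); // private copy
    defer allocator.free(owned);

    _ = src.next("DSC_0007.JPG"); // overwrites the shared buffer

    std.debug.print("borrowed: {s}\n", .{borrowed}); // prints DSC_0007.JPG
    std.debug.print("owned:    {s}\n", .{owned}); // prints DSC_0044.png
}
```

The borrowed slice silently changes its contents, which is the same kind of mix-up visible in the grouped output of the question, while the duplicated string is unaffected.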
Also, as pointed out in the comments, you don't free memory at the end. You can use the GeneralPurposeAllocator, which helps with memory leak detection:
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer std.debug.assert(gpa.deinit() == .ok);
const allocator = gpa.allocator();
To free the memory of the grouped files, you can do something like this:
var groupedFiles = std.ArrayList(fileStruct).init(allocator);
defer {
    for (groupedFiles.items) |group| {
        for (group.files) |file| {
            allocator.free(file);
        }
        allocator.free(group.files);
        allocator.free(group.baseName);
    }
    groupedFiles.deinit();
}
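For a short-lived command-line tool, an arena is a possible alternative to freeing each string by hand. This is a different technique from the one above, shown only as a sketch: std.heap.ArenaAllocator releases everything allocated through it in a single deinit call. The helper dupeAll is a hypothetical name used for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: copies every name into memory owned by `allocator`.
fn dupeAll(allocator: std.mem.Allocator, names: []const []const u8) ![][]const u8 {
    const out = try allocator.alloc([]const u8, names.len);
    for (names, 0..) |name, i| {
        out[i] = try allocator.dupe(u8, name);
    }
    return out;
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // One call releases every allocation made through the arena,
    // so no per-string free is needed.
    defer arena.deinit();
    const allocator = arena.allocator();

    const copies = try dupeAll(allocator, &.{ "DSC_0031.png", "DSC_0031.JPG" });
    for (copies) |name| {
        std.debug.print("{s}\n", .{name});
    }
}
```

The trade-off is that nothing is released until the arena goes away, which is usually fine for a program that scans one directory and exits.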
You're also leaking memory here:
group.files = new_files;
Either call allocator.free(group.files) before that line, or use realloc:
const new_files = try allocator.realloc(group.files, group.files.len + 1);
new_files[group.files.len] = try allocator.dupe(u8, entry.name);
group.files = new_files;
foundGroup = true;
break;
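As for deleting the bigger files, one possible direction is sketched below. It is not taken from the original program: keepSmallest is a hypothetical helper, and the sketch assumes that std.fs.Dir.statFile and std.fs.Dir.deleteFile behave as in the Zig version used in the question, and that ArrayList still has the init(allocator) form shown above. It runs on a finished list of names, so nothing is deleted while the directory is still being iterated.

```zig
const std = @import("std");
const fs = std.fs;

/// Hypothetical helper: deletes every file in `names` except the smallest.
/// `names` are paths relative to `dir`. On equal sizes the first entry wins.
fn keepSmallest(dir: fs.Dir, names: []const []const u8) !void {
    if (names.len < 2) return; // nothing to compare against

    // Pass 1: find the index of the smallest file.
    var smallest: usize = 0;
    var smallest_size: u64 = (try dir.statFile(names[0])).size;
    for (names[1..], 1..) |name, i| {
        const size = (try dir.statFile(name)).size;
        if (size < smallest_size) {
            smallest = i;
            smallest_size = size;
        }
    }

    // Pass 2: delete everything else.
    for (names, 0..) |name, i| {
        if (i == smallest) continue;
        try dir.deleteFile(name);
    }
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);
    if (args.len < 3) {
        std.debug.print("Usage: {s} <directory> <file>...\n", .{args[0]});
        return;
    }

    var dir = try fs.cwd().openDir(args[1], .{});
    defer dir.close();

    // Collect the file arguments as plain slices for keepSmallest.
    var names = std.ArrayList([]const u8).init(allocator);
    defer names.deinit();
    for (args[2..]) |arg| try names.append(arg);

    try keepSmallest(dir, names.items);
}
```

In the program from the question, the same helper could be called as keepSmallest(target_dir, item.files) inside the final loop over groupedFiles.items, once the stored names are proper copies. Since this deletes files, it is worth trying on a copy of the folder first.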