Skip to content

std.fs is missing an "open any" API #16738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
squeek502 opened this issue Aug 8, 2023 · 11 comments
Open

std.fs is missing an "open any" API #16738

squeek502 opened this issue Aug 8, 2023 · 11 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@squeek502
Copy link
Collaborator

squeek502 commented Aug 8, 2023

Right now, the status quo is:

  • std.fs.Dir.openFile
    • On Windows, this will return error.IsDir if the path is a directory (this is due to .filter = .file_only during the std.os.windows.OpenFile call)
    • On non-Windows, this will succeed if the path is a directory (unless OpenFlags.isWrite() is true, in which case it will return error.IsDir)
    • (this discrepancy is tracked in Platform specific error.IsDir behavior in fs.Dir.openFile #5732)
  • std.fs.Dir.openDir
    • On all platforms, this will return error.NotDir if the path is not a directory

This means that there is currently no cross-platform std.fs.Dir API that allows opening any path (regardless of its type). Having an API that allows opening any path on all platforms is possible, though, and would be useful (see https://ziggit.dev/t/how-to-check-if-path-points-to-a-file-or-a-directory/).


There are a few ways that this could be addressed:

  1. Add filter: enum { file_only, any } to File.OpenFlags and let the caller of Dir.openFile choose which behavior they want
  2. Add a separate Dir.openAny function that can open any type, and make Dir.openFile always return error.IsDir for directory paths on all platforms
  3. Make openFile open any type on all platforms and therefore make that the 'cross-platform' behavior (i.e. don't even have a function that returns error.IsDir when opening a directory as read-only) [this is the option I prefer the least, since it means that you'd have to do a stat call afterwards if you wanted the error.IsDir behavior (which wouldn't be necessary on windows if OpenFile was just called with .file_only)]

Note that all of these options would effectively close #5732 (since in option 1 & 2, a stat call would be done on non-Windows in the file-only version to get the matching error.IsDir behavior)

@rohlem
Copy link
Contributor

rohlem commented Aug 8, 2023

To me the category "file" intuitively doesn't contain "directory", therefore option 2 of having a separate function openAny sounds the least surprising.

Looking into std.fs.File there is an enum Kind with more exotic entries like block_device, whiteout, door, event_port - openAny would presumably permit all of these.
Do you have any insight on whether most applications would want to agnostically handle these the same, or would a method openFileOrDir be more useful for the common use case?
(Although I'm not sure what operations besides renaming are even equally supported for files and directories either.)

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 8, 2023

Do you have any insight on whether most applications would want to agnostically handle these the same, or would a method openFileOrDir be more useful for the common use case

The primary use case for openAny over openFileOrDir would be if you wanted to do:

var something = try std.fs.cwd().openAny("something", .{});
defer something.close();

const stat = try something.stat();
switch (stat.kind) {
    .file => {},
    .named_pipe => {},
    // ...
}

(a real world example of this sort of thing here)

Plus, AFAIK openFileOrDir would require an open + stat call while openAny would be able to just be able to use an open call. In other words, openFileOrDir would just remove some possible use-cases of openAny for questionable benefit.

@jacobly0 jacobly0 added standard library This issue involves writing Zig code for the standard library. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Aug 12, 2023
@jacobly0 jacobly0 added this to the 0.13.0 milestone Aug 12, 2023
@andrewrk
Copy link
Member

I like (2) and I think that it could potentially even be called open rather than openAny. What would it return though?

  • openFile - returns a File
  • openDir - returns a Dir
  • open or openAny - ??

As a related side note, I have decided to revert #12060.

@andrewrk andrewrk added the accepted This proposal is planned. label Aug 20, 2023
@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 20, 2023

What would it return though?

Good question. My initial thought would be File, since most of the File APIs would be useful regardless of the kind of the fd, but it may not be a perfect match.

This issue is also related: #12377 where some File APIs are useful for/missing for Dir

It's possible we may want to go for some sort of design where:

  • There is a type like Handle or Fd or something like that, and this is where most of the useful-for-any-kind APIs would be moved into
  • Both Dir and File would store that new Handle type instead of a os.fd_t, and Dir would effectively guarantee kind == .directory and File would effectively guarantee kind == .file if they are stated

Usage would then look something like:

var dir = try std.fs.cwd().openDir("blah", .{});
defer dir.close();

// `stat` is a wrapper around `dir.handle.stat()`, or maybe you'd have to call `dir.handle.stat()` directly
const stat = try dir.stat();

and then openAny would just return a Handle.

var handle = try std.fs.cwd().openAny("something", .{});
defer handle.close();

const other_stat = try handle.stat();

// maybe some way to go from Handle to File/Dir once the type is known?

Note that Dir.stat is currently a wrapper around File.stat:

zig/lib/std/fs.zig

Lines 2637 to 2643 in 4c97919

pub fn stat(self: Dir) StatError!Stat {
const file: File = .{
.handle = self.fd,
.capable_io_mode = .blocking,
};
return file.stat();
}

EDIT: See the comment after this one; I no longer think the design detailed in this comment makes sense with the 'non-dir' wrinkle: #16738 (comment)

@squeek502
Copy link
Collaborator Author

squeek502 commented Aug 20, 2023

So it actually turns out that OpenFlags.Filter.file_only is misnamed. When specified, the flag FILE_NON_DIRECTORY_FILE is used in NtCreateFile:

FILE_NON_DIRECTORY_FILE
The file being opened must not be a directory file or this call fails. The file object being opened can represent a data file, a logical, virtual, or physical device, or a volume.

In more practical terms, this means that .file_only will successfully open named pipes (and as a further wrinkle for stat there's nothing in FileAllInformation that distinguishes it as a named pipe, would need to separately query FilePipeInformation or FileFsDeviceInformation to distinguish that I think)

As an example:

> test.exe \\.\pipe\mypipe
opening NT-prefixed path '\??\pipe\mypipe' with filter: .file_only
attributes: 0x80
kind=fs.file.File.Kind.file

0x80 is FILE_ATTRIBUTE_NORMAL: A file that does not have other attributes set. This attribute is valid only when used alone.

Relevant test code (with debug prints in the std library functions):

    var output_file = try std.fs.cwd().openFile(output_path, .{ .mode = .read_write });
    defer output_file.close();

    const stat = try output_file.stat();

So openFile is already a misleading name and should likely be changed (openNonDir would be the most accurate I guess). This might make option 2 less attractive?

@jibal
Copy link

jibal commented Oct 21, 2023

To me the category "file" intuitively doesn't contain "directory"

And to me, who worked on UNIX kernels and did other systems programming on Posix systems for many years, intuitively (A) "file" is any object in the file system, something represented by a path, that has an inode and you can perform a stat on it, that is a named component of a directory (which, BTW, is why open{File,Dir} are somewhat confusingly in Dir).

But I also have the more common intuition of (B) a file as a random-access collection of bytes stored on disk, as opposed to a directory, symlink, socket, block or character device, etc. ... the sort of thing that on Posix systems is sometimes referred to as a "plain file".

Which indicates that the term "file" is ambiguous. And it's really not possible to avoid the ambiguity when we have

std.fs.File.Kind.File, where the first File represents (A) and the second represents (B), and
std.fs.Dir.OpenError.FileNotFound, where File represents (A), or the first proposal with
std.fs.Dir.openFile(path, File.OpenFlags{.filter: file_only}), where the two instances of File refer to (A) and file_only refers to (B).

Note that std.fs.Dir.openFile already returns a std.fs.File, which has members like Kind, stat, chmod, etc. that apply to any kind of "file" (A), which answers Andrew's question of what openAny (or of course option 1) should return, and indicates that Dir.openFile already encompasses any kind of "file" (A).

Given the existence of numerous instances of "File" referring to any kind of fs object, I think option 1 is preferable to OpenAny. But I would change OpenFlags.filter: enum {file_only, any} to perhaps bool directory_allowed, as that is what the Windows implementation offers, per @squeek502's last comment. If users want finer control on the kind of disk object they are willing to accept, they can use stat.

But wait ... suppose you open a directory using Dir.openFile ... there needs to be a File.toDir() that returns a Dir so that you can perform directory operations on it. Is that even doable for all systems? It is on Windows since you can do directory operations on the handle returned by NtCreateFile and it is on Linux which has fdopendir. I'm not familiar with WASI ... perhaps it can save the path and reopen it as a directory.

BTW, Dir contains chmod and chdir functions that create a File and then calls File.chmod or File.chdir. I think there ought to be a Dir.toFile() function that returns a File so that other operations such as stat , accessed, etc. can be performed. If any of the File functions break when given a directory (or other non "file" (B)) then that's a problem with this proposal generally and appropriate tests should be added.

Another possibility is to merge Dir into File ... after all, File already has numerous functions (read, write, seek, etc.) that don't apply to every Kind, so why not also have directory functions there? In which case I would suggest static functions for opening/creating absolute paths, and {open,create,makeDir}Relative with a dir: File parameter for paths relative to an open directory. I don't think you should have to mess with cwd() when it's not necessary, and the {open,create}File functions living in Dir has a somewhat high surprise factor.

In any case, I think the openAny idea uncovers a bunch of issues related to the fs.{File,Dir} dichotomy that warrant a rethink.

@rohlem
Copy link
Contributor

rohlem commented Oct 21, 2023

Naming idea: std.fs.Entry for "file system entry" as in @jibal's (A): Anything identified / exposed via the file system.
(The most explicit name for type (B) aka collection of bytes that come to my mind is Blob. Though maybe DataFile is more intuitive / discoverable.)

@jibal
Copy link

jibal commented Oct 21, 2023

@rohlem
I think fs.Entry is reasonable, as long as the fs. is retained. But most type (B) are text files, so I don't think Blob or DataFile are appropriate, as those usually indicate things that are not human-consumable. fs.Entry, fs.Entry.Kind.File, fs.Entry.open or fs.Dir.openEntry, and std.fs.Dir.OpenError.EntryNotFound would go pretty far to eliminate the ambiguity, but it's a pretty ambitious change.

@squeek502
Copy link
Collaborator Author

@jibal see #16736 for my thoughts on Dir and relative/absolute path stuff.

@pjz
Copy link
Contributor

pjz commented Nov 8, 2023

However it's split up on the inside, maybe a fs.Entry.open(.{ .kind = .{ .file, .dir } }) would be a good top-level wrapper/affordance?

@nektro
Copy link
Contributor

nektro commented Nov 22, 2023

for this use case my go-to has been to stat() it and then check .kind and use openFile/openDir respectively

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

7 participants