puffs has been integrated into NetBSD as of 20061022. Information on this page is no longer kept up to date (most of it was already out of date at the time of writing). Refer to the NetBSD puffs page for current information.
I've updated the code to run against a -current source tree. See the README and TODO files for more information. If you are interested in the recent progress, the CHANGES file should track that quite closely now. Currently, among other things, it is now possible to do the following:
jojonaru# mount
/dev/wd0a on / type ffs (local)
puffs:detrempe on /puffs type puffs
jojonaru# cd /usr/share
jojonaru# pax -rw doc /puffs
jojonaru# diff -r doc /puffs/doc
jojonaru# pax -rw zoneinfo /puffs
jojonaru# df
Filesystem      1K-blocks     Used    Avail Capacity  Mounted on
/dev/wd0a          254079   173851    67525    72%    /
puffs:detrempe        248      248        0   100%    /puffs
jojonaru# cd /puffs
jojonaru# ls
vi              zoneinfo
jojonaru# rm -rf *
jojonaru# df .
Filesystem      1K-blocks     Used    Avail Capacity  Mounted on
puffs:detrempe          0        0        0   100%    /puffs
jojonaru# ls
jojonaru# umount /puffs
jojonaru# mount
/dev/wd0a on / type ffs (local)

.. and

jojonaru# touch pink
jojonaru# ls -l
total 0
-rw-rw-rw-  1 root  wheel  0 Oct  9 22:10 pink
jojonaru# ln pink floyd
jojonaru# echo 'the lunatic is on the grass' > pink
jojonaru# rm pink
jojonaru# cat floyd
the lunatic is on the grass
jojonaru# ls -l floyd
-rw-rw-rw-  1 root  wheel  28 Oct  9 22:10 floyd

.. and

jojonaru# echo 'the bug' > straits
jojonaru# ln -s straits dire
jojonaru# ls -l
total 0
lrwxr-xr-x  1 root  wheel  7 Oct  9 23:47 dire -> straits
-rw-rw-rw-  1 root  wheel  8 Oct  9 23:47 straits
jojonaru# cat straits
the bug
jojonaru# rm straits
jojonaru# cat dire
cat: dire: No such file or directory

.. and (directory containing hard links and symlinks)

jojonaru# cd /usr/
jojonaru# pax -rw bin /puffs
jojonaru# diff -r bin /puffs/bin
jojonaru# df -i /puffs
Filesystem      1K-blocks     Used    Avail Capacity  iused  ifree  %iused  Mounted on
puffs:detrempe      19328    19328        0   100%      735      0    100%  /puffs

This means that detrempefs (a simple in-memory file system written on top of libpuffs) can survive copying a directory tree over and preserves the original structure and contents. puffs is now getting fairly close to integration into NetBSD.
The aim is to create a general-purpose framework for attaching filesystems running in userspace. The framework can then be used for various applications, such as writing new filesystems in userspace to test them, or some "novelty" uses such as having a filesystem for user account administration.
On a more technical level, the work consists of writing a passthrough layer which attaches to the current virtual filesystem layer in the kernel, and creating a communication infrastructure so that the filesystem can receive commands from the kernel and respond to them once it has completed the task. In addition, at least some pretended effort must be put into thinking about the interface to which the userspace implementation will attach.
The flow of control will be somewhat like the following: application (e.g. cat file) -> kernel (syscall, vfs ..) -> kernel puffs -> userspace puffs -> fs implementation (userspace) -> userspace puffs -> kernel puffs -> application
I was reading a cookbook when beginning the project and was at a chapter on puff pastry. Since the acronym almost fits the purpose and it can be imagined for the framework to increase the volume of the operating system, it was (unwisely) chosen.
And a détrempe is of course a flour and water paste, which is the first stage in making puff pastry.
Following the above call-flow, it can be said that the project is divided into four separate parts.
For incoming calls, the puffs framework needs to be integrated into the existing virtual filesystem framework. This is something that obviously needs to work for anything to work. Luckily, this part is just legwork, some parts of which are documented less than others, but it is suitable work to do before having one's morning coffee (no, not really ... ;).
Another fairly obvious goal is to have a pipe between userspace and the kernel. This is harder to accomplish than I initially thought. For some reason, every approach I think of (that could, with a straight face, be called a general solution for a generic filesystem and reasonably efficient) seems to be a dead end.
For the SoC project, any working method is acceptable (well, maybe not the "operator types requests in manually" method ..); it can be improved later. The transport is not visible to the user, so filesystems can be developed freely. Of course, they will need to be relinked against the puffs library once I finally figure out the best way to do the communication.
This requirement consists of specifying what the various vfs and vnode operations should look like when transmitted to userspace. It is manual labour for each operation, but once a few are figured out, the rest should go at a fairly quick pace. The interface needs to be fairly well in place, although perhaps not perfectly thought out, by the end of the SoC project.
This specifies what the C-level linkage to the framework should look like. I'll probably have some kind of library available for filesystem development here, but it may not be the ultimate killer interface by the end of the SoC project.
The mandatory requirement for the SoC project will be a prototype implementation of puffs. I will continue perfecting it afterwards until it works according to the NetBSD definition.
The deliverables shall meet the levels of quality specified in the Requirements section. They are as follows:
A test filesystem will naturally be implemented, but it may be suitable only for testing the framework, not useful as anything standalone, and therefore it cannot be considered a deliverable.
% cd /puffs
% df .
Filesystem      512-blocks  Used  Avail Capacity  Mounted on
puffs:hardcode         573   228    345    39%    /puffs
% ls -l
total 0
-rw-r--r--  1 pooka  users   0 Jan 28  1970 dumdidum
-rw-r--r--  1 pooka  users   0 Jan 28  1970 fegato
-rw-r--r--  1 pooka  users   0 Jan 28  1970 hotchocolate
-rw-r--r--  1 pooka  users  17 Jan 28  1970 jopo
-rw-r--r--  1 pooka  users  34 Jan 28  1970 saucerobert
-rw-r--r--  1 pooka  users   0 Jan 28  1970 techno
-rw-r--r--  1 pooka  users   0 Jan 28  1970 unparalleledroast
% echo 'feeling livery, are you?' > fegato
% stat -x fegato
  File: "fegato"
  Size: 25           FileType: Regular File
  Mode: (0644/-rw-r--r--)   Uid: ( 1323/ pooka)  Gid: ( 100/ users)
Device: 0,0   Inode: 10   Links: 1
Access: Wed Jan 28 02:39:04 1970
Modify: Wed Jan 28 02:39:04 1970
Change: Wed Jan 28 02:39:04 1970
% cat fegato
feeling livery, are you?
Implementing a userspace filesystem with puffs consists of filling out a few operation vectors (well, structs actually) and calling the mount function. Sounds simple, yes? Yes.. yes ... um... no.
What needs to be done to have a working filesystem implementation behind puffs is to use the routines provided by libpuffs, link your implementation against libpuffs and run the resulting application.
In-kernel filesystems attach to filesystem abstractions at two basic interfaces: the filesystem itself (vfs - virtual filesystem) and the filesystem nodes (vnode - virtual node). Each filesystem usually implements some subset of the operations in the two interfaces to achieve its goals. Some operations are mandatory to implement (such as mount); others can be implemented or left unimplemented depending on the filesystem in question.
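The idea of filling an operation vector with only the operations a filesystem supports can be pictured as a struct of function pointers. The sketch below is purely illustrative: the names fake_vnops, fv_lookup, stub_lookup and init_ops are invented here, not the real libpuffs declarations. Unsupported operations are simply left as dummies or NULL.

```c
#include <stddef.h>

/* Hypothetical shapes for illustration; the real operation
 * vectors and their prototypes are declared by libpuffs. */
struct fake_vnops {
	int (*fv_lookup)(const char *name, void **cookiep);
	int (*fv_getattr)(void *cookie, void *vap);
	int (*fv_read)(void *cookie, char *buf, size_t len);
};

/* A dummy op: report "not found" for everything. */
static int
stub_lookup(const char *name, void **cookiep)
{
	(void)name; (void)cookiep;
	return -1;
}

/* Fill the vector with dummies first, then override only the
 * operations the filesystem actually supports. */
static void
init_ops(struct fake_vnops *ops)
{
	ops->fv_lookup = stub_lookup;	/* overridden if really supported */
	ops->fv_getattr = NULL;		/* left unimplemented */
	ops->fv_read = NULL;
}
```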
The level of abstraction that the userspace implementation needs to bite into had to be decided. The existing readily available abstraction was the natural choice. Of course the operations cannot be passed directly to userspace as such (think about e.g. kernel memory space references), but on a basic level the operations are the same in the kernel and userspace.
Vnodes live and dwell in the kernel. They are pooled together when unused and are recycled between all filesystem types in the system. Basically their usage is highly optimized, since they come and go a lot in the daily operation of a normal healthy kernel.
However, in userspace we are not as concerned with performance. We cannot even pool nodes together between different puffs mounts: they are separate processes and have no way of knowing about each other's nodes. Even if we could, we probably would not want to, since pooling vnodes together adds considerable complexity.
But what we must be able to do is to map vnodes to userspace nodes and back. This is because the vnode operations are done on specific vnodes, and the userspace implementation must know which node we are operating on currently. The information between the kernel and userspace is passed back and forth as cookies. Most operations pass the information from the kernel to userspace, but some operations where node creation is involved pass this information from userspace back to the kernel (and the kernel then uses this information to pass the cookie values for operations on those created nodes).
The userspace implementation must be able to map cookie values to userspace nodes. The easiest scheme is to pass structure virtual memory addresses back. The kernel is okay with anything that is unique for every node (at a given period in time).
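This address-as-cookie scheme can be sketched in a few lines (struct mynode and the function names are hypothetical; the real node type is whatever the filesystem implementation defines): the node's own address is unique for the node's lifetime, so it doubles as the cookie value, and mapping back is just a cast.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace node; the real layout is up to the
 * filesystem implementation. */
struct mynode {
	char name[32];
	size_t size;
};

/* Hand the kernel the node's address as an opaque cookie ... */
static void *
node_to_cookie(struct mynode *n)
{
	return n;
}

/* ... and cast it back when an operation arrives for that node. */
static struct mynode *
cookie_to_node(void *cookie)
{
	return cookie;
}
```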
The information relevant to implementing a puffs filesystem in this file is the argument structures for each vfs or vnode call (puffs_vfsreq_xxx and puffs_vnreq_yyy). In addition to the vnode cookies discussed above, information common to all calls includes the caller's credentials: the struct uucred (a userspace representation of struct ucred) and the calling process id. The rest of the fields usually relate closely to the operation in question and can be gleaned from the vfsops.9 and vnodeops.9 manual pages. (Yes, I will add better descriptions at a later date.)
The other interesting pieces of information located here are struct puffs_sizeop and struct puffs_cn. The inner truths related to the former are discussed further down. puffs_cn is the translated userspace representation of the kernel's struct componentname. It cannot be transferred with a 1:1 copy, since it contains pointers into kernel memory space.
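Why a 1:1 copy fails can be shown with a toy version (the struct and field names here are invented; the real layouts are the kernel's struct componentname and puffs_cn): the kernel structure holds a pointer into kernel memory, which is meaningless in userspace, so the translated copy must carry the pointed-to bytes inline.

```c
#include <string.h>

#define NAMEBUF_SKETCH 64	/* assumed fixed name buffer size */

/* Toy kernel-side componentname: the name is only a pointer
 * into kernel memory plus a length. */
struct fake_kern_cn {
	const char *cn_nameptr;
	size_t cn_namelen;
};

/* Toy userspace copy: the name bytes are flattened into the
 * structure itself so the pointer never crosses the boundary. */
struct fake_user_cn {
	char   pcn_name[NAMEBUF_SKETCH];
	size_t pcn_namelen;
};

/* Translate by copying the pointed-to bytes, not the pointer.
 * (Assumes cn_namelen < NAMEBUF_SKETCH for this sketch.) */
static void
flatten_cn(const struct fake_kern_cn *kcn, struct fake_user_cn *ucn)
{
	memcpy(ucn->pcn_name, kcn->cn_nameptr, kcn->cn_namelen);
	ucn->pcn_name[kcn->cn_namelen] = '\0';
	ucn->pcn_namelen = kcn->cn_namelen;
}
```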
For almost all operations it is enough to define a single simple callback that does everything required for the operation. But as fate usually mandates, there are a few exceptions to this simple rule: operations which require an arbitrarily-sized buffer to complete (read/write/ioctl/fcntl). What happens is that the first operation, e.g. read1(), is called, and the userspace program must reserve enough memory for the operation (usually arg->resid) and pass the buffer location and size to the kernel in sizeop->userbuf and sizeop->bufsize, respectively. In the second call, e.g. read2(), the buffer is freed. Of course, a read actually reads into the buffer in read1, while a write would read *from* the buffer in write2 and dedicate write1 simply to reserving the buffer. Confusing? I'll draw a picture some day ;)
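The read side of the two-phase protocol can be sketched as follows. This is a simplified stand-in, assuming invented names (fakesizeop, my_read1, my_read2) in place of the real puffs_sizeop machinery: phase one reserves the buffer and fills it with file data for the kernel to copy out, phase two releases it.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the real argument structure. */
struct fakesizeop {
	void   *userbuf;	/* buffer the kernel will copy from */
	size_t  bufsize;
};

/* Phase 1 of a read: reserve a buffer of resid bytes and fill
 * it with the file data, then tell the kernel where it is. */
static int
my_read1(const char *filedata, size_t resid, struct fakesizeop *sop)
{
	sop->userbuf = malloc(resid);
	if (sop->userbuf == NULL)
		return -1;
	memcpy(sop->userbuf, filedata, resid);
	sop->bufsize = resid;
	return 0;
}

/* Phase 2: the kernel has copied the data out; release the buffer.
 * For a write, the roles flip: phase 1 only reserves the buffer
 * and phase 2 consumes what the kernel wrote into it. */
static void
my_read2(struct fakesizeop *sop)
{
	free(sop->userbuf);
	sop->userbuf = NULL;
	sop->bufsize = 0;
}
```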
XXX-note-to-self: investigate calling uvm_mmap() for the handling process vm_map to reserve memory without a bounce-call. does this introduce unwanted side-effects (such as having to register the process doing the handling). still does not solve the problem with ioctl and fcntl.
A really simple (or perhaps even simpler than that) filesystem was created just to test out the framework and act as an example for people wanting to use the framework (ok, it might have been smarter to set a good example by writing good code, but the world isn't always aligned with one's wishes).
The filesystem consists of a flat layer of files, where a single file is defined by struct hcfsfile in hcfs.h. As can be seen, a file only contains its name, attributes and a fixed-length memory buffer for storing data. While the data storage area currently lives in anonymous memory, nothing would prevent, e.g., doing a simple mmap() on a file in another filesystem and thereby gaining non-volatile storage with just a few additional lines of code.
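Those few additional lines might look roughly like this (a sketch, not hardcodefs code; the function name, the fixed size and the error handling are assumptions): map a file from an on-disk filesystem and use the mapping as the data area, so the contents survive the in-memory filesystem going away.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define HCFS_MAXFILESIZE 4096	/* assumed fixed per-file buffer size */

/*
 * Map a file from another (on-disk) filesystem as the data area.
 * Writes through the returned pointer land in the backing file,
 * giving non-volatile storage; returns NULL on failure.
 */
static void *
map_backing_store(const char *path)
{
	int fd;
	void *data;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return NULL;
	if (ftruncate(fd, HCFS_MAXFILESIZE) == -1) {
		close(fd);
		return NULL;
	}
	data = mmap(NULL, HCFS_MAXFILESIZE, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	return data == MAP_FAILED ? NULL : data;
}
```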
File creation is not possible at runtime; all file nodes must be listed in the code and are created at mount time. There is no real reason besides laziness for this.
This is the main entry point for the filesystem. As can be seen, the operation vectors are filled with dummy ops, over which the few real operations that hardcodefs supports are then written. After this, mount is called, and internally it creates a new execution context. The calling context could hang around doing whatever it pleases (and in the future I will probably change the interface so that actual execution does not have to be handed over to the mount function; currently the interface is poor because all requests must be handled synchronously, but more on this topic will follow at a later date), but it can also just exit, so that's what it does.
The creation of files is handled here. It is accomplished in creatfiles() by going over the pre-determined list, allocating space for the nodes, allocating a predefined attribute structure for them, telling them their names, and registering the nodes with the filesystem. As a convenience, a routine for fetching the "nth" file in the filesystem is provided (purely an artifact of the "design" of the filesystem).
The vfs operations implemented for this filesystem are contained here. No black magic present.
Finally, the vnode op implementations are here. Only the following are currently supported: lookup, readdir, getattr, read and write. Lookup simply goes through the flat list of nodes, finds the one matching the given name and returns the address of the respective structure as a cookie. Readdir simply returns the nth directory entry. Getattr is the easiest of all: a memcpy() from the file structure to the argument structure. The read and write calls work as described above in the paragraph on puffs_sizeop.
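The lookup described above amounts to a linear scan over the flat list. A minimal sketch, with an invented node type and function name (the real ones are struct hcfsfile and whatever hardcodefs calls its lookup op):

```c
#include <stddef.h>
#include <string.h>

/* A node in hardcodefs's flat file list (simplified sketch). */
struct hcnode {
	const char *name;
};

/*
 * Walk the flat node list; on a name match, return the node's
 * address, which doubles as the cookie handed back to the kernel.
 * Returns NULL if no node by that name exists.
 */
static struct hcnode *
hcfs_lookup(struct hcnode *nodes, size_t nnodes, const char *name)
{
	size_t i;

	for (i = 0; i < nnodes; i++)
		if (strcmp(nodes[i].name, name) == 0)
			return &nodes[i];
	return NULL;
}
```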
Useful documentation and "documentation" concerning this project: