Efficient memory file-system for NetBSD
=======================================

PROJECT DESCRIPTION
-------------------

At the moment, NetBSD includes a memory-based file-system called mfs. mfs is just an implementation of the regular ffs, which was designed for persistent storage, on top of the (volatile) virtual memory system. This means that it uses the same data structures as the on-disk implementation, resulting in less than optimal performance and memory usage. As regards the latter, and in the words of a NetBSD developer, the physical memory and swap space needed to back these pages constantly grows.

NetBSD is in need of an efficient memory file-system that uses its own data structures to manage the stored files. The main design goal is to make it use just the amount of memory it needs to work correctly and efficiently; no more, no less.

Having said this, here is the main goal of this summer-of-code project: to implement this memory-efficient file-system under the NetBSD OS using a 3-clause BSD license (without the advertising clause). I'll call it tmpfs throughout this proposal (though it might be renamed in the future).

There are other secondary goals associated with the project, all of them related to documentation, which is a must (especially in software projects that will be public). These are discussed below.

A document describing the internal details of tmpfs will be written. This is to allow third parties (and especially NetBSD developers) to review how it works and quickly get into its internals in case they want to make improvements.

Furthermore, I plan to write a "file-system HOWTO" document. Its purpose is to detail all the steps required to write a file-system driver from scratch under NetBSD. The spirit of this should be similar to the existing "Device Driver Writing Guide", which details how to write hardware drivers.
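Returning to the main goal: as a rough illustration of what "its own data structures" could mean, here is a minimal user-space sketch in C. It is not a design decision and not kernel code. All the names in it (memfs_node, memfs_dirent, memfs_write, memfs_dir_add, memfs_dir_lookup) are hypothetical and chosen only for this example. The sketch assumes a file is a single buffer that grows exactly as far as writes require, and a directory is a linked list of names.

```c
/* Hypothetical user-space model; none of these names come from NetBSD. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum memfs_type { MEMFS_FILE, MEMFS_DIR };

struct memfs_node;

/* One name inside a directory, kept on a singly linked list. */
struct memfs_dirent {
    char *name;
    struct memfs_node *node;
    struct memfs_dirent *next;
};

struct memfs_node {
    enum memfs_type type;
    size_t size;                  /* bytes of file data in use */
    char *data;                   /* file contents, NULL while empty */
    struct memfs_dirent *entries; /* directory contents, NULL for files */
};

/* Allocate an empty node of the given type; returns NULL on failure. */
struct memfs_node *memfs_node_create(enum memfs_type type)
{
    struct memfs_node *n = calloc(1, sizeof(*n));
    if (n != NULL)
        n->type = type;
    return n;
}

/*
 * Write len bytes at offset off, growing the buffer only as far as
 * needed. Returns 0 on success, -1 on failure.
 */
int memfs_write(struct memfs_node *n, size_t off, const void *buf, size_t len)
{
    assert(n->type == MEMFS_FILE);
    if (len == 0)
        return 0;
    if (len > SIZE_MAX - off)
        return -1; /* off + len would overflow */
    if (off + len > n->size) {
        char *p = realloc(n->data, off + len);
        if (p == NULL)
            return -1;
        /* Zero any hole between the old end and the write offset. */
        if (off > n->size)
            memset(p + n->size, 0, off - n->size);
        n->data = p;
        n->size = off + len;
    }
    memcpy(n->data + off, buf, len);
    return 0;
}

/* Link child into dir under name. Returns 0 on success, -1 on failure. */
int memfs_dir_add(struct memfs_node *dir, const char *name,
                  struct memfs_node *child)
{
    assert(dir->type == MEMFS_DIR);
    struct memfs_dirent *de = malloc(sizeof(*de));
    if (de == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    de->name = malloc(len);
    if (de->name == NULL) {
        free(de);
        return -1;
    }
    memcpy(de->name, name, len);
    de->node = child;
    de->next = dir->entries;
    dir->entries = de;
    return 0;
}

/* Find name in dir; returns the node, or NULL if there is no such entry. */
struct memfs_node *memfs_dir_lookup(const struct memfs_node *dir,
                                    const char *name)
{
    for (const struct memfs_dirent *de = dir->entries; de != NULL; de = de->next)
        if (strcmp(de->name, name) == 0)
            return de->node;
    return NULL;
}
```

In this model an empty file costs one small node and no data buffer, which is the "no more, no less" idea in miniature. A real implementation would also have to deal with locking, page-sized allocation and the VFS interfaces, none of which are modelled here.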
BENEFITS FOR THE NETBSD PROJECT
-------------------------------

The main benefit for The NetBSD Project is the addition of a new file-system, which opens up multiple possibilities:

The most visible one is the ability to configure the /tmp hierarchy to live on tmpfs even on machines with little RAM. As a side effect, NetBSD's installation program, sysinst, could be modified to mount /tmp over tmpfs by default (in fact, this already happens, but using mfs).

Another application is to use tmpfs to store work trees: the directories used to hold object and executable files during compilations. Doing this can speed up builds on machines with slow I/O (NFS over a slow Ethernet comes to mind) but with a good amount of RAM.

The project will also gain a document to teach people how to start writing a file-system driver. This is important because it will make things easier for people willing to contribute to this area of the kernel. At the moment, the only way to start is by reading the code of some existing file-systems, which is a tough job.

WHEN WILL I WORK ON THIS
------------------------

I intend to start working on this project in the middle of June, after the 15th or so (since I have final exams before that).

As regards the length of the project, it seems to me that it is correctly sized for a two- to three-month period (check out the development plan below to see why). Note that I'll be working on it full-time, so it should be finished (including the documents mentioned above) by the deadline, the end of August.

DEVELOPMENT PLAN
----------------

The development plan will include, in order:

1. Reading the file-system chapter in the "Design and Implementation of the 4.4BSD Operating System" book. I've been told it includes a few notes about how tmpfs could be done (or what the limitations of mfs are).

2. Reading the code of some existing simple file-systems to get an idea of the whole picture (what the call traces look like, how data is handled, what the main entry points are, etc.). This includes reading code from ptyfs, kernfs and maybe procfs, all of which are memory-based systems.

3. Probably reading code from mfs. The systems described in 2. are very special as regards write support (i.e., it is almost absent, because it makes little sense for them), so I feel they won't be especially useful to understand how this functionality works. On the other hand, mfs is a complete file-system, so it will be helpful in this area.

4. Designing the necessary in-memory data structures to hold directories, files and all the associated meta-data. This will be done with memory constraints in mind but, of course, also trying to be fast. A correct balance between speed and memory usage needs to be achieved, hence a design is required beforehand.

5. Implementing the file-system itself according to the decisions taken in 4. and the knowledge acquired in 2. and 3.

6. Debugging (although this will overlap heavily with 5.).

7. Optimizing the code, if possible, with more debugging as this goes on.

8. Writing the document describing how to write file-systems and the document explaining the internal layout of tmpfs, based on all the notes I'll have taken during the previous points.

WHY DO I FEEL QUALIFIED FOR THIS TASK
-------------------------------------

I have been a NetBSD developer since November 2002, so I know the project policies and goals fairly well. That is, I know whom to contact for help, which guidelines should be applied to code, how to create patches, etc. This means I can get to work fairly quickly, without having to get myself introduced to NetBSD development first.

As regards NetBSD-related coding, I've done some things in the past, which are described below. Note that my main contributions are focused on its packaging system (pkgsrc), not the kernel, so I'm not going to describe them here since they are unrelated.

My first contribution was the implementation of a user-space console mouse daemon (similar to gpm), which can be seen at http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.sbin/wsmoused/; it required adding some functionality to the kernel. During its implementation, I learned the basics of kernel ioctls, the separation between machine-dependent and machine-independent code in the drivers, and some other minor details about wscons and vga. This was a very exciting thing to do, especially when I got it working :)

I've also improved the wscons console driver in multiple ways (basically adding customizable colors); this can be seen at http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/wscons/. To achieve this, I mostly had to add code to the kernel (wscons) and the underlying hardware drivers (vga).

Finally, I've also debugged a network driver, vr (for VIA Rhine), as well as some other minor problems in other places.

Although these three things are not closely related to writing tmpfs, they show that I know, more or less, how kernel development works, where the files are, how to find problems, whom to ask, etc.

Unfortunately, my kernel knowledge is limited. For example, I don't know anything about how it handles file-systems (aside from the theory I learned in a university course last year and some FAT code I wrote in the past in assembly). This is where this project comes into play (and one of the reasons I would like to do it): I intend to learn how the file-system (the VFS layer) and virtual memory areas of the kernel work while reading code and writing tmpfs. This is more important than the code itself, because it will open the door for me to future contributions to this part of the system (and I wish I were more involved in kernel development; it's so interesting).

I should also mention that I already have experience in managing free software projects, as can be seen in http://xmlcatmgr.sourceforge.net/, http://buildtool.sourceforge.net/ and http://vcsme.sourceforge.net/.
Distributing tmpfs will be interesting though, as it will have to include a set of new files as well as a patch to existing ones.

So basically, I want to seize this opportunity to learn more about the NetBSD kernel and file-systems, hoping to be able to contribute more at this level in the future and, at the same time, to write some needed code for my OS of choice.

WHY DO I FEEL I AM A GOOD CANDIDATE FOR SUMMER OF CODE
------------------------------------------------------

I feel this is a great opportunity to force myself into working on something challenging, and to get into something that has always interested me (kernel development). Otherwise, I would spend the summer working on other things (also related to free software) that would not bring me new knowledge.

Note that, as I already said, if I'm selected I'll be working on this full-time, so there is a good chance that I will get this project finished.