NetBSD-SoC: Apropos replacement based on mandoc and SQLite's FTS

What is it?

Unix systems have a long-standing culture, and following its philosophy over the years has been one of the main reasons behind the lasting success of Unices. Part of this culture is to provide documentation for each component of the operating system, whether it is a command-line utility, a system call, a library function, a configuration file, or anything else that should be documented to make the end user's life easier. This documentation has been shipped with the base system in the form of manual pages (man pages for short), which can be easily accessed using the 'man' command.

A couple of utilities are also provided to search the documentation. apropos(1) is used to search for man pages, and the way it works is very simple: the NAME section of each man page is indexed in a file (typically named whatis.db), and apropos(1) searches this file for the keywords specified by the user.
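To make the limitation concrete, here is a minimal C sketch of a whatis-style lookup. It assumes the index has already been loaded into memory as one NAME-section line per entry (the real whatis.db format differs between systems), and the names `whatis_search` and `contains_ci` are hypothetical, not taken from any real apropos(1) implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring test: returns 1 if needle occurs in hay. */
static int contains_ci(const char *hay, const char *needle)
{
	size_t n = strlen(needle);

	if (n == 0)
		return 1;
	for (; *hay != '\0'; hay++) {
		size_t i = 0;
		while (i < n && hay[i] != '\0' &&
		    tolower((unsigned char)hay[i]) ==
		    tolower((unsigned char)needle[i]))
			i++;
		if (i == n)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical whatis-style search: each entry is one NAME-section line,
 * e.g. "ls(1) - list directory contents". Matching entries are stored in
 * out[] (at most max of them) and the number of matches is returned.
 */
size_t whatis_search(const char *const *entries, size_t nentries,
    const char *keyword, const char **out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < nentries && found < max; i++)
		if (contains_ci(entries[i], keyword))
			out[found++] = entries[i];
	return found;
}
```

Because only the NAME line is consulted, a query for "duplicate" would not find cp(1) in an index containing "cp(1) - copy files", even though duplicating files is exactly what cp does.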

apropos(1) was designed with the resources (both hardware and software) of the early days in mind, but things have changed drastically since then. Those resources are now available, and in the Google era it behooves us to rethink the design and implementation of apropos(1). It is now possible to implement apropos(1) in a way that allows more extensive and flexible searches, and over the complete content of the man pages rather than only the NAME section. More often than not we are not sure of the exact keywords to search for, and apropos(1) does not give us the right results (or any results at all), in which case we turn to Google.

The idea behind this project is to address this problem by reimplementing apropos(1) with full-text search capabilities, enhancing and modifying the other man utilities as required along the way. We have decided to use the FTS engine of SQLite [1] for this purpose.

Project Repository:

The project is currently hosted on GitHub: https://github.com/abhinav-upadhyay/apropos_replacement

Status

Weekly Report 1: I know we have already entered the third week of the coding period and I am late, but in my defense, I was busy with exams during the first week and only started work on 1 June. The first interesting task was to build the list of directories where man pages are stored on the file system. Joerg suggested adding this as a new option (-p) to man(1). It will have two-fold benefits for our project:

Joerg and David were very helpful: they patiently reviewed several versions of the patch and answered my questions. I have sent the final version of the patch to Joerg, who will be committing it very soon. I have written a more detailed post about this on my blog.

Weekly Report 2: This was a more productive week. I worked on makemandb, which is responsible for traversing the set of directories returned by 'man -p' and parsing each man page using libmandoc. It creates a new database named 'apropos.db' in the current directory and stores the parsed data in an FTS virtual table in it. Currently, it only parses the name of the man page, the one-line description from the NAME section, and the complete DESCRIPTION section, and stores them in three columns of the table.
Some issues have come up during parsing, but for the moment we can set them aside and focus on getting the initial prototype of the project ready.
For more information, you may read my blog post: Weekly Report 2
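makemandb itself relies on libmandoc to walk the parse tree. As a rough illustration of what ends up in the first two columns, the hypothetical helper `split_name_line` below splits a NAME line into the page name and its one-line description. It assumes the line has already been flattened to plain text in the conventional form `name \- description` (or `name - description`); it does not use libmandoc and is only a sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a flattened NAME-section line such as
 * "cp - copy files" (or the roff form "cp \- copy files") into the page
 * name and its one-line description. Returns 0 on success, or -1 if no
 * separator is found or a destination buffer is too small.
 */
int split_name_line(const char *line, char *name, size_t namesz,
    char *desc, size_t descsz)
{
	const char *sep = strstr(line, " \\- ");	/* roff-escaped dash */
	size_t seplen = 4;

	if (sep == NULL) {			/* fall back to a plain dash */
		sep = strstr(line, " - ");
		seplen = 3;
	}
	if (sep == NULL)
		return -1;

	size_t nlen = (size_t)(sep - line);
	const char *d = sep + seplen;
	size_t dlen = strlen(d);

	if (nlen >= namesz || dlen >= descsz)
		return -1;
	memcpy(name, line, nlen);
	name[nlen] = '\0';
	memcpy(desc, d, dlen + 1);		/* copies the terminating NUL */
	return 0;
}
```

Pages that document several names on one line (for example "cp, copy - ...") would keep the whole comma-separated list in the name buffer with this approach.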

Weekly Report 3: Fixed many bugs and added a basic ranking algorithm. The report also shows some sample runs of the new apropos.
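The report does not describe the ranking algorithm. Purely as an illustration, and not as the project's method, one simple scheme scores each matching page with a weighted sum of keyword hits per column, using the three columns described in Weekly Report 2 and weighting the page name most heavily. The struct, weights and function names below are hypothetical; with SQLite FTS the per-column hit counts could be obtained from the matchinfo() auxiliary function.

```c
#include <stddef.h>
#include <stdlib.h>

/* One candidate page with hypothetical per-column keyword hit counts. */
struct result {
	const char *page;	/* e.g. "ls(1)" */
	unsigned name_hits;	/* hits in the page name column */
	unsigned desc_hits;	/* hits in the one-line description column */
	unsigned body_hits;	/* hits in the DESCRIPTION section column */
	double score;		/* filled in by rank_results() */
};

/* Hypothetical column weights: the name matters most. */
#define W_NAME 4.0
#define W_DESC 2.0
#define W_BODY 1.0

/* qsort comparator: higher scores sort first. */
static int by_score_desc(const void *a, const void *b)
{
	double sa = ((const struct result *)a)->score;
	double sb = ((const struct result *)b)->score;

	return (sa < sb) - (sa > sb);
}

/* Score every result, then sort the array best-first. */
void rank_results(struct result *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		r[i].score = W_NAME * r[i].name_hits +
		    W_DESC * r[i].desc_hits + W_BODY * r[i].body_hits;
	qsort(r, n, sizeof r[0], by_score_desc);
}
```

A real ranker would likely also normalise for page length, so that long DESCRIPTION sections are not favoured simply for repeating a word.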

Midterm Report: Fixed some more bugs, improved performance and the ranking algorithm, and discussed some upcoming features.

Project Update 5: Lots of changes and improvements, and some regressions, since the last update report. Notable changes are:


You may also be interested in the more detailed report I posted on the mailing list: http://mail-index.netbsd.org/tech-userlevel/2011/07/31/msg005310.html

Final Status Report: The final status report of the project is available at final-report.html

I have also uploaded man pages of the project in HTML format:

A blog post as well: http://abhinav-upadhyay.blogspot.com/2011/08/final-report-netbsd-gsoc-2011-apropos.html

Schedule

Deliverables

Mandatory (must-have) components:

Should have components:

Optional (would-be-nice) components:

References


[1]: SQLite's FTS Engine Documentation
[2]: My GitHub Profile
My Complete Proposal
My Blog
Abhinav Upadhyay <er.abhinav.upadhyay at gmail dot com>
$Id: index.html,v 1.8 2011/08/24 17:18:01 abhinavupadhyay Exp $