NetBSD-SoC: Improved Writing to FileSystem Using Congestion
What is it?
When the rate of generation of
write requests by the kernel or userland processes exceeds the service
time of the underlying FileSystem(FS)/disk, write congestion can
happen. Such a phenomenon has many undesirable
affects manifested in the
form of :- (i) unequal sharing of disk bandwidth between
(ii) delayed write (due to lots of request getting piled up in the
buffer), and (iii) read requests getting affected due to onslaught of
write requests. Hence, the primary aim of this project is to
provide mechanism(s) that would enforce "fairness" and "rate
control" among competiting proceses initiating such write requests.
- 2006-05-23: Google publishes accepted/rejected projects. Go!
- 2006-06-26: Google solicits mid-program mentor evaluations of
- 2006-06-30: All mid-program evaluations of student progress due
by 17.00 Pacific Standard Time
- 2006-08-21: All student projects due by 08.00 Pacific Standard
- 2006-09-05: All mentor and student evaluations due by 08.00
Pacific Standard Time
Mandatory (must-have) components:
- Congestion Control Algorithm impelemented inside the UVM.
- Benchmarking and validation of the approach.
- Documentation highlighting the algorithm and effectiveness of the
Optional (would-be-nice) components:
- Currently, process information is kept on a link list, which has
O(N) traversal. We would like to have a splay tree version where the
most accessible nodes are always close to the root.
- As of now, the algorithm does not receive any feedback from the
underlying I/O system. We would like to create a version of the CCA
that self-tunes itself based on information from the underlying
- The Congestion Control Algorithm (CCA) is available as a hook
inside the UVM
Specifically, we trap page request inside the
function call. All processes that invoke this function, has its
requests recorded by the congestion control module. As of now, all
statistical information about processes are kept in a linked list. This
means that with the increase in the number of processes issuing write
requests, we suffer from linear order traversal of the list. In the
future, it is envisioned that a more efficient data structure, such as
a splay tree, will be used, so that processes invoking more frequest
writes, stays close to the root of the tree.
- The CCA algorithm is based on coupling two observable behavior of
processes: (i) the entropy or the amount of randomness present in the
process (this treats the processes as independent random variables),
and (ii) the rate of growth of writes being issued by a process.
In order to reduce computation overhead, we record the above parameters
every 2 seconds of process activity. A process is marked as being
in "congestion inducing phase" if the rate of growth of writes
(predicted) is less than what is being actually observed. Under such a
circumstance, the process is put to sleep for a variable period of
time. (In the current release, we have hard-coded this figure to 8secs
as we are trying to tune various aspects of the algorithm). Typical
trace behavior of such an activity can be found here.
Caveat: Code released has not
been exhaustively tested though reasonable care has been taken to
ensure that that the implementation will not cause undesirable affects.
In some instances, it has been observed, during heavy write events,
that the CCA algorithm makes the system non-responsive to remote login
via ssh and in two instances caused the
ioflush daemon to
pause for long intervals of time. Also, in the current
release, the rate of read requests by processes are also getting
affected while slowing down the rate of generation of write requests.
All these are under investigation and will hopefully be addressed in
Instructions on how to run the code
can be found here. Initial
results using the approach can be found here.
| Sumantra R. Kundu <email@example.com>
| $Id: index.html,v 220.127.116.11 2006/05/23 10:44:18 hubertf