I have to prepare some supplementary work for an exam at my university, and my teacher asked me to study three Unix I/O multiplexing functions, namely the famous select(), poll(), and kqueue(), and to think about how the first two compare with kqueue().
Actually, only the first two are pure I/O multiplexing functions; the third is a more generic interface for event notification (the events may happen on a file descriptor as well as on other sources).
This post is intended as an introduction to the main choices you have when you need to deal with several file descriptors at the same time (i.e. I/O multiplexing), and as an introduction to the various I/O models that Unix offers. The comparison between select(), poll() and kqueue() will be in the second part of this post.
This post assumes that the reader has a basic understanding of Unix systems, along with the concepts of system interface, system call, and file descriptor.
Since in Unix everything is a file, the kernel provides us with the abstraction of the file descriptor (which is, after all, just a non-negative integer), and this tiny number can refer to a socket, a regular file, or a pipe. Now assume that you are writing a server (or a client) program: you have to deal with a list of file descriptors, and the kernel has to tell you when a particular event occurs on one of those descriptors. This is where I/O multiplexing comes to the user's aid.
The functions select() and poll() (both specified by POSIX, for that matter) are practically identical in what they do: they tell the kernel which file descriptors the process is interested in and for what (reading? writing? exceptions?), and how long the call may block before it has to return. For further considerations about the differences between those functions see here; also check the man pages for select() and poll(). Both functions tell the user that there is some activity on the file descriptors listed in the arguments; once the user has this information (along with the type of activity), they can go on to call the usual functions that read from or write to file descriptors.
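To make the description above a bit more concrete, here is a minimal sketch of select() watching a single descriptor for readability with a timeout. The helper names wait_readable() and select_demo() are my own, invented for illustration, and a pipe stands in for whatever descriptor a real program would watch.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    FD_ZERO(&readfds);        /* start with an empty set */
    FD_SET(fd, &readfds);     /* we are interested in reading from fd */
    tv.tv_sec = timeout_sec;  /* how long select() may block */
    tv.tv_usec = 0;

    /* First argument is the highest descriptor number plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;             /* 0 = timeout, -1 = error */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical demo: returns the number of bytes read, or -1 on failure. */
int select_demo(void)
{
    int p[2];
    char buf[32];
    ssize_t n = -1;

    if (pipe(p) == -1)
        return -1;

    /* The pipe is empty at first, so a zero timeout reports "not ready".
     * After the write, select() reports the read end as readable. */
    if (wait_readable(p[0], 0) == 0 &&
        write(p[1], "hello", 5) == 5 &&
        wait_readable(p[0], 1) == 1)
        n = read(p[0], buf, sizeof buf);

    close(p[0]);
    close(p[1]);
    return (int)n;
}
```

Calling select_demo() from a main() of your own should give back 5, the number of bytes written to the pipe. Note that select() only reports readiness; the data is still fetched by the separate read() call.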
Thus, with this model of input/output (the multiplexing one), the user has to make at least two calls: select() (or poll()), and then the function that performs the actual read/write.
By contrast, with the most prevalent model of I/O in Unix, the blocking I/O model, all that you have to call is the function that performs the action you're interested in (e.g. recvfrom() to receive a message from a socket). With this kind of model the system call does not return until the data you're waiting for has actually arrived; once the data is available, the kernel copies it from kernel space to user space. Only after all these operations does the system call return.
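The two-call pattern can be sketched with poll() as well. This is only an illustration under my own assumptions: poll_then_read() and poll_demo() are hypothetical names, and two pipes play the role of the descriptors a server would be juggling.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Call 1: ask the kernel which of two descriptors is
 * readable. Call 2: read() from each one that is. Returns the total number
 * of bytes read, or -1 on error or timeout. */
ssize_t poll_then_read(int fd_a, int fd_b, char *buf, size_t cap, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    ssize_t total = 0;
    int i;

    if (poll(fds, 2, timeout_ms) <= 0)
        return -1;

    for (i = 0; i < 2; i++) {
        if (fds[i].revents & POLLIN) {
            ssize_t n = read(fds[i].fd, buf + total, cap - (size_t)total);
            if (n < 0)
                return -1;
            total += n;
        }
    }
    return total;
}

/* Hypothetical demo: only the second pipe has data, so only that one is
 * read. Returns the number of bytes read, or -1 on failure. */
int poll_demo(void)
{
    int a[2], b[2];
    char buf[32];
    ssize_t n = -1;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    if (write(b[1], "abc", 3) == 3)
        n = poll_then_read(a[0], b[0], buf, sizeof buf, 1000);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return (int)n;
}
```

Here poll_demo() should return 3: poll() flags only the pipe that has data, and the empty one is never touched, so the read() never blocks.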
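A small sketch of the blocking model, using a pipe and read() in place of a socket and recvfrom() (that substitution, and the name blocking_demo(), are my own choices for the example). The child writes after one second, and the parent's read() simply sleeps until then.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of bytes read, or -1 on failure. */
int blocking_demo(void)
{
    int p[2];
    char buf[16];
    ssize_t n;
    pid_t pid;

    if (pipe(p) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the data arrive late on purpose. */
        close(p[0]);
        sleep(1);
        _exit(write(p[1], "ping", 4) == 4 ? 0 : 1);
    }

    /* Parent: a single call. read() does not return until the child has
     * written and the kernel has copied the bytes into buf. */
    close(p[1]);
    n = read(p[0], buf, sizeof buf);

    close(p[0]);
    waitpid(pid, NULL, 0);
    return (int)n;
}
```

blocking_demo() should return 4 after roughly a second. The process does nothing useful in the meantime, which is fine for one descriptor and a problem as soon as there are several.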
Conversely, the nonblocking I/O model behaves in the opposite way. The book Unix Network Programming, volume 1, says: "When an I/O operation that I request cannot be completed without putting the process to sleep, do not put the process to sleep, but return an error instead". The problem, which the book Advanced Programming in the UNIX Environment (APUE for short) points out well, is that this kind of model wastes a lot of CPU time, because the code must loop, repeatedly checking the state of the descriptors.
The blocking, nonblocking, and multiplexing I/O models are all synchronous forms of notification that something has happened on a descriptor.
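The busy loop that wastes CPU time looks roughly like this. It is a sketch, not a recommendation: spin_read(), set_nonblocking() and nonblocking_demo() are names I made up, and the 100 ms delay in the child is an arbitrary value chosen so that the loop has something to wait for.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: put fd in nonblocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: retry read() until it stops failing with EAGAIN.
 * *failed counts the attempts that returned "would block". */
ssize_t spin_read(int fd, char *buf, size_t cap, long *failed)
{
    for (;;) {
        ssize_t n = read(fd, buf, cap);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;          /* a real error, not "try again" */
        (*failed)++;            /* pure CPU burn: nothing to read yet */
    }
}

/* Hypothetical demo: returns the number of bytes read, or -1 on failure. */
int nonblocking_demo(void)
{
    int p[2];
    char buf[16];
    long failed = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(p) == -1)
        return -1;
    if (set_nonblocking(p[0]) == -1 || (pid = fork()) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */
        close(p[0]);
        nanosleep(&delay, NULL);
        _exit(write(p[1], "ping", 4) == 4 ? 0 : 1);
    }

    close(p[1]);
    n = spin_read(p[0], buf, sizeof buf, &failed);
    close(p[0]);
    waitpid(pid, NULL, 0);

    printf("read %zd bytes after %ld failed attempts\n", n, failed);
    return (int)n;
}
```

The number of failed attempts varies from run to run and from machine to machine, but it is usually large, and every one of those attempts is a system call that accomplished nothing.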
The last I/O model that I want to describe is the asynchronous model, implemented either with specific real-time functions (such as aio_read()) or through the use of an explicit signal as notification (actually, the specific functions can themselves deliver signals). These two sub-types of asynchronous I/O are quite different from the models listed above, because the function that performs the I/O request returns immediately. If the file descriptor is not ready for the operation we have in mind, the kernel notifies us later with a signal. The issue with the signal-based variant is that there is only one such signal per process, so if asynchronous I/O is enabled on more than one descriptor we cannot tell which descriptor the signal refers to (APUE). To make matters worse, not all systems support this feature (it is an optional facility in the Single UNIX Specification) (APUE).
In the second part of this two-part series I will try to compare the I/O multiplexing functions select() and poll() with the more efficient, and more complex, kqueue() function.
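Here is a sketch of the signal-driven variant only (not of aio_read()), written under a few assumptions of mine: a Unix-domain socketpair stands in for a real connection, the system supports O_ASYNC and F_SETOWN through fcntl() (Linux and the BSDs do, but this is not guaranteed everywhere), and sigio_demo() and on_sigio() are hypothetical names.

```c
#define _GNU_SOURCE /* assumption: exposes O_ASYNC and F_SETOWN on glibc */
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio = 0;

/* The handler only records that the kernel notified us. */
static void on_sigio(int signo)
{
    (void)signo;
    got_sigio = 1;
}

/* Hypothetical demo: returns the number of bytes read, or -1 on failure. */
int sigio_demo(void)
{
    int sv[2], flags;
    char buf[16];
    struct sigaction sa;
    sigset_t block, old;
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    /* Ask the kernel to send SIGIO to this process when sv[0] has activity. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (sigaction(SIGIO, &sa, NULL) == -1 ||
        fcntl(sv[0], F_SETOWN, getpid()) == -1 ||
        flags == -1 ||
        fcntl(sv[0], F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1)
        goto out;

    /* Block SIGIO so it cannot slip in between the check and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGIO);
    sigprocmask(SIG_BLOCK, &block, &old);

    /* Data arriving on the peer triggers the notification for sv[0]. */
    if (write(sv[1], "ping", 4) == 4) {
        while (!got_sigio)
            sigsuspend(&old);   /* atomically unblock SIGIO and wait */
        n = read(sv[0], buf, sizeof buf);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);

out:
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

On a system where the assumptions hold, sigio_demo() should return 4. Notice that the handler learns nothing beyond "SIGIO arrived": with a second descriptor enabled for O_ASYNC, the code would have to check both of them to find out which one is ready.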
Here are some resources that I consulted for this post:
Advanced Programming in the UNIX Environment, second edition, W. Richard Stevens and Stephen A. Rago, Addison-Wesley.
UNIX Network Programming, Volume 1: The Sockets Networking API, third edition, W. Richard Stevens, Bill Fenner and Andrew M. Rudoff, Addison-Wesley.
http://julipedia.meroh.net/2004/10/example-of-kqueue.html

