SourceXR

C/C++ Cross-Reference Tool

Lightweight Inter-Process Signaling with eventfd

The eventfd syscall can be used as a drop-in replacement for pipe() when a pipe is merely used by two processes to synchronize. In this article we describe how eventfd simplifies process synchronization, and how a newer flag, EFD_SEMAPHORE, offers another way to implement semaphores with eventfd.

Synchronization with pipe()

The following little program shows how pipe() is used to perform IPC between two related processes. The parent forks and simply waits for an event from its child.

We first create the pipe, then the parent forks. Each process closes the end of the pipe it does not use, creates an epoll reactor, and registers the remaining file descriptor with it. Finally, both enter their main loop: the parent waits for its child, whereas the child enables write events on the pipe and writes a single character to signal its parent.

There is nothing subtle here. The only less common call is pipe2 (available since Linux 2.6.27), which creates the pipe with the non-blocking flag already set, in a single call.

    // (1) Init
    int fd[2];
    int r = pipe2 (fd, O_NONBLOCK);
    if (r == -1) {
        std::cerr << strerror (errno) << "\n";
        return 1;
    }
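For comparison, here is what the single pipe2 () call saves. Without it, each end of the pipe has to be switched to non-blocking mode separately with fcntl (). Both helper names below are hypothetical, chosen for this sketch only.

```cpp
// pipe2 () is Linux-specific and needs _GNU_SOURCE (g++ defines it by default).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: the two-step way, pipe () then fcntl () on each end.
int make_nonblocking_pipe (int fd[2]) {
    if (pipe (fd) == -1) {
        return -1;
    }
    for (int i = 0; i < 2; ++i) {
        int flags = fcntl (fd[i], F_GETFL);
        if (flags == -1 || fcntl (fd[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            close (fd[0]);
            close (fd[1]);
            return -1;
        }
    }
    return 0;
}

// Hypothetical helper: the same result with the single call used in block (1).
int make_nonblocking_pipe2 (int fd[2]) {
    return pipe2 (fd, O_NONBLOCK);
}
```

Both variants leave O_NONBLOCK set on both ends; pipe2 () simply does it atomically, with no window in which another thread could observe a blocking descriptor.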

After creating the pipe, we fork, and the parent process enters parent_loop ():

        // (2) Close useless pipe end
        close (fd[1]);

        int epollfd = create_reactor ();
        if (epollfd == -1) {
            close (fd[0]);
            return 1;
        }
        if (add_pipe_to_reactor (epollfd, fd[0]) == -1) {
            return 1;
        }

        // watch possibly other fds
        // ...

        parent_loop (epollfd, fd[0]);

And the child process enters the child_loop ():

    if (pid == 0) { // child

        // (2) Close useless pipe end
        close (fd[0]);

        int epollfd = create_reactor ();
        if (epollfd == -1) {
            close (fd[1]);
            return 1;
        }
        if (add_pipe_to_reactor (epollfd, fd[1]) == -1) {
            return 1;
        }

        // child writes msg to parent to signal it
        epoll_event event;
        event.events = EPOLLIN | EPOLLOUT;
        event.data.fd = fd[1];
        int r = epoll_ctl (epollfd, EPOLL_CTL_MOD, fd[1], &event);
        if (r == -1) {
            std::cerr << strerror (errno) << "\n";
            close (epollfd);
            close (fd[1]);
            return 1;
        }

        child_loop (epollfd, fd[1]);

    }
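Both branches call create_reactor () and add_pipe_to_reactor (), whose code the article does not show. A minimal sketch of what they might look like, assuming they simply wrap epoll_create1 () and an EPOLL_CTL_ADD registration for input events (consistent with the child later switching the interest set with EPOLL_CTL_MOD). We also assume that add_pipe_to_reactor () closes both descriptors on failure, since the callers above just return.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <iostream>

// Assumed create_reactor (): a new epoll instance, or -1 on error.
int create_reactor () {
    int epollfd = epoll_create1 (0);
    if (epollfd == -1) {
        std::cerr << "epoll_create1: " << strerror (errno) << "\n";
    }
    return epollfd;
}

// Assumed add_pipe_to_reactor (): watch fd for input events.
// On failure it closes both descriptors (an assumption, see above).
int add_pipe_to_reactor (int epollfd, int fd) {
    epoll_event event = {};
    event.events = EPOLLIN;
    event.data.fd = fd;
    if (epoll_ctl (epollfd, EPOLL_CTL_ADD, fd, &event) == -1) {
        std::cerr << "epoll_ctl: " << strerror (errno) << "\n";
        close (epollfd);
        close (fd);
        return -1;
    }
    return 0;
}
```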

The parent loop merely waits for events using the epoll mechanism and prints a message when it receives the child notification:

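void parent_loop (int epollfd, int fd) {

    const int size = 16;       // maximum number of events per epoll_wait () call
    const int infinity = -1;   // no timeout
    bool running = true;
    while (running) {
        // wait for events
        epoll_event events[size];
        int r = epoll_wait (epollfd, events, size, infinity);
        if (r == -1) {
            if (errno == EINTR) {
                continue;      // interrupted by a signal: wait again
            }
            std::cerr << "epoll_wait: " << strerror (errno) << "\n";
            close (epollfd);
            close (fd);
            exit (1);
        }

        // demultiplex events
        for (int k = 0; k < r; ++k) {
            if (events[k].data.fd == fd) {
                // (4) Process pending notification
                const size_t s = 32;
                char buffer[s];
                int i = read (fd, buffer, s);
                if (i > 0) {
                    std::cout << "Received msg from child\n";
                }
                else if (i == 0) {
                    // end of file: the child closed its end of the pipe
                    running = false;
                }
            }
        }
    }
    close (epollfd);
    close (fd);
}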

The child loop writes to its end of the pipe to signal the parent process:

void child_loop (int epollfd, int fd) {

    const int size = 16;       // maximum number of events per epoll_wait () call
    const int infinity = -1;   // no timeout
    bool readyToWrite = true;
    while (true) {
        // wait for events
        epoll_event events[size];
        int r = epoll_wait (epollfd, events, size, infinity);
        if (r == -1) {
            if (errno == EINTR) {
                continue;      // interrupted by a signal: wait again
            }
            std::cerr << "epoll_wait: " << strerror (errno) << "\n";
            close (epollfd);
            close (fd);
            exit (1);
        }
        // demultiplex events
        for (int k = 0; k < r; ++k) {
            if (events[k].data.fd == fd) {
                if (events[k].events & EPOLLOUT) {
                    if (readyToWrite) {
                        // (3) Notify parent
                        char buffer = 0;
                        int i = write (fd, &buffer, 1);
                        if (i != -1) {
                            std::cout << "Signaled parent\n";
                        }
                        else {
                            std::cerr << "write: " << strerror (errno) << "\n";
                            close (epollfd);
                            close (fd);
                            exit (1);
                        }
                        // stop watching for write readiness (fd stays registered)
                        readyToWrite = false;
                        epoll_event event;
                        event.events = EPOLLIN;
                        event.data.fd = fd;
                        int rc = epoll_ctl (epollfd, EPOLL_CTL_MOD, fd, &event);
                        if (rc == -1) {
                            std::cerr << "epoll: " << strerror (errno) << "\n";
                            close (epollfd);
                            close (fd);
                            exit (1);
                        }
                    }
                    else {
                        std::cerr << "Asked for write but nothing to write!\n";
                    }
                }
            }
        }
    }
}

eventfd

eventfd is a fairly recent syscall that appeared in Linux 2.6.22. It creates an eventfd object that can be used for inter-process communication, like the pipe() of the previous example.

As with a pipe, processes communicate by performing read() and write() operations on the eventfd descriptor.

The object holds a 64-bit unsigned counter, initialized with the value passed to eventfd(). A read() returns the current value of the counter and resets it to 0; a write() adds the supplied 8-byte value to the counter.

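If the counter is 0, a read() blocks (or fails with EAGAIN in non-blocking mode). Conversely, a write() blocks (or fails with EAGAIN) if adding its value would push the counter above its maximum, 0xfffffffffffffffe (the largest unsigned 64-bit value minus 1).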
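These counter semantics can be observed from a single process. In the sketch below (the function and struct names are ours), two writes accumulate into the counter, one read returns their sum and resets it, a further read on the non-blocking descriptor fails with EAGAIN, and a write that would overflow the maximum fails with EAGAIN as well.

```cpp
#include <sys/eventfd.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Hypothetical result type for the demonstration below.
struct CounterDemo {
    uint64_t first_read;         // value returned by the first read ()
    bool empty_read_eagain;      // read () on a zero counter failed with EAGAIN
    bool overflow_write_eagain;  // write () past the maximum failed with EAGAIN
};

CounterDemo eventfd_counter_demo () {
    CounterDemo result = {0, false, false};
    int fd = eventfd (0, EFD_NONBLOCK);
    if (fd == -1) {
        return result;
    }
    const ssize_t io_size = sizeof (uint64_t);   // eventfd I/O is always 8 bytes

    uint64_t three = 3, four = 4, value = 0;
    if (write (fd, &three, sizeof (three)) == io_size &&   // counter: 0 -> 3
        write (fd, &four, sizeof (four)) == io_size &&     // counter: 3 -> 7
        read (fd, &value, sizeof (value)) == io_size) {    // returns 7, counter -> 0
        result.first_read = value;
    }

    // The counter is now 0: a non-blocking read fails instead of blocking.
    result.empty_read_eagain =
        read (fd, &value, sizeof (value)) == -1 && errno == EAGAIN;

    // Fill the counter up to its maximum, 0xfffffffffffffffe...
    uint64_t max = 0xfffffffffffffffeULL, one = 1;
    if (write (fd, &max, sizeof (max)) == io_size) {
        // ...so that adding 1 would exceed it: the write fails instead.
        result.overflow_write_eagain =
            write (fd, &one, sizeof (one)) == -1 && errno == EAGAIN;
    }

    close (fd);
    return result;
}
```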

Its main advantages are that only one file descriptor is needed (pipe(2) needs two) and that it has less kernel overhead than a pipe.

Its use is very similar to that of pipe(). We just replace the corresponding blocks of the previous program with the code below (and include sys/eventfd.h).

First, the creation of the eventfd object (to go into block (1) in the above code):

unsigned int val = 0;
int fd = eventfd (val, EFD_NONBLOCK); // flags argument: Linux 2.6.27 and later
if (fd == -1) {
    std::cerr << strerror (errno) << "\n";
    return 1;
}

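Blocks (2) are no longer needed: there is a single descriptor, which both processes share after fork (), so there is no unused end to close. Note, however, that because the descriptor is shared, it also becomes readable in the child as soon as the counter is non-zero, so a child still watching EPOLLIN keeps waking up until the parent has read the value.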

Then, the child uses the following to signal the parent (block (3)):

uint64_t value = 1;
int i = write (fd, &value, sizeof (value));

The parent, in turn, uses the following code to receive the signal (block (4)):

uint64_t value;
int i = read (fd, &value, sizeof (value));
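Putting these pieces together, here is a self-contained sketch of the eventfd version reduced to a single exchange. The function name is ours, and the parent waits with one epoll_wait () call (with a 5 second timeout, an assumption of this sketch) instead of the article's full parent_loop ().

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Hypothetical function: forks a child that signals its parent once through
// a shared eventfd. Returns the counter value read by the parent, 0 on error.
uint64_t signal_parent_once () {
    // (1) A single descriptor, inherited by the child across fork ().
    int fd = eventfd (0, EFD_NONBLOCK);
    if (fd == -1) {
        return 0;
    }
    pid_t pid = fork ();
    if (pid == -1) {
        close (fd);
        return 0;
    }
    if (pid == 0) {
        // (3) Child: add 1 to the counter, then leave with _exit () so that
        // inherited stdio buffers are not flushed and atexit handlers do not run.
        uint64_t one = 1;
        ssize_t n = write (fd, &one, sizeof (one));
        _exit (n == static_cast<ssize_t> (sizeof (one)) ? 0 : 1);
    }

    // Parent: wait (at most 5 s in this sketch) for the eventfd to become readable.
    uint64_t value = 0;
    int epollfd = epoll_create1 (0);
    if (epollfd != -1) {
        epoll_event event = {};
        event.events = EPOLLIN;
        event.data.fd = fd;
        if (epoll_ctl (epollfd, EPOLL_CTL_ADD, fd, &event) == 0) {
            epoll_event ready;
            int r;
            do {
                r = epoll_wait (epollfd, &ready, 1, 5000);
            } while (r == -1 && errno == EINTR);
            // (4) read () returns the counter and resets it to 0.
            if (r == 1 && read (fd, &value, sizeof (value)) == -1) {
                value = 0;
            }
        }
        close (epollfd);
    }
    waitpid (pid, nullptr, 0);
    close (fd);
    return value;
}
```

Since epoll is level-triggered by default, it does not matter whether the child writes before or after the parent registers the descriptor: the pending counter value is reported either way.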

eventfd as Semaphore

When you read the eventfd(2) manpage, you will notice that a counter value of 0 is special: it causes read() on the event file descriptor to block, just as a counter at its maximum causes write() to block. This behavior can be used to have eventfd act as a semaphore.

Recall the two semaphore operations: wait() and post(). For a simple semaphore with a count of 1, the eventfd object is initialized with a value of 1, and since we want the operations to block, EFD_NONBLOCK is not passed to eventfd().

A read() then plays the role of wait(): it returns the value and resets the counter to 0, so any other reader blocks. The matching write() plays the role of post(): it adds 1 to the counter, waking up a blocked reader.

If we wrap these operations in a simple mutex class, its implementation could look like this (error handling omitted):

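#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

class EventFDMutex
{

public:
    // counter starts at 1: the mutex is initially unlocked
    EventFDMutex() {
        _fd = eventfd (1, 0);
    }

    ~EventFDMutex() {
        close (_fd);
    }

    // wait (): blocks while the counter is 0, then resets it to 0
    void lock () {
        uint64_t value;
        read (_fd, &value, sizeof (value));
    }

    // post (): adds 1 to the counter, waking up a blocked locker
    void unlock () {
        uint64_t value = 1;
        write (_fd, &value, sizeof (value));
    }

    // non-copyable: a copy would close the same descriptor twice
    EventFDMutex (const EventFDMutex &) = delete;
    EventFDMutex &operator= (const EventFDMutex &) = delete;

private:
    int _fd;
};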

To use eventfd as a semaphore with many resources, an additional flag is passed to the eventfd syscall: EFD_SEMAPHORE (available since Linux 2.6.30). With it, a read no longer resets the counter to 0 but decrements it by 1 (and returns 1). Therefore, if the eventfd is initialized with the number of resources, it can be used to implement a counting semaphore.
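A sketch of such a counting semaphore follows. The class name and the extra_flags parameter are our own choices: passing EFD_NONBLOCK makes wait () fail instead of blocking when no resource is left, which is convenient for epoll-driven code and for testing.

```cpp
#include <sys/eventfd.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Hypothetical counting semaphore over eventfd with EFD_SEMAPHORE
// (Linux 2.6.30 and later). Create it before fork () to share it.
class EventFDSemaphore {
public:
    // count: initial number of available resources.
    // extra_flags: e.g. EFD_NONBLOCK to make wait () fail instead of blocking.
    explicit EventFDSemaphore (unsigned int count, int extra_flags = 0)
        : _fd (eventfd (count, EFD_SEMAPHORE | extra_flags)) {}

    ~EventFDSemaphore () {
        if (_fd != -1) {
            close (_fd);
        }
    }

    EventFDSemaphore (const EventFDSemaphore &) = delete;
    EventFDSemaphore &operator= (const EventFDSemaphore &) = delete;

    bool valid () const { return _fd != -1; }

    // wait (): decrement the counter by 1, blocking while it is 0.
    // Returns false on error (including EAGAIN in non-blocking mode).
    bool wait () {
        uint64_t value;   // always 1 with EFD_SEMAPHORE
        ssize_t n;
        do {
            n = read (_fd, &value, sizeof (value));
        } while (n == -1 && errno == EINTR);
        return n == static_cast<ssize_t> (sizeof (value));
    }

    // post (): increment the counter by 1, unblocking a waiter if any.
    bool post () {
        uint64_t value = 1;
        ssize_t n;
        do {
            n = write (_fd, &value, sizeof (value));
        } while (n == -1 && errno == EINTR);
        return n == static_cast<ssize_t> (sizeof (value));
    }

private:
    int _fd;
};
```

For example, EventFDSemaphore s (2, EFD_NONBLOCK) lets two wait () calls succeed; a third returns false until someone calls post ().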

Conclusion

eventfd is a very straightforward replacement for pipe. It can also be used as a mutex or semaphore, but you may want to benchmark it first: every lock and unlock operation is a system call, which takes much longer than the atomic compare-and-swap instruction that typical mutex implementations perform in the uncontended case.
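One rough way to run such a benchmark is to time uncontended lock/unlock pairs on EventFDMutex against std::mutex. The sketch below is ours: to stay self-contained it repeats a minimal copy of the class, it measures only the uncontended single-process case, and absolute numbers will depend on the kernel and hardware.

```cpp
#include <sys/eventfd.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <mutex>

// Minimal copy of the article's EventFDMutex (error handling omitted).
class EventFDMutex {
public:
    EventFDMutex () : _fd (eventfd (1, 0)) {}
    ~EventFDMutex () { close (_fd); }
    EventFDMutex (const EventFDMutex &) = delete;
    EventFDMutex &operator= (const EventFDMutex &) = delete;
    void lock () { uint64_t v; read (_fd, &v, sizeof (v)); }
    void unlock () { uint64_t v = 1; write (_fd, &v, sizeof (v)); }
private:
    int _fd;
};

// Average time, in nanoseconds, of one uncontended lock ()/unlock () pair.
template <typename Mutex>
double ns_per_pair (Mutex &m, int iterations) {
    auto start = std::chrono::steady_clock::now ();
    for (int i = 0; i < iterations; ++i) {
        m.lock ();
        m.unlock ();
    }
    std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now () - start;
    return elapsed.count () / iterations;
}

// Hypothetical result type: one timing per mutex implementation.
struct BenchResult {
    double eventfd_ns;     // EventFDMutex: two system calls per pair
    double std_mutex_ns;   // std::mutex: typically no system call when uncontended
};

BenchResult bench_mutexes (int iterations) {
    EventFDMutex efd_mutex;
    std::mutex std_mutex;
    BenchResult result;
    result.eventfd_ns = ns_per_pair (efd_mutex, iterations);
    result.std_mutex_ns = ns_per_pair (std_mutex, iterations);
    return result;
}
```

Calling bench_mutexes (100000) and printing both fields gives a first idea of the per-operation cost; contended and cross-process cases would need a separate setup.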

The eventfd described here is shared between related processes (the parent forks the child), but like any file descriptor, an eventfd descriptor can also be passed to an unrelated process over a UNIX domain socket, as an SCM_RIGHTS control message.
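A sketch of that descriptor passing follows. send_fd () and recv_fd () are hypothetical helper names; the socket could come from socketpair () (as in the usage function below) or from a named UNIX socket connecting two unrelated processes.

```cpp
#include <sys/eventfd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sends descriptor fd over the connected UNIX socket sock
// as an SCM_RIGHTS control message. Returns 0 on success, -1 on error.
int send_fd (int sock, int fd) {
    char dummy = 'F';                        // at least one byte of real data
    iovec iov = {&dummy, 1};
    alignas (cmsghdr) char control[CMSG_SPACE (sizeof (int))];
    std::memset (control, 0, sizeof (control));

    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof (control);

    cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN (sizeof (int));
    std::memcpy (CMSG_DATA (cmsg), &fd, sizeof (int));

    return sendmsg (sock, &msg, 0) == -1 ? -1 : 0;
}

// Hypothetical helper: receives a descriptor sent by send_fd ().
// Returns the new descriptor, or -1 on error.
int recv_fd (int sock) {
    char dummy;
    iovec iov = {&dummy, 1};
    alignas (cmsghdr) char control[CMSG_SPACE (sizeof (int))];

    msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof (control);

    if (recvmsg (sock, &msg, 0) <= 0) {
        return -1;
    }
    cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
    if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    int fd;
    std::memcpy (&fd, CMSG_DATA (cmsg), sizeof (int));
    return fd;
}

// Usage: pass an eventfd through a socketpair and signal through the copy.
// Returns the value read on the original descriptor, or 0 on error.
uint64_t pass_eventfd_demo () {
    int sv[2];
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return 0;
    }
    uint64_t value = 0;
    int efd = eventfd (0, EFD_NONBLOCK);
    if (efd != -1) {
        if (send_fd (sv[0], efd) == 0) {
            int copy = recv_fd (sv[1]);  // new descriptor, same eventfd object
            if (copy != -1) {
                uint64_t one = 1;
                if (write (copy, &one, sizeof (one)) == -1 ||
                    read (efd, &value, sizeof (value)) == -1) {
                    value = 0;
                }
                close (copy);
            }
        }
        close (efd);
    }
    close (sv[0]);
    close (sv[1]);
    return value;
}
```

The received descriptor has a different number but refers to the same eventfd object, so a write through it is visible through the original descriptor.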

Implementation

The source file for the pipe implementation can be found here.
