Parandrus

Zombie processes in Unix

September 18, 2021

What are zombie processes, where do they come from and how do we get rid of them?

To understand how a process can become a zombie, we need to first understand how new processes are created in a Unix operating system and how processes relate to each other.

Parent-child Relationship of Processes

To create a new process, you fork(2) or clone(2) the current process, which becomes the parent of the new child process. This parent-child relationship can be traced all the way up to the root, the process with id 1. That is typically /sbin/init, although on macOS it is /sbin/launchd.

We can see this in the output of ps -ax -o pid,ppid,cmd:

$ ps -ax -o pid,ppid,cmd
PID    PPID CMD
  1       0 /sbin/init
...
297       1 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
...
615     297 sshd: julien [priv]
641     615 sshd: julien@pts/0
642     641 -bash
793     642 ps -ax -o pid,ppid,cmd

You can trace the ps command with process id 793 to its parent bash 642, to sshd 641 and so on until init 1.
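
The same parent-child link can also be checked from inside a program. The following is a minimal sketch, not part of the original example: child_sees_parent is a hypothetical helper name. It records the parent's pid before calling fork(2) and lets the child compare that value with getppid(2).

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for illustration): fork a child and let
 * it compare getppid() with the pid the parent recorded before forking.
 * Returns 1 if they match, 0 if they differ, -1 on error.
 */
int child_sees_parent(void)
{
    pid_t parent_pid = getpid(); /* recorded before fork, inherited by the child */
    pid_t pid = fork();
    int status;

    if (pid < 0)
    {
        perror("fork error");
        return -1;
    }
    if (pid == 0)
    {
        /* Child: the parent is still alive (blocked in waitpid below),
           so getppid() should report the parent's pid */
        _exit(getppid() == parent_pid ? 0 : 1);
    }

    /* Parent: read the child's verdict from its exit status */
    if (waitpid(pid, &status, 0) < 0)
    {
        perror("waitpid error");
        return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling child_sees_parent() from main should return 1 as long as the calling process stays alive until the child has run.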

Creating a Zombie

Now that we know how a process is created and who its parent is, we can look into what happens when a process exits.

When a process exits, its resources are cleaned up by the kernel, but its process table entry is kept until the process’s parent reads the termination status via the wait(2) system call. A process that has exited but is still present in the process table is called a zombie process.

From the exit(3) man page:

[…] the child becomes a “zombie” process: most of the process resources are recycled, but a slot containing minimal information about the child process (termination status, resource usage statistics) is retained in process table. This allows the parent to subsequently use waitpid(2) (or similar) to learn the termination status of the child; at that point the zombie process slot is released.

If we look at the running processes while the following C program runs, we can see this in action.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>

int main(void)
{
    int stat;
    pid_t pid, wait_pid;

    if ((pid = fork()) < 0)
    {
        perror("fork error");
        exit(1);
    }
    else if (pid == 0)
    {
        /* Child: wait 3 seconds and then exit */
        sleep(3);
        exit(0);
    }
    else
    {
        /* Parent: wait 5 seconds before calling wait. After 3 seconds the child will become a zombie */
        sleep(5);
        if ((wait_pid = wait(&stat)) < 0)
        {
            perror("wait error");
            exit(1);
        }
        /* the child is now removed from the process table */
        sleep(5);
    }
}

At first we can see the parent (5084) and child (5085) processes running, or rather sleeping, as indicated by the S in the ps output:

$ ./zombie &
[1] 5084
$ ps -ax -o pid,ppid,state,cmd
   5084     642 S ./zombie
   5085    5084 S ./zombie

Once the child process exits, it is still present in the process table, but now in the Z state. This is also reflected in its name, [zombie] <defunct>:

$ ps -ax -o pid,ppid,state,cmd
   5084     642 S ./zombie
   5085    5084 Z [zombie] <defunct>

Finally, after the parent process calls wait, the child process is removed from the process table:

$ ps -ax -o pid,ppid,state,cmd
   5084     642 S ./zombie
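
The same life cycle can be observed without ps. The sketch below is an illustration under assumptions, not part of the program above: zombie_kept_until_reaped is a hypothetical helper name and the exit status 7 is arbitrary. It uses waitid(2) with the WNOWAIT flag to look at the child's termination status while leaving the zombie in place, then reaps it with waitpid(2), and finally checks that a further wait fails because the entry is gone.

```c
#define _POSIX_C_SOURCE 200809L /* request the waitid() and WNOWAIT declarations */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for illustration).
 * Returns 1 if the status could be read twice and the entry was gone
 * afterwards, 0 if not, -1 on error.
 */
int zombie_kept_until_reaped(void)
{
    siginfo_t info;
    int status, peeked, reaped, gone;
    pid_t pid = fork();

    if (pid < 0)
    {
        perror("fork error");
        return -1;
    }
    if (pid == 0)
        _exit(7); /* Child: exit at once with an arbitrary, recognizable status */

    /* Wait for the child to exit, but leave it unreaped (WNOWAIT),
       so it stays in the process table as a zombie */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
    {
        perror("waitid error");
        return -1;
    }
    peeked = (info.si_code == CLD_EXITED) ? info.si_status : -1;

    /* The entry still exists, so a regular waitpid returns the same
       status and this time releases the slot */
    if (waitpid(pid, &status, 0) != pid)
    {
        perror("waitpid error");
        return -1;
    }
    reaped = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* The slot is released: waiting again fails with ECHILD */
    gone = (waitpid(pid, &status, 0) == -1 && errno == ECHILD);

    printf("peeked=%d reaped=%d gone=%d\n", peeked, reaped, gone);
    return (peeked == 7 && reaped == 7 && gone) ? 1 : 0;
}
```

Between the waitid and waitpid calls, the child is in the Z state that ps reported earlier.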

Reparenting

If we remove the wait call from the parent process and the parent exits before the child does, the child is orphaned and pid 1, aka init, becomes its new parent (on Linux, an ancestor that has marked itself as a child subreaper via prctl(2) can take that role instead). One of the jobs of init is to call wait whenever one of its children terminates. This ensures that zombies are eventually cleaned up, even if a program doesn’t wait on its child processes before terminating.
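
Reparenting can be made visible with a three-generation sketch. Everything here is an assumption for illustration: orphan_is_reparented and the pipe names go and report are hypothetical. A middle process forks a grandchild and exits without waiting. Once the caller has reaped the middle process, the grandchild reads its parent pid again and reports it back. The new parent is usually 1, but the sketch only checks that it changed, since a subreaper may adopt the orphan instead.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for illustration).
 * Returns 1 if the orphan saw a new parent pid, 0 if not, -1 on error.
 */
int orphan_is_reparented(void)
{
    int go[2], report[2]; /* go: caller -> orphan, report: orphan -> caller */
    int status, result = -1;
    pid_t middle, new_parent;

    if (pipe(go) < 0)
        return -1;
    if (pipe(report) < 0)
    {
        close(go[0]); close(go[1]);
        return -1;
    }

    middle = fork();
    if (middle < 0)
    {
        close(go[0]); close(go[1]); close(report[0]); close(report[1]);
        return -1;
    }
    if (middle == 0)
    {
        /* Middle process: create the grandchild, then exit without waiting */
        pid_t orphan = fork();
        if (orphan == 0)
        {
            char c;
            pid_t ppid;
            close(go[1]);
            close(report[0]);
            /* Block until the caller confirms the middle process is gone */
            if (read(go[0], &c, 1) != 1)
                _exit(1);
            ppid = getppid(); /* parent pid after the original parent exited */
            if (write(report[1], &ppid, sizeof ppid) != (ssize_t)sizeof ppid)
                _exit(1);
            _exit(0);
        }
        _exit(orphan < 0 ? 1 : 0);
    }

    /* Caller: reap the middle process, release the orphan, read its report */
    close(go[0]);
    close(report[1]);
    if (waitpid(middle, &status, 0) == middle
        && WIFEXITED(status) && WEXITSTATUS(status) == 0
        && write(go[1], "x", 1) == 1
        && read(report[0], &new_parent, sizeof new_parent) == (ssize_t)sizeof new_parent)
    {
        printf("orphan's parent: was %ld, now %ld\n", (long)middle, (long)new_parent);
        result = (new_parent != middle) ? 1 : 0;
    }
    close(go[1]);
    close(report[0]);
    return result;
}
```

The orphan is never waited on by this program. After it exits, cleaning it up is left to whichever process adopted it.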

