Exploring processes in Linux
In this article, I would like to talk about the life cycle of processes in Linux-family operating systems. Using both theory and examples, I will look at how processes are born and die, and say a little about the mechanics of system calls and signals.
This article is aimed mainly at beginners in system programming and at anyone who simply wants to learn a little more about how processes work in Linux.
Introduction
Process attributes
Process life cycle
Birth of a process
The "ready" state
The "running" state
Rebirth in another program
The "waiting" state
The "stopped" state
Process termination
The "zombie" state
Oblivion
Acknowledgments
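Process attributes
In the kernel, a process is represented as a structure with many fields (struct task_struct, defined in include/linux/sched.h).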
But since this article is about system programming rather than kernel development, we will abstract away from that structure and focus only on the process fields that matter to us:
Process identifier (pid)
Open file descriptors (fd)
Signal handlers
Current working directory (cwd)
Environment variables (environ)
Return code
Process life cycle

Birth of a process
Only one process in the system is born in a special way: init, which is started directly by the kernel. All other processes appear by duplicating the current process with the fork(2) system call. After fork(2), we have two almost identical processes, which differ in the following:
- fork(2) returns the child's PID to the parent and 0 to the child;
- the child's PPID (Parent Process ID) is set to the PID of the parent.
After fork(2), all of the child's resources are copies of the parent's resources. Copying a process together with all of its allocated memory pages would be expensive, so the Linux kernel uses copy-on-write (COW). All of the parent's memory pages are marked read-only and are shared by the parent and the child. As soon as one of the processes writes to a page, the kernel copies that page, and the write goes to the copy. The writing process is then "untied" from the original. Once the read-only original is "tied" to only one process again, the page gets back its read-write status.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child: fork() returned 0
        printf("my pid = %i, returned pid = %i\n", (int)getpid(), (int)pid);
        break;
    default:
        // Parent: fork() returned the child's PID
        printf("my pid = %i, returned pid = %i\n", (int)getpid(), (int)pid);
        break;
    }
    return 0;
}
$ gcc test.c && ./a.out
my pid = 1559?, returned pid = 15595
my pid = 15595, returned pid = 0
The "ready" state
Immediately after fork(2), the new process enters the "ready" state. In effect, it waits in a queue until the kernel scheduler lets it run on a processor.
The "running" state
Once the scheduler has put the process on a processor, the "running" state begins. The process may run for the whole time slice (quantum) it has been given, or it may give way to other processes earlier by calling sched_yield(2).
Rebirth in another program
Some programs implement logic in which the parent process creates a child to solve a task. In this case, the child solves a specific problem, and the parent only delegates tasks to its children. For example, a web server creates a child for each incoming connection and hands the processing of that connection over to it.
However, if you need to start another program, you have to resort to the execve(2) system call:
int execve(const char *filename, char *const argv[], char *const envp[]);
or to the library calls execl(3), execlp(3), execle(3), execv(3), execvp(3), execvpe(3):
int execl(const char *path, const char *arg, ... /* (char *) NULL */);
int execlp(const char *file, const char *arg, ... /* (char *) NULL */);
int execle(const char *path, const char *arg, ... /*, (char *) NULL, char *const envp[] */);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[], char *const envp[]);
All of the calls above execute the program whose path is given in the first argument. On success, control passes to the loaded program and never returns to the original one. The loaded program keeps most fields of the process structure (pid, open file descriptors, current working directory, and so on). There are two exceptions: file descriptors marked O_CLOEXEC are closed, and signal handlers are reset to their default actions.
How do you keep all these calls apart and choose the right one? It is enough to understand the naming logic:
- All calls begin with exec.
- The fifth letter determines how the arguments are passed:
  - l stands for list: all parameters are passed as arg0, arg1, ..., NULL;
  - v stands for vector: all parameters are passed in a null-terminated array.
- The letter p, for path, may follow. If the file argument does not contain a "/" character, the specified file is searched for in the directories listed in the PATH environment variable.
- The last letter may be e, for environ. In these calls, the last argument is a null-terminated array of null-terminated strings of the form key=value. These are the environment variables that will be passed to the new program.
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *args[] = {"/bin/cat", "--help", NULL};
    execve("/bin/cat", args, environ);
    // Reached only if execve() fails
    perror("execve");
    return 1;
}
$ gcc test.c && ./a.out
Usage: /bin/cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.
(output truncated)
The exec* family can also run scripts that have execute permission and start with a shebang (#!) line.
#define _GNU_SOURCE
#include <unistd.h>

int main(void) {
    char *e[] = {"PATH=/habr:/rulez", NULL};
    execle("/tmp/test.sh", "test.sh", (char *)NULL, e);
    // Reached only if execle() fails
    return 1;
}
$ cat test.sh
#!/bin/bash
echo $0
echo $PATH
$ gcc test.c && ./a.out
/tmp/test.sh
/habr:/rulez
By convention, argv[0] holds the name of the program being run, i.e. it matches the first argument of the exec* functions. However, nothing enforces this convention.
#define _GNU_SOURCE
#include <unistd.h>

int main(void) {
    execlp("cat", "dog", "--help", (char *)NULL);
    // Reached only if execlp() fails
    return 1;
}
$ gcc test.c && ./a.out
Usage: dog [OPTION]... [FILE]...
(output truncated)
An inquisitive reader may notice that the signature int main(int argc, char *argv[]) includes argc, the number of arguments, while the exec* functions pass no such count. Why? Because when a program starts, control is not transferred straight to main. Some work happens first. The kernel counts the entries of the NULL-terminated argv array and places that count on the new program's stack. The startup code linked in from glibc then passes it to main as argc.
The "waiting" state
Some system calls, such as I/O, can take a long time. In such cases, the process goes into the "waiting" state. As soon as the system call completes, the kernel moves the process back to the "ready" state.
Linux also has an uninterruptible variant of the "waiting" state, in which the process does not respond to signals. In this state, the process cannot be killed, and all incoming signals stay pending until it leaves this state.
The kernel itself decides which of the two states to put the process in. Processes that request I/O most often end up in the uninterruptible "waiting" state. This is especially noticeable with a remote disk (NFS) over a slow network.
The "stopped" state
You can pause a process at any time by sending it the SIGSTOP signal. The process goes into the "stopped" state and stays there until it receives a signal to continue (SIGCONT) or to die (SIGKILL). All other signals stay queued.
Process termination
No program can terminate on its own. It can only ask the system to end it with the _exit system call, or be terminated by the system because of an error. Even when you return a number from main(), _exit is still called implicitly. Although the system call's argument is an int, only the low-order byte of the number is used as the return code.
The "zombie" state
Immediately after a process terminates (whether normally or not), the kernel records how it ended and moves it into the "zombie" state. In other words, a zombie is a process that has finished, but whose record is still kept by the kernel.
Moreover, this is the second state in which a process can safely ignore the SIGKILL signal: the dead cannot die again.
Oblivion
The return code and the reason the process ended are still stored in the kernel and must be collected from there. The corresponding system calls are:
pid_t wait(int *wstatus); /* equivalent to waitpid(-1, wstatus, 0) */
pid_t waitpid(pid_t pid, int *wstatus, int options);
All information about how the process ended is packed into a single int. To extract the return code and the reason for termination, use the macros described in the waitpid(2) man page.
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child
        return 13;
    default: {
        // Parent: collect the child's exit status
        int status;
        waitpid(pid, &status, 0);
        printf("exit normally? %s\n", WIFEXITED(status) ? "true" : "false");
        printf("child exitcode = %i\n", WEXITSTATUS(status));
        break;
    }
    }
    return 0;
}
$ gcc test.c && ./a.out
exit normally? true
child exitcode = 13
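In the author's example, passing NULL as argv[0] made the program crash. The outcome depends on the program being run, and recent kernels substitute an empty string for a missing argv[0], so the result may differ.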
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    switch (pid) {
    case -1:
        perror("fork");
        return -1;
    case 0:
        // Child: argv[0] (and the whole argument list) is NULL
        execl("/bin/cat", (char *)NULL);
        return 13;
    default: {
        // Parent: report how the child ended
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status)) {
            printf("Exit normally with code %i\n", WEXITSTATUS(status));
        }
        if (WIFSIGNALED(status)) {
            printf("killed with signal %i\n", WTERMSIG(status));
        }
        break;
    }
    }
    return 0;
}
$ gcc test.c && ./a.out
killed with signal 6
Sometimes the parent terminates before the child. In that case, init becomes the child's new parent and calls wait(2) when the time comes. Once the parent has collected the information about the child's death, the kernel erases everything it knew about the child, and the child's PID can soon be taken by another process.
Acknowledgments
Thanks to Sasha "Al" for editing and for help with the design;
Thanks to Sasha "Reisse" for clear answers to difficult questions.
They patiently endured my bursts of inspiration and the storm of questions I bombarded them with.
Author: weber
Publication date: 16-09-2018, 22:26
Category: System Programming / Development under Linux