There are some limitations to using MMU-less CPUs in *nix; for example, there are restrictions on fork(), mmap(), shmat() and brk(). Let's talk about fork().

fork(2)
creates a child process with a copy of the memory of the current process (and a copy of the file descriptors, signal handlers, filesystem namespace, ...). This copy has the same virtual addresses as those used in the parent. On a CPU with an MMU, the MMU translates each process's virtual addresses to different physical addresses, so everything works and everyone is happy.

However, on an MMU-less CPU, virtual addresses are the same as physical addresses (said another way, there are no virtual addresses), so fork() cannot work on an MMU-less system in the general case (at least not efficiently; you can always move processes in memory at each context switch).

Often,
fork(2) is used only to call execve(2) immediately in the child. There is a special system call for this: vfork(2). Typically, vfork() doesn't create a new copy of the parent's memory, but uses the parent's memory directly. It's also typical for the parent to remain blocked until the child calls execve() or _exit().

The only safe things you can do after vfork() in the child are the following:

- Calling execve().
- Calling _exit(). (Note that it's _exit(), not exit(): exit() can run C library finalization code, such as closing and freeing file handles, which in vfork() implementations that use the parent's memory would also close and free them for the parent, leading to very bad things.)
- Using the pid_t value returned by vfork().
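The rules above can be sketched in C. This is only a minimal sketch under a few assumptions: spawn_and_wait() is a hypothetical helper name, not part of any library, the program is given by its full path in argv[0], and the exit code 127 for a failed execve() simply follows the shell convention.

```c
#define _DEFAULT_SOURCE         /* glibc: make sure vfork() is declared */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run the program at argv[0] using vfork() + execve()
 * and return its exit status, or -1 on error or abnormal termination.
 */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork() itself failed */

    if (pid == 0) {
        /* Child: it may be running in the parent's memory, so it only
         * calls execve() and, if that fails, _exit(). */
        execve(argv[0], argv, environ);
        _exit(127);             /* never exit(): it would run library cleanup */
    }

    /* Parent: resumes once the child has called execve() or _exit(). */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling spawn_and_wait() with the argument vector {"/bin/sh", "-c", "exit 3", NULL} should return 3, while a path that does not exist should return 127, since execve() fails and the child falls through to _exit(127).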
Of course, on systems with an MMU, vfork() can be implemented simply as:

#define vfork fork

As vfork() uses a shared address space, it works perfectly fine on MMU-less CPUs. Also, creating a child that immediately calls execve() is a very common use of fork()/vfork().

The other classical *nix API to create processes/threads/tasks is pthread_create(). As the different threads share the memory address space, this works on MMU-less CPUs. POSIX also introduces a posix_spawn() function.

In the specific case of Linux, there is also clone(2). On MMU-less CPUs, clone() works fine if it's passed the CLONE_VM flag.

An interesting detail of vfork() (explained by Jamie Lokier on uclinux-dev at http://www.mail-archive.com/uclinux-dev@uclinux.org/msg01290.html) is how it's implemented in uClibc:

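__vfork:
    popl %ecx               # save the return address in a register
    movl $__NR_vfork,%eax   # system call number for vfork
    int $0x80               # enter the kernel (sys_vfork)
    pushl %ecx              # put the return address back on the stack
    cmpl $-4095,%eax        # values in [-4095, -1] are error codes
    jae __syscall_error
    ret
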
When you call vfork(), Linux first returns control to the child. The parent hasn't yet returned from vfork(). The child runs on the parent's stack, so when it returns from vfork() and calls execve(), it can overwrite vfork()'s stack frame, which the parent still needs.

The solution is not to depend on vfork()'s stack frame. In the previous i386 example, the first thing done is to save the return address (which is the only thing stored in vfork()'s stack frame, as vfork() has neither parameters nor local variables) in a register, where it is safe. The int $0x80 instruction is the one that passes control to sys_vfork() in the kernel. On return from sys_vfork(), we push the return address onto the stack again, check for errors, and return from vfork().

(Originally published in Spanish at http://barrapunto.com/~ninjalj/journal/27731)