dynamic context transitions

tridge at samba.org
Sat Dec 4 02:17:51 GMT 2004


Chris,

 > I've been asking about this in different places.  I've heard theories, 
 > mostly.  This is happening in Linux (dunno if it's been tested elsewhere) 
 > and one theory is that the forked process speeds are good because Linux 
 > basically does a really good job with those.  Meanwhile, thread speed is 
 > bad because the multiple threads are all within a single process and the 
 > single process gets only it's own share of processor time.

Processes are faster than threads on all OSes that I have tested on
(that includes Solaris, IRIX, AIX and Linux). The difference is most
dramatic on the "traditional" unixes where threads _really_ suck
badly, despite all the hype. On Linux with the latest 2.6 and glibc
threads have almost caught up with processes, but still lag behind by
a little.

I've often heard people say things like "threads are fast on solaris"
or "threads are fast on AIX". It's not true. They are slow as hell on
both. 

Now, some explanation as to _why_ this is the case.

On all modern unixes threads and processes are basically the same
thing. The principle difference is that in threads memory is shared by
default, and you have to do extra work to set it up as non-shared,
whereas with processes memory is not shared by default and you have to
do extra work to make it shared. Both systems have the same
fundamental capabilities; it's just the defaults that change.

Now to the interesting bit. Because memory is shared by default, the C
library has to assume that memory that it is working with is shared if
you are using threads. That means it must add lock/unlock pairs around
lots of internal code. If you don't use threads then the C library
assumes that the programmer is smart enough to put locks on their own
shared memory if they need them.

Put another way, with processes you are using the hardware memory
protection tables to do all the hard work, and that is essentially
free. With threads the C library has to do all that work itself, and
that is _slow_. 

With the latest glibc and kernel this problem has been reduced on
Linux by some really smart locking techniques. It is an impressive
piece of work, and means that for Linux threads now suck less than
they do on other platforms, but they are still not faster than
processes.

So why do some people bother with threads? It's for convenience. It
makes some types of programming easier, but it does _not_ make it
faster. The "threads are fast" meme is a complete fallacy, much like
the common meme of CPUs running faster for in-kernel code.

What is true is that on almost all platforms _creating_ a thread is
cheaper than creating a process. That can matter for some applications
where the work to be done takes very few cycles (like spawn-thread,
add two numbers, then kill thread). Thread benchmarks tend to be in
this category. File servers are not.

For a file server you generally want your unit of processing to last
for seconds to hours or days. In that case the few nanoseconds saved
in thread creation are not relevant.

The other big thing that is bad about threads is that the designers of
the thread APIs (like pthreads) did not consider file servers to be
important, so they completely screwed up several aspects of the API,
to the point that the convenience of using threads is totally lost. A good
example is the way threads interact with byte range locks. It is
impossible for one thread to "lock" a byte range such that another
thread can see the lock. 

Most of these API deficiencies could be fixed by making
pthread_create() have an option on Linux to not pass CLONE_FILES or
CLONE_FS to the clone() system call. If that was done then threads
would start being a lot more palatable for file servers.

Cheers, Tridge


More information about the samba-technical mailing list