[clug] assembly performance

Alex Satrapa grail at goldweb.com.au
Fri Jun 10 04:46:41 GMT 2005

On 10 Jun 2005, at 13:13, Adrian Blake wrote:

> In assembly code on the modern micros are we addressing and  
> manipulating the silicon or just talking to a virtual processor ?

From the point of view of the assembly programmer, it makes no
difference. Instruction X takes t(X) cycles to execute, and consumes
s(X) bytes or words of RAM.
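
As a rough sketch of that cost model: the mnemonics and the cycle and
byte figures below are invented for illustration and do not describe
any real CPU. The point is only that the programmer sees a fixed t(X)
and s(X) per instruction, however the chip implements them.

```python
# Hypothetical instruction table: mnemonic -> (t(X) in cycles, s(X) in bytes).
# These numbers are made up; substitute the figures from your CPU's manual.
COSTS = {
    "LOAD": (3, 2),
    "ADD": (1, 1),
    "STORE": (3, 2),
    "JMP": (2, 3),
}


def program_cost(program):
    """Return (total cycles, total bytes) for a straight-line instruction list."""
    cycles = sum(COSTS[op][0] for op in program)
    size = sum(COSTS[op][1] for op in program)
    return cycles, size


if __name__ == "__main__":
    # LOAD + ADD + STORE = 3 + 1 + 3 cycles and 2 + 1 + 2 bytes
    print(program_cost(["LOAD", "ADD", "STORE"]))  # prints (7, 5)
```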

> If assembly code operates the silicon the performance is dependent  
> upon the skills of the assembly coder. But if the assembly code is  
> interpreted by a lower level code that simulates a virtual machine  
> then the performance can vary and the user is at the mercy of the  
> microcoder.

I don't see how the performance can vary when the same instruction X
is run exactly the same way by the microcode every time (the pipeline
architecture notwithstanding). The only issue I can see is that the
microcoded machine might differ in efficiency, by some fraction or by
an order of magnitude, from some "ideal machine" where all
instructions take 1 clock cycle to execute.
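
To put a number on that comparison, here is a minimal sketch. The
function name and the sample cycle counts are my own assumptions; the
only thing taken from the text above is the "ideal machine" baseline
of 1 cycle per instruction.

```python
def slowdown_vs_ideal(cycles_per_instruction):
    """Ratio of actual cycles to an ideal machine's 1 cycle per instruction."""
    if not cycles_per_instruction:
        raise ValueError("need at least one instruction")
    actual = sum(cycles_per_instruction)
    ideal = len(cycles_per_instruction)  # ideal machine: 1 cycle each
    return actual / ideal


if __name__ == "__main__":
    # Hypothetical per-instruction cycle counts on a microcoded CPU.
    print(slowdown_vs_ideal([1, 2, 3, 2]))  # 8 cycles / 4 instructions = 2.0
```

The ratio is a property of the chip, not of any one run: the same
instruction sequence gives the same figure every time.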

IIRC, some of the modern Intel CPUs allow you to load new microcode -  
so you can go and totally mess up your processor as much as you want =)

What's the real issue here? C compilers tend to produce better
optimised code than a human can manage (because you have only your
own lifetime's experience, while GCC has about 70 people's lifetime
experiences combined, so it knows more tricks than you do).

Microcode is a tradeoff against the fact that you have certain
physical limits (physics and chemistry here) on how many transistors
you can pack into a chip. Signals can only travel so far across a
chip in one clock cycle, gates take a certain amount of time to
switch states, etc. Rather than use 1,000,000 transistors to build
the required combinational logic, use 100,000 to build a microcode
interpreter, and another 100,000 to store the microcode programming,
and suffer the loss of a clock cycle (or three) per instruction.
Using less space for your processor means you can fit more processors
on each wafer of silicon, and thus have fewer losses due to impurities.
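
A back-of-envelope sketch of that tradeoff follows. The transistor
counts are the ones quoted above; everything else is an assumption of
mine: die area is taken as proportional to transistor count, the
wafer area and defect density are invented, and yield uses the simple
Poisson model (fraction of good dies = exp(-defect density * die
area)), which is only one of several yield models in use.

```python
import math

HARDWIRED = 1_000_000            # transistors of combinational logic
MICROCODED = 100_000 + 100_000   # interpreter + microcode store


def poisson_yield(defect_density, die_area):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-defect_density * die_area)


def good_dies(wafer_area, die_area, defect_density):
    """Approximate good dies per wafer, ignoring edge losses."""
    dies = wafer_area // die_area
    return dies * poisson_yield(defect_density, die_area)


if __name__ == "__main__":
    # Assumed: area in arbitrary units, one unit per 100,000 transistors.
    wafer_area, defect_density = 1000, 0.05
    for name, transistors in (("hardwired", HARDWIRED), ("microcoded", MICROCODED)):
        die_area = transistors // 100_000
        good = good_dies(wafer_area, die_area, defect_density)
        print(f"{name}: die area {die_area}, about {good:.0f} good dies")
```

The smaller die wins twice: more dies fit on the wafer, and each one
is less likely to contain a defect.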

