BBC BASIC for Windows: Asm blues

http://bb4w.conforums.com/index.cgi?board=assembler&action=display&num=1279806479

Asm blues
Post by David Williams on Jul 22nd, 2010, 1:47pm

Amazing.

You spend hours writing a new routine to replace an older one. The new one has far fewer branches (and therefore, supposedly, fewer costly branch mispredictions), is more cache-friendly (and therefore makes fewer main-memory accesses), uses mostly local variables (stored on the stack), and has had careful attention paid to instruction pairing, and yet it runs significantly slower than the branch-ridden, global variable-infested, cache-thrashing piece of c*** that you wanted to replace.

I'm thinking maybe a better strategy is to code a routine as well as one can in C, and then hope that the compiler produces more efficient code than one could by hand-coding it entirely in ASM.

David.

PS. Yes, I did bear in mind code-data proximity (a 4 KB gap either side of the code block), and I ensured that all DWORDs were loaded from or stored to DWORD-aligned addresses.

Re: Asm blues
Post by admin on Jul 22nd, 2010, 3:52pm

on Jul 22nd, 2010, 1:47pm, David Williams wrote:

it runs significantly slower than the branch-ridden, global variable-infested, cache-thrashing piece of c*** that you wanted to replace.

Modern compilers have really good code generators. If you can express an algorithm concisely and elegantly in C, you'll often find it difficult to improve on the assembler code generated by the compiler (in terms of performance, not appearance!).

One thing compilers are really good at is dividing an integer by a constant. They will almost invariably convert this to a multiplication by a scaled 'reciprocal' followed by a shift, which is much faster than a DIV instruction.
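The following sketch is not the output of any particular compiler. It shows how unsigned 32-bit division by 10 can be done with one widening multiply and a shift. The magic constant 0xCCCCCCCD is ceil(2^35 / 10). The function name `div10` is hypothetical. In practice you would simply write `x / 10` and let the compiler make this substitution for you.

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 for any 32-bit unsigned x, without a DIV.
   0xCCCCCCCD = ceil(2^35 / 10), so (x * 0xCCCCCCCD) >> 35 equals floor(x / 10)
   for every x in [0, 2^32). The product needs 64 bits, hence the cast. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The constant overshoots 2^35 / 10 by only 0.2, so the accumulated error stays below 0.025 for any 32-bit input. That is too small to push the result past the next integer.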

On the other hand if the algorithm is 'ugly' in C you have a much better chance of doing better yourself. Classic examples are things like multiplications and divisions when you don't want to lose any precision. In a machine-code division you can get the quotient and the remainder in one instruction, but there's no way of expressing this concisely in C and the compiler may not notice that's what you want.
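On 32-bit x86, a single DIV divides the 64-bit value in EDX:EAX by a 32-bit operand. It leaves the quotient in EAX and the remainder in EDX, provided the quotient fits in 32 bits; otherwise the CPU raises a divide error. A C compiler can't assume the quotient fits, so the natural C below typically becomes a call to a 64-bit division routine rather than one DIV. Whether the `/` and `%` are merged into a single operation depends on the compiler and target. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type holding both halves of a division. */
typedef struct {
    uint64_t quot;
    uint32_t rem;
} divmod_result;

/* Hypothetical helper: 64-bit dividend, 32-bit divisor (d must be non-zero).
   C has to express quotient and remainder as two separate operations. */
static divmod_result divmod_u64_u32(uint64_t n, uint32_t d)
{
    divmod_result r;
    r.quot = n / d;             /* quotient (may need more than 32 bits) */
    r.rem  = (uint32_t)(n % d); /* remainder always fits in 32 bits */
    return r;
}
```

For example, dividing 5 * 2^32 + 3 by 5 gives a quotient of 2^32, which would not fit in EAX. That is why a compiler can't blindly use one DIV here.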

Similarly, the 'natural' form of a machine-code multiplication generates a product with more bits than the multiplicands (e.g. multiplying two 16-bit numbers gives a 32-bit result), and again there's no elegant way of expressing that in C. You may end up promoting the multiplicands to 32 bits before performing the multiplication.
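Here is a sketch of that usual C workaround, using hypothetical function names: cast the operands to the wider type before multiplying. The casts matter. Without them, two `uint16_t` values are promoted to (signed) `int`, and 65535 * 65535 overflows a 32-bit `int`, which is undefined behaviour. For the 32 x 32 to 64-bit case, mainstream compilers commonly recognise the pattern and emit a single widening MUL on 32-bit x86. That is an optimisation, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: full 32-bit product of two 16-bit values.
   Casting first keeps the multiply unsigned and 32 bits wide; plain a * b
   would be done in (signed) int and could overflow. */
static uint32_t mul_u16_u32(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Hypothetical helper: full 64-bit product of two 32-bit values.
   Only one operand needs casting; the other is converted to match. */
static uint64_t mul_u32_u64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```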

So it's horses for courses, as always.

Richard.