BBC BASIC for Windows
« Asm blues »

David Williams
Asm blues
« Thread started on: Jul 22nd, 2010, 1:47pm »

Amazing.

You spend hours writing a new routine to replace an older one. The new one has far fewer branches (and therefore fewer supposedly costly branch mispredictions), is more cache-friendly (and therefore makes fewer main-memory accesses), uses mostly local variables (stored on the stack), and was written with attention paid to instruction pairing. And yet it runs significantly slower than the branch-ridden, global variable-infested, cache-thrashing piece of c*** that you wanted to replace.

I'm thinking maybe a better strategy is to code a routine as well as one can in C, and then hope that the compiler produces more efficient code than one can manage with totally hand-coded ASM.


David.

PS. Yes, I did bear in mind code-data proximity (a 4 KB gap either side of the code block), and ensured that all DWORDs were loaded from or stored to DWORD-aligned addresses.

admin
Re: Asm blues
« Reply #1 on: Jul 22nd, 2010, 3:52pm »

on Jul 22nd, 2010, 1:47pm, David Williams wrote:
it runs significantly slower than the branch-ridden, global variable-infested, cache-thrashing piece of c*** that you wanted to replace.

Modern compilers have really good code generators. If you can express an algorithm concisely and elegantly in C, you'll often find it difficult to improve on the assembler code generated by the compiler (in terms of performance, not appearance!).

One thing compilers are really good at is dividing by a constant. They will almost invariably convert this to a multiplication by the 'reciprocal' followed by a shift, which is much faster.
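To make the reciprocal trick concrete, here is a minimal sketch for one case only: unsigned 32-bit division by 10. The function names are made up for illustration, and the constant 0xCCCCCCCD with a shift of 35 is the usual choice for this particular divisor; other divisors need different constants, and what a given compiler actually emits is up to that compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinary division by the constant 10. An optimising compiler is free
   to replace this with a multiply and a shift on its own. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10u;
}

/* The same division written out by hand (illustrative name).
   0xCCCCCCCD is ceil(2^35 / 10), so the 64-bit product shifted right
   by 35 equals x / 10 for every 32-bit unsigned x. */
uint32_t div10_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Spot-check that both versions agree, including at the ends of the range. */
void div10_self_check(void)
{
    const uint32_t samples[] = { 0u, 1u, 9u, 10u, 11u, 99u, 100u,
                                 1000000007u, 4294967289u, 4294967295u };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(div10_plain(samples[i]) == div10_reciprocal(samples[i]));
}
```

In practice you would just write `x / 10u` and let the compiler pick the constant; the hand-written form is only there to show what the generated code is doing.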

On the other hand if the algorithm is 'ugly' in C you have a much better chance of doing better yourself. Classic examples are things like multiplications and divisions when you don't want to lose any precision. In a machine-code division you can get the quotient and the remainder in one instruction, but there's no way of expressing this concisely in C and the compiler may not notice that's what you want.
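As a rough illustration of the quotient-and-remainder case, the sketch below shows the two nearest spellings in standard C: the `/` and `%` operators side by side, and the library function `div()` from `<stdlib.h>`. The wrapper names are hypothetical. Whether either form ends up as a single divide instruction depends on the compiler and the optimisation level; nothing in the language promises it.

```c
#include <assert.h>
#include <stdlib.h>

/* Quotient and remainder via two separate operators (illustrative name).
   A compiler may notice the shared operands and issue one divide,
   but it is not required to. */
void divmod_operators(int n, int d, int *q, int *r)
{
    assert(d != 0);
    *q = n / d;
    *r = n % d;
}

/* The same pair via the standard div(), which returns both values
   together in a div_t structure. */
void divmod_library(int n, int d, int *q, int *r)
{
    assert(d != 0);
    div_t qr = div(n, d);
    *q = qr.quot;
    *r = qr.rem;
}
```

Both forms truncate the quotient toward zero (C99 and later), so the remainder takes the sign of the dividend.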

Similarly the 'natural' form of a machine-code multiplication generates a product with more bits than its operands (e.g. multiplying two 16-bit numbers gives a 32-bit result) and again there's no elegant way of expressing that in C. You may end up promoting the operands to 32 bits before performing the multiplication.
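The promotion described here might look like the following sketch, using the fixed-width types from `<stdint.h>`. The function names are invented for the example, and it assumes nothing about which multiply instruction the compiler chooses; it only shows how the full-width result is requested in C.

```c
#include <assert.h>
#include <stdint.h>

/* 16 x 16 -> 32: widen both operands first so the multiplication is
   carried out in 32-bit unsigned arithmetic. Without the casts the
   operands would be promoted to (signed) int, and 65535 * 65535
   overflows a 32-bit int. */
uint32_t mul16x16_to_32(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 32 x 32 -> 64: the same idea one size up. Casting one operand is
   enough, because the other is then converted to match. */
uint64_t mul32x32_to_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Check the largest inputs, where a narrow multiply would lose bits. */
void widening_self_check(void)
{
    assert(mul16x16_to_32(65535u, 65535u) == 4294836225u);
    assert(mul32x32_to_64(4294967295u, 4294967295u) == 18446744065119617025ull);
}
```

The casts are what make this less than elegant: the source asks for a wide multiply of wide values, and it is left to the compiler to recognise that the upper halves of the operands are known to be zero.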

So it's horses for courses, as always.

Richard.