Topic: Antialiased line drawing (a timed test)
David Williams
Developer
Antialiased line drawing (a timed test)
« Thread started on: Jan 31st, 2012, 10:43am »
This morning I devised an antialiased line drawing routine for GFXLIB which uses only fixed-point arithmetic. It is intended to replace the existing "experimental" routine which makes very heavy use of FPU instructions.
To my surprise and slight disappointment, it turns out that the fixed-point version is only about 14% faster than the quite ghastly FPU-based version! I was honestly expecting something between 50 and 100% faster.
But 14%?
Anyway, until it's deleted in the next few days, the timed test (EXE) can be downloaded from here:
http://www.bb4wgames.com/temp/timed_test.zip
DrawAntialiasedLine0 is the FPU-intensive routine.
DrawAntialiasedLine uses just fixed-point maths (and a single IDIV instruction).
Probably the last time I'll fret about making heavy use of the FPU in my Asm programs. 
David.
admin
Administrator
Re: Antialiased line drawing (a timed test)
« Reply #1 on: Jan 31st, 2012, 12:07pm »
In reply to David Williams's post of Jan 31st, 2012, 10:43am: on my PC it reports "The fixed point-based routine is 9.71% faster than the FPU-based one"!
Quote: uses just fixed-point maths (and a single IDIV instruction).
IDIV is very slow. If it's in a loop and executed many times that may explain the poor performance.
Quote: Probably the last time I'll fret about making heavy use of the FPU in my Asm programs.
Have you tried coding the floating-point version in SSE (or SSE2) and/or the integer version in MMX? One or other of those might do better.
How do the timings compare with non-antialiased lines? If they are not very different it may indicate that the time is dominated by plotting the pixels, which would imply there's little point trying to optimise the line-drawing calculations.
Richard.
admin
Administrator
Re: Antialiased line drawing (a timed test)
« Reply #2 on: Jan 31st, 2012, 1:48pm »
Here's what I get if I compare your results with the MMX-based line-drawing code I wrote a while ago. You didn't include the listing of your test program so I don't know precisely the coordinates you used; if it turns out my test harness is significantly different from yours then the comparison may be meaningless:
GFXLIB_DrawAntialiasedLine0 (FPU) took 5485 ms.
GFXLIB_DrawAntialiasedLine (fixed-point) took 5031 ms.
RTR's MMX-based line-drawing code took 1478 ms.
The code is here:
http://www.rtr.myzen.co.uk/DrawAntialiasedLineRTR.zip
Richard.
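A figure such as "N% faster" can be computed in more than one way, and the thread does not say which formula the timed test uses. The following is a minimal sketch in C (the function names are hypothetical, not part of GFXLIB or the test program) showing two common definitions that can be applied to the timings quoted above.

```c
#include <assert.h>

/* Time saved by the new routine, as a percentage of the old time. */
double percent_time_saved(double t_old_ms, double t_new_ms)
{
    assert(t_old_ms > 0.0);
    return 100.0 * (t_old_ms - t_new_ms) / t_old_ms;
}

/* Gain in throughput (work done per unit time), as a percentage. */
double percent_speed_gain(double t_old_ms, double t_new_ms)
{
    assert(t_new_ms > 0.0);
    return 100.0 * (t_old_ms / t_new_ms - 1.0);
}

/* Plain ratio of the two times ("how many times faster"). */
double speed_ratio(double t_old_ms, double t_new_ms)
{
    assert(t_new_ms > 0.0);
    return t_old_ms / t_new_ms;
}
```

With 5485 ms against 5031 ms, the first definition gives roughly 8.3% and the second roughly 9.0%. For 5031 ms against 1478 ms the ratio is roughly 3.4.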
David Williams
Developer
Re: Antialiased line drawing (a timed test)
« Reply #3 on: Jan 31st, 2012, 7:52pm »
on Jan 31st, 2012, 12:07pm, Richard Russell wrote: On my PC it reports "The fixed point-based routine is 9.71% faster than the FPU-based one"!
From bad to worse then.
Quote: IDIV is very slow. If it's in a loop and executed many times that may explain the poor performance.
In the fixed point-based code, IDIV is executed only once per line drawn.
In contrast, in the ghastly FPU-based code there are, per line drawn:
5x fidiv
1x fdivr
1x fsqrt
3x fmul
And in the point-plotting loop, per point plotted, there are 2 fimuls and 2 fmuls!
In (partial) conclusion, the single IDIV instruction (per line drawn) in the fixed-point code almost certainly has no bearing on what appears to be the code's very poor performance.
Quote: Have you tried coding the floating-point version in SSE (or SSE2) and/or the integer version in MMX? One or other of those might do better.
No, I haven't.
I consider getting the bog-standard non-SIMD, non-FPU version up and running something of an achievement!
Quote: How do the timings compare with non-antialiased lines? If they are not very different it may indicate that the time is dominated by plotting the pixels, which would imply there's little point trying to optimise the line-drawing calculations.
Good point. I will have to check that.
David.
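The structure described in this post (one integer division per line, then only additions per point) can be sketched in C. This is an illustration only, not GFXLIB's code: the function name `line_y_samples_fixed8`, the choice of 8 fractional bits, and the restriction to left-to-right lines are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define ONE 256  /* 1.0 in fixed point, assuming 8 fractional bits */

/* Step along x from x1 to x2 (integer pixel coordinates, x1 < x2) and
   store the fixed-point y value at each step in ys[]. Returns the number
   of samples written (at most max_samples). */
int line_y_samples_fixed8(int x1, int y1, int x2, int y2,
                          int32_t *ys, int max_samples)
{
    assert(x2 > x1);                 /* this sketch handles left-to-right only */

    int32_t dx = x2 - x1;
    int32_t dy = y2 - y1;

    /* The only division: the fixed-point slope, computed once per line. */
    int32_t m = (dy * ONE) / dx;

    int32_t y = y1 * ONE;
    int n = 0;

    /* Per point there is just an addition, no multiply or divide. */
    for (int x = x1; x <= x2 && n < max_samples; x++) {
        ys[n++] = y;
        y += m;
    }
    return n;
}
```

Because the slope is truncated, the last sample can fall slightly short of the true endpoint: for a line from (0,0) to (3,1) the slope is 85/256, so the final y is 255 rather than 256. More fractional bits reduce that error.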
David Williams
Developer
Re: Antialiased line drawing (a timed test)
« Reply #4 on: Jan 31st, 2012, 10:47pm »
on Jan 31st, 2012, 1:48pm, Richard Russell wrote:
Thanks for the source!
So, yours is approaching three times faster than mine (I made the necessary modifications to your test harness in order to make it more-or-less identical to mine).
Also, what I then did was to remove the call to the 'put subpixel' subroutines in the three versions, and got these timings for the line calculations alone:
DW (FPU): 1531 ms
DW (Fixed-point): 1360 ms
RTR (Fixed-point): 1094 ms
100,000 lines (identical sets of co-ordinates); 640x512 drawing area
Comparing our line calculation loops:
DW
Code:
.plotYagainstX_loop%
  push eax                          ; preserve X
  push ebx                          ; preserve Y
  sar eax, 8                        ; EAX = X >> 8
  sar ebx, 8                        ; EBX = Y >> 8
  push edx                          ; colour
  push ebx                          ; Y >> 8
  push eax                          ; X >> 8
  push [ebp + 8]                    ; dispVars.bmBuffH%
  push [ebp + 4]                    ; dispVars.bmBuffW%
  push [ebp]                        ; dispVars.bmBuffAddr%
  call GFXLIB_Plot2x2FilteredPoint%
  pop ebx                           ; restore EBX (Y)
  pop eax                           ; restore EAX (X)
  add eax, edi                      ; X += step
  add ebx, esi                      ; Y += m
  cmp eax, ecx                      ; X <= x2 ?
  jle plotYagainstX_loop%
RTR
Code:
.loopx
  call plotsubpixel
  add edi,eax
  add esi,&10000
  cmp esi,ebx
  jc loopx
I'm forced to do all that PUSHing and POPing. :-[
I strongly suspect that your line calculation code is faster than my pi**poor implementation of Bresenham's line drawing algorithm as employed in the standard GFXLIB_Line routine.
Which is why I'm tempted...
David.
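Neither 'put subpixel' routine is listed in the thread, so how they work is not known here. As general background only, one common way to plot a point at a fractional position is to share its intensity among the 2x2 block of pixels it overlaps. The C sketch below illustrates that idea; the function names, the 8-bit fraction, and the 0..256 weight scale are assumptions for the example and are not taken from GFXLIB_Plot2x2FilteredPoint or plotsubpixel.

```c
#include <assert.h>
#include <stdint.h>

/* Given the fractional offset (fx, fy) of a point within pixel (x, y),
   each in 0..255 (8 fractional bits, an assumption), compute weights on a
   0..256 scale for the four pixels the point overlaps:
   w[0] -> (x, y), w[1] -> (x+1, y), w[2] -> (x, y+1), w[3] -> (x+1, y+1). */
void coverage_weights_2x2(uint32_t fx, uint32_t fy, uint32_t w[4])
{
    assert(fx < 256 && fy < 256);

    uint32_t ix = 256 - fx;          /* share kept by the left column */
    uint32_t iy = 256 - fy;          /* share kept by the upper row   */

    w[0] = (ix * iy) >> 8;
    w[1] = (fx * iy) >> 8;
    w[2] = (ix * fy) >> 8;
    w[3] = (fx * fy) >> 8;
}

/* Blend one 8-bit colour channel of the destination toward `colour`
   by weight w (0 = keep destination, 256 = replace with colour). */
uint8_t blend_channel(uint8_t dest, uint8_t colour, uint32_t w)
{
    assert(w <= 256);
    return (uint8_t)((dest * (256 - w) + colour * w) >> 8);
}
```

A point exactly on a pixel (fx = fy = 0) puts all its weight on that pixel, while a point at the centre of the 2x2 block (fx = fy = 128) gives each pixel a quarter. Four read-modify-write operations per point is one reason pixel plotting can dominate the total time.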
David Williams
Developer
Re: Antialiased line drawing (a timed test)
« Reply #5 on: Jan 31st, 2012, 11:55pm »
An earlier program modified to use RTR's antialiased line drawing routine (includes EXE and source):
http://www.bb4wgames.com/misc/2d_asteroids_v1_5.zip
For 20 asteroids, I get the maximum VSync-locked frame rate of 60 fps on my laptop (60 Hz screen refresh rate).
I think the real bottleneck is the BASIC code to calculate the line endpoint coordinates. It's quite involved.
I am aware that the lines disappear (aren't drawn) if one or both endpoints leave the 'viewport'.
David.