Topic: 4x supersampling
David Williams
Developer
4x supersampling
« Thread started on: Oct 21st, 2011, 03:14am »
A first foray...
4x supersampling (bilinear)
100% software implementation
1024x1024 -> 512x512
Non-MMX
Currently too slow for realtime game use
http://www.bb4wgames.com/misc/bilinear_4c.zip
The moving filled circles (which are drawn not-antialiased) look alright, but the lines don't.
David.
admin
Administrator
Re: 4x supersampling
« Reply #1 on: Oct 21st, 2011, 09:01am »
on Oct 21st, 2011, 03:14am, David Williams wrote: 4x supersampling (bilinear)
I would call that 2x supersampling (in both X and Y directions) myself.
Quote: The moving filled circles (which are drawn not-antialiased) look alright, but the lines don't.
I would expect 4x supersampling (i.e. 16 times the number of pixels) to be necessary for an acceptable degree of antialiasing.
I found an article from somebody who actually understands sampling: http://everything2.com/title/supersampling. In particular he correctly says "Since there is no analog prefilter, no matter how high the supersampling rate, there'll always be some aliasing present".
Richard.
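To make the naming concrete, here is a minimal Python sketch (not from the thread; the function names are made up for illustration) that relates the per-axis supersampling factor to the size of the source buffer and the number of source pixels behind each output pixel.

```python
def source_size(out_w: int, out_h: int, factor: int) -> tuple[int, int]:
    """Size of the oversized source buffer for a per-axis factor."""
    return out_w * factor, out_h * factor


def samples_per_pixel(factor: int) -> int:
    """Number of source pixels averaged into one output pixel."""
    return factor * factor


# The first test in the thread: 1024x1024 -> 512x512 is a factor of 2 per axis.
print(source_size(512, 512, 2), samples_per_pixel(2))
# A factor of 4 per axis needs 16 times the pixels.
print(source_size(512, 512, 4), samples_per_pixel(4))
```

Under this naming, "2x" and "4x" refer to the factor along each axis, so the pixel count grows with the square of the factor.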
David Williams
Developer
Re: 4x supersampling
« Reply #2 on: Oct 21st, 2011, 2:02pm »
on Oct 21st, 2011, 09:01am, Richard Russell wrote: I would call that 2x supersampling (in both X and Y directions) myself.
So would I, now!
In any case, software-based supersampling isn't practical for the kind of applications I have in mind.
David.
admin
Administrator
Re: 4x supersampling
« Reply #3 on: Oct 21st, 2011, 2:19pm »
on Oct 21st, 2011, 2:02pm, David Williams wrote: In any case, software-based supersampling isn't practical for the kind of applications I have in mind.
What's the main limitation: the downsampling to the final size, or the overhead of having to plot perhaps 16 times the number of pixels you otherwise would?
Richard.
David Williams
Developer
Re: 4x supersampling
« Reply #4 on: Oct 21st, 2011, 3:10pm »
on Oct 21st, 2011, 2:19pm, Richard Russell wrote: What's the main limitation: the downsampling to the final size, or the overhead of having to plot perhaps 16 times the number of pixels you otherwise would?
Richard.
If I wanted, say, a game to run at an acceptable frame rate (30 fps minimum), the CPU bandwidth requirements would be substantial: clearing (or redrawing the background of) the very large source buffer each frame, then drawing extra-large bitmaps/sprites to it, which may themselves involve CPU-burdensome operations such as blending and other colour manipulations, and then doing the actual "supersampling". There also has to be some bandwidth left for the usual realtime calculations that the particular game requires. 30 fps might possibly be achievable, but the dear old CPU will be running flat-out.
In this context, I don't think a software implementation of 4x supersampling (or higher) is practical, unless we fancy our CPUs red-hot.
David.
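A rough back-of-envelope estimate illustrates the scale of the problem. The Python sketch below is not from the thread. It assumes 32-bit pixels and counts only two full passes over the source buffer per frame (one write to clear it, one read to downsample it), so sprite drawing and game logic come on top. The 640x480 output size is an assumption, borrowed from the demo parameters suggested later in the thread.

```python
def supersample_traffic(out_w, out_h, factor, fps, bytes_per_pixel=4):
    """Estimated bytes per second touched in the oversized source buffer.

    Assumption: exactly two passes over the buffer per frame
    (clear, then downsample). Real workloads will touch more.
    """
    src_pixels = (out_w * factor) * (out_h * factor)
    bytes_per_frame = src_pixels * bytes_per_pixel
    passes = 2  # assumed: one clear (write) + one downsample (read)
    return passes * bytes_per_frame * fps


rate = supersample_traffic(640, 480, factor=4, fps=30)
print(f"{rate / 2**20:.0f} MiB/s before any sprites are drawn")
```

With these assumed figures the two passes alone come to over a gigabyte per second, which is consistent with the concern about the CPU running flat-out.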
admin
Administrator
Re: 4x supersampling
« Reply #5 on: Oct 21st, 2011, 4:56pm »
on Oct 21st, 2011, 3:10pm, David Williams wrote: 30 fps might possibly be achievable, but the dear old CPU will be running flat-out.
Assuming we're comparing like for like (i.e. achieving anti-aliasing by means of supersampling versus directly plotting anti-aliased objects) it seems to me that it's not so clear cut.
As we've both found, plotting anti-aliased lines and curves is non-trivial, especially if there is a thickness parameter. And plotting anti-aliased filled shapes like Bezier curves is particularly challenging. The page to which I linked before stated that antialiased ray-tracing is all but impossible.
So what we end up with is a situation where if there aren't too many objects, and if they're not too complex, then directly plotting them antialiased will most likely win. But if there are a large number of objects, and/or they are quite complex, then plotting them supersampled, but not antialiased, might be the best solution.
As ever, it would require experimentation to discover what is the best approach for any given situation.
Richard.
admin
Administrator
Re: 4x supersampling
« Reply #7 on: Oct 22nd, 2011, 9:13pm »
on Oct 22nd, 2011, 7:49pm, David Williams wrote: I'd say that 2x may be just about sufficient for 2D bitmap-based games.
When you did the previous test it was the straight lines which seemed to be the most critical - probably because the aliasing is periodic and therefore particularly noticeable. Might it be worth incorporating some filled polygons as well as the circles?
Richard.
David Williams
Developer
Re: 4x supersampling
« Reply #8 on: Oct 23rd, 2011, 4:05pm »
on Oct 22nd, 2011, 9:13pm, Richard Russell wrote: When you did the previous test it was the straight lines which seemed to be the most critical - probably because the aliasing is periodic and therefore particularly noticeable. Might it be worth incorporating some filled polygons as well as the circles?
Richard.
I've tried some rotated bitmaps, and 4x supersampling takes reasonably good care of the stray pixels stemming from rounding errors in the largely fixed-point calculations.
My attempt early this morning at translating my BASIC implementation of a generalised supersampler (1x, 2x, 4x, ... etc.) to assembly language resulted in nothing but crashes after an hour or so of bug-hunting. How annoying. I'm starting to hate assembly language.
EDIT: Perseverance often pays off. This time it did, because I've got the code working. Will upload it later for those very few who may be interested.
David.
David Williams
Developer
Re: 4x supersampling
« Reply #9 on: Oct 23rd, 2011, 6:47pm »
For those it might interest, here's a demo program which draws filled circles on a large bitmap, which is then resized to the dimensions chosen by you. The size of this large 'internal' bitmap is dependent on the dimensions you choose, and the chosen sample size.
http://www.bb4wgames.com/misc/xn_supersample_asm.zip
Please first try these parameters:
Bitmap width: 640
Bitmap height: 480
Sample size: 8
If your choice of parameters results in memory requirements exceeding 255 MB, then you'll be told to revise your parameters.
If you want to see what the circles look like without supersampling, then enter 1 as the sample size.
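For a feel of how quickly the internal bitmap grows, here is a small Python sketch. It is not the demo's own check: it assumes 32-bit pixels (which the assembler's 4-bytes-per-pixel arithmetic suggests), treats 1 MB as 2^20 bytes, and counts only the large internal bitmap. The demo's actual accounting may differ.

```python
BYTES_PER_PIXEL = 4          # assumed: 32-bit pixels
LIMIT_BYTES = 255 * 2**20    # assumed: "255 MB" means 255 * 2^20 bytes


def internal_bitmap_bytes(width, height, sample_size):
    """Bytes needed by the (width*s) x (height*s) internal bitmap."""
    return (width * sample_size) * (height * sample_size) * BYTES_PER_PIXEL


def max_sample_size(width, height):
    """Largest sample size whose internal bitmap stays within the limit."""
    s = 1
    while internal_bitmap_bytes(width, height, s + 1) <= LIMIT_BYTES:
        s += 1
    return s


print(internal_bitmap_bytes(640, 480, 8) / 2**20, "MiB at sample size 8")
print("largest sample size for 640x480:", max_sample_size(640, 480))
```

Under these assumptions the suggested 640x480 with sample size 8 needs 75 MiB, comfortably inside the limit.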
I think a fast (optimised) and necessarily MMX-based 2x2 (or even 4x4) supersampler would make a very handy addition to GFXLIB. So I suppose that's next on the agenda. :)
Obviously, this generalised (and in any case totally unoptimised) supersampler is no good for realtime situations such as a game would demand.
David.
The assembler code (very rough around the edges):
Code:
DEF PROC_asm
LOCAL C%, I%, P%
DIM C% 511
FOR I% = 0 TO 2 STEP 2
P% = C%
[OPT I%
; // ALIGN
] : P% = (P% + 31) AND -32 : [OPT I%
.supersample%
; Parameters:
;
; pBm% (destination bitmap), pBm2% (supersampled source bitmap),
; bmW%, bmH% (destination width and height), smpSz% (sample size)
;
pushad
mov ebp, esp
sub esp, 128
; -----------------------------------------------------------------------------------------------
; EBP!36 = pBm%
; EBP!40 = pBm2%
; EBP!44 = bmW%
; EBP!48 = bmH%
; EBP!52 = smpSz%
; -----------------------------------------------------------------------------------------------
;
; calc. smpSzSq% = smpSz%^2
;
mov eax, [ebp + 52]
imul eax, eax
mov [esp + 0], eax ; ESP!0 = smpSz%^2 = smpSzSq%
;
; Set FPU rounding mode to ''Truncate''
;
finit
xor eax, eax
mov DWORD [esp + 8], &00000000
fstcw [esp + 8]
mov ax, [esp + 8]
and ax, &F3FF
or ax, &C00
mov [esp + 8], ax
fldcw [esp + 8]
;
; calc. S% = 65536 * (1.0 / smpSzSq%)
;
push 65536
fild DWORD [esp] ; st0 = 65536
fld1 ; st0 = 1.0, st1 = 65536
fidiv DWORD [esp + (0 +4)] ; st0 = 1.0 / smpSzSq%, st1 = 65536
fmul ; st0 = 65536 * (1.0 / smpSzSq%)
fistp DWORD [esp + (4 +4)] ;
add esp, 4 ; ESP!4 = S%
;
; Calc. bm2W% and bm2H%
;
; Where bm2W% = smpSz% * bmW%
; bm2H% = smpSz% * bmH%
;
;
mov eax, [ebp + 44] ; bmW%
mov ebx, [ebp + 48] ; bmH%
imul eax, [ebp + 52] ; bmW% * smpSz% = bm2W%
imul ebx, [ebp + 52] ; bmH% * smpSz% = bm2H%
mov [esp + 12], eax ; ESP!12 = bm2W%
mov [esp + 16], ebx ; ESP!16 = bm2H%
;
; Calc. rowBytesLen% = 4 * bm2W%
;
shl eax, 2 ; = 4 * bm2W%
mov [esp + 20], eax ; ESP!20 = rowBytesLen%
;
; Calc. bmW%-1 and bmH%-1
;
mov eax, [ebp + 44] ; bmW%
mov ebx, [ebp + 48] ; bmH%
sub eax, 1 ; bmW% - 1
sub ebx, 1 ; bmH% - 1
mov [esp + 24], eax ; ESP!24 = bmW%-1
mov [esp + 28], ebx ; ESP!28 = bmH%-1
;
; Calc. rowBytesLen%*smpSz%
;
mov eax, [esp + 20] ; rowBytesLen%
imul eax, [ebp + 52] ; rowBytesLen% * smpSz%
mov [esp + 32], eax ; ESP!32 = rowBytesLen% * smpSz%
;
; Calc. 4*smpSz%
;
mov eax, [ebp + 52] ; smpSz%
shl eax, 2 ; 4*smpSz%
mov [esp + 36], eax ; ESP!36 = 4*smpSz%
;
; So far, we have:
;
; ESP!0 = smpSzSq%
; ESP!4 = S% (= 65536 * 1.0/SmpSzSq%)
; ESP!8 = FPU control word (scratch)
; ESP!12 = bm2W%
; ESP!16 = bm2H%
; ESP!20 = rowBytesLen%
; ESP!24 = bmW% - 1
; ESP!28 = bmH% - 1
; ESP!32 = rowBytesLen% * smpSz%
; ESP!36 = 4 * smpSz%
;
mov eax, [ebp + 36] ; EAX = pBm
mov ebx, [ebp + 40] ; EBX = pBm2
mov DWORD [esp + 44], 0 ; ESP!44 = Y-loop control variable (Y%)
.supersample_yLoop% ; Y-loop control var goes from 0 to bmH%-1
mov DWORD [esp + 40], 0 ; ESP!40 = X-loop control variable (X%)
.supersample_xLoop% ; X-loop control var goes from 0 to bmW%-1
;
; Calc. O% = (rowBytesLen% * smpSz% * Y%) + (4 * smpSz% * X%)
;
mov edi, [esp + 32] ; EDI = rowBytesLen% * smpSz%
mov esi, [esp + 36] ; ESI = 4 * smpSz%
imul edi, [esp + 44] ; EDI = rowBytesLen% * smpSz% * Y%
imul esi, [esp + 40] ; ESI = 4 * smpSz% * X%
add edi, esi ; EDI = (rowBytesLen% * smpSz% * Y%) + (4 * smpSz% * X%) = O%
;
; ESP!48 = red sum (rSum%)
; ESP!52 = green sum (gSum%)
; ESP!56 = blue sum (bSum% )
;
mov DWORD [esp + 48], 0 ; init. rSum% = 0
mov DWORD [esp + 52], 0 ; init. gSum% = 0
mov DWORD [esp + 56], 0 ; init. bSum% = 0
xor esi, esi ; inner Y-loop counter (y%) (goes from 0 to smpSz%-1)
.supersample_innerYloop%
push esi ; preserve ESI (inner Y-loop counter)
xor esi, esi ; inner X-loop counter (x%) (goes from 0 to smpSz%-1)
.supersample_innerXloop%
; EAX = pBm
; EBX = pBm2
; EDI = O%
; ESI = x%
;
; Calc. O2% = O% + 4*x%
;
push esi ; preserve ESI (x%)
shl esi, 2 ; 4 * x%
add esi, edi ; O% + 4*x% = O2%
movzx ecx, BYTE [ebx + esi + 0] ; load blue byte (blueVal)
add [esp + (56 +8)], ecx ; bSum% += blueVal
movzx ecx, BYTE [ebx + esi + 1] ; load green byte (greenVal)
add [esp + (52 +8)], ecx ; gSum% += greenVal
movzx ecx, BYTE [ebx + esi + 2] ; load red byte (redVal)
add [esp + (48 +8)], ecx ; rSum% += redVal
pop esi
add esi, 1 ; x% += 1
cmp esi, [ebp + 52] ; x% < smpSz% ? (loop again if so)
jl supersample_innerXloop%
add edi, [esp + (20 +4)] ; O% += rowBytesLen%
pop esi ; ESI = y%
add esi, 1 ; y% += 1
cmp esi, [ebp + 52] ; y% < smpSz% ? (loop again if so)
jl supersample_innerYloop%
;
; Calc. 4*(Y%*bmW% + X%)
;
mov esi, [esp + 44] ; Y%
imul esi, [ebp + 44] ; Y%*bmW%
add esi, [esp + 40] ; Y%*bmW% + X%
shl esi, 2 ; 4*(Y%*bmW% + X%)
;
; Write averaged red, green, blue values to the destination bitmap (pBm%)
;
mov edx, [esp + 4] ; EDX = S% (= 65536 * 1.0/SmpSzSq%)
mov ecx, [esp + 48] ; ECX = rSum%
imul ecx, edx
shr ecx, 16
mov [eax + esi + 2], cl
mov ecx, [esp + 52] ; ECX = gSum%
imul ecx, edx
shr ecx, 16
mov [eax + esi + 1], cl
mov ecx, [esp + 56] ; ECX = bSum%
imul ecx, edx
shr ecx, 16
mov [eax + esi + 0], cl
mov edx, [ebp + 44] ; EDX = bmW%
add DWORD [esp + 40], 1 ; X += 1
cmp DWORD [esp + 40], edx ; X < bmW% ?
jl near supersample_xLoop%
mov edx, [ebp + 48] ; EDX = bmH%
add DWORD [esp + 44], 1 ; Y += 1
cmp DWORD [esp + 44], edx ; Y < bmH% ?
jl near supersample_yLoop%
add esp, 128
popad
ret 20
]
NEXT I%
ENDPROC
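For readers who don't want to trace the assembler, the following Python sketch is one reading of what the routine computes: a box filter that averages each smpSz% x smpSz% block of the source into one destination pixel, using a truncated 16.16 fixed-point reciprocal in place of a division. It is an illustrative reimplementation, not David's code. The function name, the flat `bytes` buffers and the blue, green, red, unused byte order are assumptions inferred from the listing.

```python
def supersample(src, out_w, out_h, s):
    """Box-filter a (out_w*s) x (out_h*s) 32-bit buffer down to out_w x out_h.

    Assumed layout: 4 bytes per pixel in blue, green, red, unused order.
    The fourth byte of each destination pixel is left at zero.
    """
    row_bytes = 4 * out_w * s        # rowBytesLen% in the listing
    scale = 65536 // (s * s)         # S%: truncated 65536 / smpSz%^2
    dst = bytearray(4 * out_w * out_h)
    for Y in range(out_h):
        for X in range(out_w):
            block = row_bytes * s * Y + 4 * s * X   # O% in the listing
            sums = [0, 0, 0]                        # blue, green, red
            for y in range(s):
                for x in range(s):
                    p = block + row_bytes * y + 4 * x
                    for c in range(3):
                        sums[c] += src[p + c]
            o = 4 * (Y * out_w + X)
            for c in range(3):
                # multiply by the reciprocal, then drop the fraction bits
                dst[o + c] = (sums[c] * scale) >> 16
    return dst


# One output pixel from a 2x2 block: blue values 0, 100, 200, 100.
block = bytes([0, 10, 20, 0, 100, 10, 20, 0, 200, 10, 20, 0, 100, 10, 20, 0])
print(list(supersample(block, 1, 1, 2)))
```

If the truncation is modelled correctly here, one side effect is worth noting: for sample sizes whose square is not a power of two (3, for example), the reciprocal is rounded down, so a pure white block averages to 254 rather than 255. Power-of-two sample sizes such as 2, 4 and 8 are unaffected.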