BBC BASIC for Windows
http://bb4w.conforums.com/index.cgi?board=general&action=display&num=1437635164

New version of FIRBBC released
Post by rtr2 on Jul 23rd, 2015, 07:06am

I've released a new version of my filter synthesis tool FIRBBC:

https://sourceforge.net/projects/firbbc/

The BBC BASIC source code is available for download from SourceForge.

Richard.

Re: New version of FIRBBC released
Post by VBI on Jul 24th, 2015, 09:37am

Richard,

Thanks for flagging up the release of a new version of this interesting program.

I have dabbled with it on several occasions since its first release on SourceForge. My main interest is video sample rate conversion, usually upwards.

What I would like to be able to do is apply the designed filter data to a FIR filter implementation in BB4W but, sadly, my assembler skills are nowhere near up to it, although I do still live in hope that the lightbulb moment will happen one day.

Have you ever considered publishing a basic Wiki example that could be built on, or even just a suggested 'best way to do it' framework, as a way of assisting us mortals to get a foot in the door to more advanced DSP using BB4W?


Graham.

Re: New version of FIRBBC released
Post by rtr2 on Jul 24th, 2015, 10:23am

on Jul 24th, 2015, 09:37am, VBI wrote:
Have you ever considered publishing a basic Wiki example that could be built on, or even just a suggested 'best way to do it' framework

It would be difficult to give 'generic' guidance on writing an assembler FIR filter, because the optimum code architecture depends too much on issues such as the number of taps, coefficient size, data word length and format etc.

But, in case it's helpful, here's an MMX implementation of a 31 or 32-tap horizontal FIR operating on one 'column' of a 2D array of 8-bit signed data (e.g. a monochrome image); it's taken from my Colour Recovery application:

Code:
        ; Horizontal filtering, 31 or 32-tap FIR:
        ;
        ; A% = address of 16-bit coefficient data (signed)
        ; B% = address of 8-bit source data (signed)
        ; C% = address step (width in pixels)
        ; D% = address of destination data (signed)
        ;
        .hfilter
        mov ebp,eax            ; coefficients
        mov esi,ebx            ; source
        mov edi,edx            ; dest
        mov edx,ecx            ; step (bytes) n.b. AFTER mov edi,edx
        mov ebx,OutputWidth%
        mov ecx,OutputHeight%
        mov eax,esp
        and esp,-8             ; align 8 (AMD Athlon)
        push eax               ; save original stack pointer
        sub esp,12             ; make space, maintaining alignment
        .hsloop
        punpcklbw mm0,[esi]    ; fetch and unpack video data
        punpcklbw mm1,[esi+4]
        punpcklbw mm2,[esi+8]
        punpcklbw mm3,[esi+12]
        punpcklbw mm4,[esi+16]
        punpcklbw mm5,[esi+20]
        punpcklbw mm6,[esi+24]
        punpcklbw mm7,[esi+28]
  
        psraw mm0,8            ; align and sign-extend
        psraw mm1,8
        psraw mm2,8
        psraw mm3,8
        psraw mm4,8
        psraw mm5,8
        psraw mm6,8
        psraw mm7,8
  
        pmaddwd mm0,[ebp]      ; multiply by coefficients and add
        pmaddwd mm1,[ebp+8]
        pmaddwd mm2,[ebp+16]
        pmaddwd mm3,[ebp+24]
        pmaddwd mm4,[ebp+32]
        pmaddwd mm5,[ebp+40]
        pmaddwd mm6,[ebp+48]
        pmaddwd mm7,[ebp+56]
  
        paddd mm0,mm1          ; accumulate 0-7
        paddd mm2,mm3          ; accumulate 8-15
        paddd mm4,mm5          ; accumulate 16-23
        paddd mm6,mm7          ; accumulate 24-31
        paddd mm0,mm2          ; accumulate 0-15
        paddd mm4,mm6          ; accumulate 16-31
        paddd mm0,mm4          ; accumulate 0-31
  
        movq [esp],mm0         ; store partial sums (esp aligned)
        mov eax,[esp]
        add eax,[esp+4]        ; accumulate
        sar eax,15             ; divide-by-32768
        adc eax,0              ; rounding
        call sclip
        mov [edi],al           ; result
  
        lea esi,[esi+edx]      ; skip to next line
        lea edi,[edi+ebx]
        dec ecx
        jnz near hsloop
        add esp,12
        pop esp                ; restore original stack pointer
        emms
        ret
 

The 'core' filtering code is, I hope, easily understood and could readily be adapted for fewer taps (although not more, unless changed to use SSE2 instructions rather than MMX). The clipping routine sclip is not listed (in some circumstances it may not be needed).
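
As a cross-check on the arithmetic, the listing can be mirrored by a scalar model. The C sketch below is illustrative only and is not part of the Colour Recovery application: the function names are invented here, the clip is an assumption about what the unlisted sclip does (saturation to signed 8 bits), and the code assumes that `>>` on a negative int is an arithmetic shift, as it is on mainstream compilers. It computes one output sample from 32 signed 8-bit inputs and 32 signed 16-bit coefficients, scales by 32768 with the same rounding step as the sar/adc pair, and then steps down one column the way the hsloop loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 32  /* eight MMX registers, four 16-bit lanes each */

/* ASSUMPTION: sclip saturates to the signed 8-bit range (it is not listed). */
static int32_t clip_s8(int32_t v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return v;
}

/* One output sample: 32 multiply-accumulates, divide by 32768 with rounding. */
static int8_t fir32_sample(const int16_t *coeff, const int8_t *src)
{
    int32_t acc = 0;
    for (int k = 0; k < TAPS; k++)
        acc += (int32_t)coeff[k] * src[k];

    /* sar eax,15 leaves the last bit shifted out (bit 14) in the carry flag;
       adc eax,0 adds it back, so exact halves round upwards. */
    int32_t carry = (acc >> 14) & 1;
    int32_t scaled = (acc >> 15) + carry;
    return (int8_t)clip_s8(scaled);
}

/* One column: a 32-sample horizontal window per row, one result per row.
   src_step and dst_step play the roles of C% and OutputWidth% in the listing. */
static void hfilter_column(const int16_t *coeff,
                           const int8_t *src, ptrdiff_t src_step,
                           int8_t *dst, ptrdiff_t dst_step, int rows)
{
    assert(rows >= 0);
    for (int y = 0; y < rows; y++) {
        *dst = fir32_sample(coeff, src);
        src += src_step;   /* lea esi,[esi+edx] */
        dst += dst_step;   /* lea edi,[edi+ebx] */
    }
}
```

With a single non-zero coefficient of 16384 (0.5 when scaled by 32768), an input sample of 101 gives 51 and an input of -101 gives -50. A 31-tap design fits the same loop if the 32nd coefficient is set to zero.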

Richard.

Re: New version of FIRBBC released
Post by VBI on Jul 24th, 2015, 2:28pm

on Jul 24th, 2015, 10:23am, g4bau wrote:
It would be difficult to give 'generic' guidance on writing an assembler FIR filter

I do appreciate the difficulty with such a task, given the breadth of the subject.

on Jul 24th, 2015, 10:23am, g4bau wrote:
But, in case it's helpful, here's an MMX implementation of a 31 or 32-tap horizontal FIR

That is immensely helpful and much appreciated. I can see the principal operation of the filter core, even if I will have to take a look at the IA-32 reference for some of those ‘p’ instructions acting on the mm registers. Thank you.
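
For anyone else meeting those packed instructions for the first time, their effect in the listing can be modelled one lane at a time. The following C sketch is a rough illustration with made-up function names, not a substitute for the instruction reference. It assumes arithmetic right shift of negative values and two's-complement narrowing, as on mainstream compilers. It shows why punpcklbw followed by psraw by 8 gives sign-extended samples regardless of what the register held before, and how pmaddwd reduces four products to two 32-bit sums.

```c
#include <assert.h>
#include <stdint.h>

/* punpcklbw mmN,[mem] then psraw mmN,8.
   The unpack puts each memory byte in the HIGH byte of a 16-bit lane; the low
   byte is whatever was already in the register ("stale"). The arithmetic shift
   discards that low byte and sign-extends the sample. */
static void unpack_sign_extend(const int8_t src[4], const uint8_t stale[4],
                               int16_t lanes[4])
{
    for (int i = 0; i < 4; i++) {
        uint16_t word = (uint16_t)(((uint16_t)(uint8_t)src[i] << 8) | stale[i]);
        lanes[i] = (int16_t)((int16_t)word >> 8);
    }
}

/* pmaddwd: four signed 16x16 products, adjacent pairs summed into two
   32-bit results. (The one overflowing case, all inputs -32768, cannot
   occur here because the samples are only 8 bits wide.) */
static void pmaddwd_model(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Eight groups of four taps, combined with lane-wise adds (paddd), then the
   two 32-bit halves added together as the listing does after the movq store.
   The listing adds in a tree rather than in sequence; the total is the same. */
static int32_t dot32_packed(const int16_t coeff[32], const int8_t src[32])
{
    assert(coeff && src);
    const uint8_t stale[4] = {0xAA, 0x55, 0xFF, 0x00};  /* arbitrary junk */
    int32_t sum[2] = {0, 0};
    for (int g = 0; g < 8; g++) {
        int16_t lanes[4];
        int32_t part[2];
        unpack_sign_extend(src + 4 * g, stale, lanes);
        pmaddwd_model(lanes, coeff + 4 * g, part);
        sum[0] += part[0];
        sum[1] += part[1];
    }
    return sum[0] + sum[1];
}
```

The stale bytes passed to unpack_sign_extend are deliberately arbitrary: the result does not depend on them, which is why the listing never needs to clear the mm registers before unpacking.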

Graham.