BBC BASIC for Windows
http://bb4w.conforums.com/index.cgi?board=assembler&action=display&num=1427418014

256 Thread fast ping sweep
Post by sveinioslo on Mar 27th, 2015, 01:00am

I wanted to see how fast a single ping sweep could be done; 0.1 second seems to be the fastest on a LAN.
It's a bit too fast sometimes, but then one can just run it again.

Does anyone know the exact requirement for a thread-safe mov instruction?
Is it the instruction itself, its data, or both that must be on a dword boundary?

Also, does anyone know whether a thread can report being finished before it actually is?
Without a WAIT before reading the IP table, I have occasionally seen printouts like '192 0 0 0' or '0 0 0 0',
which are pretty hard to explain otherwise.
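One possible explanation, assuming some Ping% threads are still running when the table is printed: the PRINT line in the program reads the four bytes of ip% separately (D%?0, D%?1, D%?2, D%?3). A worker that clears the entry between the first and second of those reads produces exactly '192 0 0 0', and one that clears it just after the IF ip%<>0 test produces '0 0 0 0'. The sketch below is a deterministic Python simulation of that interleaving, not real threading; `read_bytewise`, `read_snapshot` and `worker_clears_entry` are hypothetical names.

```python
import struct

# One Ping{} ip% field: 192.168.0.5 stored in network byte order.
table = bytearray([192, 168, 0, 5])


def worker_clears_entry():
    """Stands in for a Ping% thread executing mov [ebp+4],eax with eax = 0."""
    struct.pack_into("<I", table, 0, 0)


def read_bytewise(buf, between_reads=None):
    """Read the address one byte at a time, like D%?0, D%?1, D%?2, D%?3.

    between_reads models another thread running after the first byte
    has already been fetched.
    """
    first = buf[0]
    if between_reads is not None:
        between_reads()
    return [first, buf[1], buf[2], buf[3]]


def read_snapshot(buf):
    """Copy the whole dword once, then split the copy into bytes."""
    (value,) = struct.unpack_from("<I", buf, 0)
    return list(value.to_bytes(4, "little"))


print(read_bytewise(table, worker_clears_entry))  # [192, 0, 0, 0]

table[:] = bytes([192, 168, 0, 5])
snapshot = read_snapshot(table)
worker_clears_entry()
print(snapshot)  # [192, 168, 0, 5]
```

A snapshot only guarantees a consistent value (the whole address or zero); it can still be out of date. The real cure is to make sure every worker has finished before the table is read, for example by checking that the final WaitForMultipleObjects call counts all eight wait-thread handles.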

Remember to edit Ip$ to match your local network address, with the last octet set to zero.
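The program parses Ip$ once and then overwrites the fourth byte of each table entry with the loop index (`?(^Ping{(I%)}.ip%+3)=I%`). Because the address is held in network byte order, that byte is the last octet, so whatever Ip$ ends in is replaced anyway. A minimal Python sketch of the same table using the standard `ipaddress` module; `sweep_targets` is a hypothetical helper name.

```python
import ipaddress


def sweep_targets(base):
    """Return the 256 addresses a.b.c.0 .. a.b.c.255 for base 'a.b.c.d'.

    Like the BASIC loop, it keeps the first three bytes of the parsed
    address (network byte order) and replaces the fourth with 0..255.
    """
    packed = ipaddress.IPv4Address(base).packed
    return [ipaddress.IPv4Address(packed[:3] + bytes([i])) for i in range(256)]


targets = sweep_targets("192.168.0.0")
print(targets[0], targets[-1])  # 192.168.0.0 192.168.0.255
```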

Svein

Code:
      REM 19.okt.2014 Svein Svensson, edit 26.mar.2015
      REM Fast local ping scanner
      REM 8*32=256 threads
      Ip$="192.168.0.0"+CHR$0  : REM local net

      ON ERROR PROCclose : REPORT : END
      ON CLOSE PROCclose : END

      DIM Ping{(255) hping%, ip%, dta&(31), rep&(63), buf&(15), fill&(7)}
      DIM hMain%(7), hThread%(255)
      DIM Code% NOTEND AND 2047, Code% 500, L%-1

      SYS "LoadLibrary", "iphlpapi.dll" TO Iphlpapi%
      SYS "GetProcAddress", Iphlpapi%, "IcmpSendEcho" TO IcmpSendEcho%
      SYS "GetProcAddress", Iphlpapi%, "IcmpCreateFile" TO IcmpCreateFile%
      SYS "GetProcAddress", Iphlpapi%, "IcmpCloseHandle" TO IcmpCloseHandle%
      SYS "LoadLibrary", "Ntdll.dll" TO Ntdll%
      SYS "GetProcAddress", Ntdll%, "RtlIpv4AddressToStringA" TO Rtl2String%
      SYS "GetProcAddress", Ntdll%, "RtlIpv4StringToAddressA" TO RtlString2adr%

      FOR Pass%=8 TO 10 STEP 2
        P% = Code%
        [OPT Pass%
  
        .Ping%
        cld
        mov ebp,[esp+4]
        mov eax,500 : push eax                  ; timeout, can't go faster!  (change to 1000 if on WLAN)
        mov eax,64  : push eax                  ; replysize
        mov eax,ebp : add eax,40 : push eax     ; replybuffer rep&(0)
        xor eax,eax : push eax                  ; options
        mov eax,32  : push eax                  ; sendsize, MSDN declares WORD but it still takes a dword slot
        mov eax,ebp : add eax,8  : push eax     ; sendbuffer dta&(0)
        push [ebp+4]                            ; ip%
        push [ebp]                              ; hping%
        call IcmpSendEcho%
        or eax,eax                              ; if result=0 then timeout or some error else we got a reply
        jnz Ping4%
        call "GetLastError"
        mov [ebp+8],eax
        jmps Ping3%
  
        .Ping4%
        mov eax,[ebp+44]                        ; if reply_stat=0 then valid ip else some error
        mov [ebp+12],eax
        or eax,eax
        jz Ping2%
  
        .Ping3%
        xor eax,eax
        ]
        WHILE P%AND3:[OPT Pass%:nop:]:ENDWHILE  : REM dword alignment for mov [ebp+4],eax
        [OPT Pass%
        mov [ebp+4],eax                         ; no reply, clear ip table entry
  
        .Ping2%
        push [ebp] : call IcmpCloseHandle% : ret
  
        .Wait%                                  ; 32 sub wait threads
        mov ebp,[esp+4]
        mov eax,2000 : push eax
        mov eax,1 : push eax
        push ebp
        mov eax,32 : push eax
        call "WaitForMultipleObjects" : ret
        ]
      NEXT Pass%

      REM create ip table
      SYS  RtlString2adr%, !^Ip$, 1, ^C%, ^Ip% TO D%
      IF D%<>0 THEN ERROR 100, "Ip$ convert error"
      FOR I%=0 TO 255
        SYS IcmpCreateFile% TO C%
        Ping{(I%)}.hping%=C%
        Ping{(I%)}.ip%=Ip%
        ?(^Ping{(I%)}.ip%+3)=I%
      NEXT I%

      T%=TIME
      REM create 256 worker threads
      FOR I% = 0 TO 255
        H%=^Ping{(I%)}.hping%
        SYS "CreateThread", 0, 1024, Ping%, H%, 0, 0 TO hThread%(I%)
        IF hThread%(I%) = 0 THEN ERROR 100,"Failed to create Thread."
      NEXT I%

      REM create 8 main wait threads each waiting for 32 sub threads
      FOR I%=0 TO 7
        H%=^hThread%(I%*32)
        SYS "CreateThread", 0, 1024, Wait%, H%, 0, 0 TO hMain%(I%)
        IF hMain%(I%) = 0 THEN ERROR 100,"Failed to create MainThread."
      NEXT I%

      REM wait for all 8 main wait threads
      SYS "WaitForMultipleObjects", 8, ^hMain%(0), 1, 5000
      WAIT 10 : REM wait a bit before scanning ip table, threads not immediately ready ?

      REM print non zero values from ip table
      FOR I%=0 TO 255
        IF Ping{(I%)}.ip%<>0 THEN
          D%=^Ping{(I%)}.ip%
          PRINT STR$(D%?0);" ";STR$(D%?1);" ";STR$(D%?2);" ";STR$(D%?3);
          IF D%!4 THEN PRINT " IcmpErrorCode=";D%!4 ELSE PRINT
          IF D%!8 THEN PRINT " IcmpReplyStat=";D%!8
        ENDIF
      NEXT I%

      PRINT "Scan complete in ";(TIME-T%)/100;" seconds"

      FOR I%=0 TO 7
        SYS "CloseHandle", hMain%(I%)
      NEXT I%
      FOR I%=0 TO 255
        SYS "CloseHandle", hThread%(I%)
      NEXT I%

      PROCclose
      END

      DEF PROCclose
      Ntdll%+=0 : IF Ntdll% SYS "FreeLibrary", Ntdll%
      Iphlpapi%+=0 : IF Iphlpapi% SYS "FreeLibrary", Iphlpapi%
      ENDPROC
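WaitForMultipleObjects can wait on at most 64 handles (MAXIMUM_WAIT_OBJECTS), which is presumably why the program waits in two levels: eight Wait% threads each wait on 32 workers, and the main thread waits on the eight Wait% threads. The Python sketch below mirrors only that structure using `threading`; Python itself has no such limit, and `fake_ping` is a hypothetical stand-in for IcmpSendEcho that touches no network.

```python
import threading

N_WORKERS = 256
GROUP = 32  # workers per wait thread, as in the BASIC program


def fake_ping(index):
    """Hypothetical stand-in for IcmpSendEcho: pretend even hosts reply."""
    return index % 2 == 0


def run_sweep():
    results = [None] * N_WORKERS

    def worker(i):
        # Each worker writes only its own slot, like Ping{(I%)}.
        results[i] = fake_ping(i)

    def wait_group(group):
        # Like Wait%: return only when every thread in the group has ended.
        for t in group:
            t.join()

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
    for t in workers:
        t.start()

    waiters = [
        threading.Thread(target=wait_group, args=(workers[g:g + GROUP],))
        for g in range(0, N_WORKERS, GROUP)
    ]
    for w in waiters:
        w.start()
    for w in waiters:  # the main thread waits for every wait thread
        w.join()

    return results  # every worker has finished, so the table is complete


results = run_sweep()
print(sum(results), "of", len(results), "hosts replied")  # 128 of 256 hosts replied
```

The important property is the last step: the results are only read after every wait thread, and therefore every worker, has been joined.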
 

Re: 256 Thread fast ping sweep
Post by rtr2 on Mar 27th, 2015, 09:36am

on Mar 27th, 2015, 01:00am, sveinioslo wrote:
Does anyone know the exact requirement for a thread-safe mov instruction?
Is it the instruction itself, its data, or both that must be on a dword boundary?

It's only data alignment that matters for an atomic read or write.
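In this program the data in question is the ip% field. The ctypes layout below is an assumption that mirrors `DIM Ping{(255) hping%, ip%, dta&(31), rep&(63), buf&(15), fill&(7)}`, with members packed back to back as the assembler offsets (ebp+8 for dta&, ebp+40 for rep&) imply. Each element is 128 bytes and ip% sits 4 bytes in, so every ip% is dword-aligned whenever the array starts on a dword boundary, regardless of where the `mov [ebp+4],eax` instruction itself lies. `dword_aligned` is a hypothetical helper.

```python
import ctypes


class PingEntry(ctypes.Structure):
    # Assumed to mirror one element of the BASIC Ping{} structure array.
    _fields_ = [
        ("hping", ctypes.c_uint32),
        ("ip", ctypes.c_uint32),
        ("dta", ctypes.c_ubyte * 32),
        ("rep", ctypes.c_ubyte * 64),  # ICMP_ECHO_REPLY, Status is at rep+4
        ("buf", ctypes.c_ubyte * 16),
        ("fill", ctypes.c_ubyte * 8),
    ]


def dword_aligned(address):
    """True if a 4-byte access at this address is on a dword boundary."""
    return address % 4 == 0


table = (PingEntry * 256)()
ip_addresses = [ctypes.addressof(e) + PingEntry.ip.offset for e in table]

print(ctypes.sizeof(PingEntry), PingEntry.ip.offset, PingEntry.rep.offset)  # 128 4 40
print(all(dword_aligned(a) for a in ip_addresses))  # True
```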

Quote:
Also, does anyone know whether a thread can report being finished before it actually is?

Extremely unlikely, I would have thought.

Incidentally, there are a few places in your program where you do something like this:

Code:
      mov eax,32 : push eax 

The code would be shorter, and easier to read, if you did:

Code:
      push 32 
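To make the size difference concrete, these are the standard IA-32 encodings involved: `mov eax,imm32` is B8 followed by a 4-byte immediate, `push eax` is the single byte 50, `push imm8` is 6A followed by one byte, and `push imm32` is 68 followed by four bytes. The Python sketch below simply assembles those bytes by hand; `mov_push_const` and `push_const` are hypothetical helper names.

```python
import struct


def mov_push_const(value):
    """mov eax,value : push eax  ->  B8 id, 50"""
    return b"\xb8" + struct.pack("<i", value) + b"\x50"


def push_const(value):
    """push value, using the short form when the value fits in a signed byte."""
    if -128 <= value <= 127:
        return b"\x6a" + struct.pack("<b", value)  # push imm8
    return b"\x68" + struct.pack("<i", value)      # push imm32


for v in (32, 64, 500):
    print(v, mov_push_const(v).hex(" "), "|", push_const(v).hex(" "))
```

For 32 the two-instruction form takes 6 bytes and `push 32` takes 2 (6A 20, the opcode Svein quotes); even 500, which needs the imm32 form, saves one byte.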

Richard.

Re: 256 Thread fast ping sweep
Post by sveinioslo on Mar 27th, 2015, 5:37pm

That is because 'push 32' assembles to the opcode '6A 20', which is 'push imm8' in my manual.
I have not read anywhere whether that means only one byte is pushed or whether it is padded to a dword.
MSDN specifies a dword (strictly, they say WORD, but that doesn't work), so better safe than sorry.
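On the WORD question: in the 32-bit stdcall convention every argument occupies a full 4-byte stack slot, and a WORD such as IcmpSendEcho's RequestSize is passed in the low half of its slot. If only 2 bytes were pushed, every later argument would be shifted by 2 bytes. The Python sketch below packs the eight IcmpSendEcho arguments the way Ping% pushes them (last argument first) and shows where each one lands relative to ESP just before the call; the argument values are made up for illustration.

```python
import struct

# IcmpSendEcho arguments in declaration order; values are illustrative only.
ARGS = [
    ("IcmpHandle", 0x1234),
    ("DestinationAddress", 0x0500A8C0),  # 192.168.0.5 in network byte order
    ("RequestData", 0x00402008),
    ("RequestSize", 32),                 # declared WORD, still one dword slot
    ("RequestOptions", 0),
    ("ReplyBuffer", 0x00402028),
    ("ReplySize", 64),
    ("Timeout", 500),
]


def stdcall_frame(args):
    """Return (stack bytes, offsets) as seen at [esp] just before 'call'.

    Arguments are pushed right to left, so the first one ends up at the
    lowest address; each push stores 4 bytes.
    """
    frame = b""
    for _name, value in reversed(args):
        frame = struct.pack("<I", value) + frame  # each push goes below the rest
    offsets = {name: 4 * i for i, (name, _value) in enumerate(args)}
    return frame, offsets


frame, offsets = stdcall_frame(ARGS)
print(len(frame), offsets["RequestSize"], offsets["Timeout"])  # 32 12 28
```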

Svein

Re: 256 Thread fast ping sweep
Post by rtr2 on Mar 27th, 2015, 6:46pm

on Mar 27th, 2015, 5:37pm, sveinioslo wrote:
better safe than sorry.

You perhaps forget how much experience I have had of writing x86 assembler code - the entire BBC BASIC for Windows interpreter is implemented that way! I would not have recommended that you use push 32 if there was a risk associated with it; there isn't. The imm8 referred to is the size of the operand (32 fits into a signed 8-bit number); it doesn't relate to the number of bytes pushed onto the stack, which is always 4 (or a multiple thereof).
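The execution side of the same point can be modeled in a few lines: for `push imm8` the CPU sign-extends the 8-bit immediate to 32 bits, subtracts 4 from ESP and stores a full dword, exactly as `push eax` would. This is a simplified Python model of a 32-bit push, not an emulator; `Stack` is a hypothetical class.

```python
import struct


class Stack:
    """Minimal model of a 32-bit stack that grows downwards."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size

    def push_imm8(self, imm8_byte):
        # 6A ib: the immediate byte is sign-extended to a full dword.
        value = struct.unpack("<b", bytes([imm8_byte]))[0]
        self.esp -= 4
        struct.pack_into("<i", self.mem, self.esp, value)

    def dword_at(self, offset):
        """Read [esp+offset] as an unsigned dword."""
        return struct.unpack_from("<I", self.mem, self.esp + offset)[0]


s = Stack()
s.push_imm8(0x20)  # push 32   (6A 20)
s.push_imm8(0xFF)  # push -1   (6A FF)
print(hex(s.dword_at(0)), hex(s.dword_at(4)), 64 - s.esp)  # 0xffffffff 0x20 8
```

Two pushes move ESP by 8 bytes, and `push 32` leaves the dword 0x00000020 on the stack.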

As I said, using push 32 will make your code shorter and easier to read. If you think there is some sort of risk associated with doing that you should stop using BB4W immediately because there are probably hundreds of such instructions in the code of the interpreter!

Richard.

Re: 256 Thread fast ping sweep
Post by sveinioslo on Mar 30th, 2015, 08:00am

Hehe, I used to use 'push imm' but changed it to 'push reg' because I wasn't sure how many bytes were pushed.
This program wasn't easy to get working; it was my first multithreaded project and required a lot of research.
I am making a note on the mov/push instructions, thank you.

Svein

Re: 256 Thread fast ping sweep
Post by Ric on May 28th, 2016, 1:26pm

Svein,

I notice that you use the phrase, "multi-threading", which has caught my eye.
I am currently playing around with 3D graphics using asm and wondered if I could get it to go faster by multi-threading. Unfortunately I have been unable to find satisfactory explanations on the net. Does your code enable two or more sections of code to execute at the same time?

The project I am working on is on the General Board, under "3D gaming project".

Any help would be greatly appreciated.

Ric

Re: 256 Thread fast ping sweep
Post by michael on May 31st, 2016, 11:59am

You would need to ask David Williams, as he was apparently behind creating
the GFXLIB library.
I am also curious about using ASM to draw super fast to the screen. It's all about stepping stones.
If David were willing to repost the research and the library for us, maybe we could have some fun.
It's up to you, David.
PLEASE?


Re: 256 Thread fast ping sweep
Post by sveinioslo on Jun 3rd, 2016, 07:29am

Ric,

Quote:
Does your code enable two or more sections of code to execute at the same time?

Yes, the same piece of code is executed in 256 threads simultaneously.
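In the ping program each thread receives its own parameter block (a pointer to its own Ping{} element), so the 256 copies of the same code never write to the same memory. The same idea carries over to splitting rendering work: give each thread a disjoint band of rows. The Python sketch below only illustrates that partitioning; `render_band`, the buffer size and the fill pattern are hypothetical, and because of the standard CPython global interpreter lock this pure-Python version will not actually run faster on several cores. A real speed-up would need native code, such as assembler thread procedures started with CreateThread as in the program above.

```python
import threading

WIDTH, HEIGHT = 64, 48
N_THREADS = 4


def render_band(fb, y0, y1):
    """Fill rows y0..y1-1; no other thread touches these bytes."""
    for y in range(y0, y1):
        row = y * WIDTH
        for x in range(WIDTH):
            fb[row + x] = (x ^ y) & 0xFF  # placeholder 'shading'


def render_frame():
    fb = bytearray(WIDTH * HEIGHT)
    step = (HEIGHT + N_THREADS - 1) // N_THREADS
    threads = [
        threading.Thread(target=render_band, args=(fb, y, min(y + step, HEIGHT)))
        for y in range(0, HEIGHT, step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the frame is complete only after every band has finished
    return fb


frame = render_frame()
```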

Quote:
and wondered if I could get it to go faster by multi-threading. Unfortunately I have been unable to find satisfactory explanations on the net.


There's interesting info in the answers here:

http://stackoverflow.com/questions/714905/is-it-possible-to-create-threads-without-system-calls-in-linux-x86-gas-assembly

and some technical stuff here

http://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like

and general info here

https://msdn.microsoft.com/en-us/library/windows/desktop/ms681917%28v=vs.85%29.aspx

Svein

Re: 256 Thread fast ping sweep
Post by michael on Jun 4th, 2016, 01:37am

Here is a book on assembly language and graphics you may want to check out, Michael Abrash's Graphics Programming Black Book. Abrash was one of the creators of Quake.

http://www.freetechbooks.com/michael-abrash-s-graphics-programming-black-book-t78.html

And apparently you can buy the book on Amazon:

http://www.amazon.com/exec/obidos/ASIN/1576101746/ref=nosim/freetechbooks-20

Re: 256 Thread fast ping sweep
Post by Ric on Jun 16th, 2016, 07:59am

Thanks guys, I'll look into it.