BBC BASIC for Windows - Unicode in Folder names

hellomike

Re: Unicode in Folder names
« Reply #6 on: Apr 2nd, 2015, 4:22pm »

As before, just having it confirmed how things work (and don't work) helps a great deal!

Quote:

I worry that you are turning something which is intrinsically very simple into something difficult.

Yep, my approach was needlessly difficult. The theory behind it all isn't complex, but it isn't really "very simple" either; then again, once understood, everything is simple.

It confused me that the "FindFirstFileW" API wasn't really documented on MSDN, and it took me a while to realize that the call uses wide strings for both input and output and that a wide string is terminated by 0x0000.

So there is progress: after adding a function that makes a wide-string version of the initial root directory (D:\X30Share), the following code now lists the folder names correctly.
Code:

      CP_UTF8 = &FDE9
      VDU 23,22,640;512;8,16,16,128+8 : REM Select UTF-8 mode
      *font Courier New
      rootpath$="D:\X30Share"

      N%=FNscandir(FNANSItoWide(rootpath$))
      PRINT '"There were ";N%;" files in root path"
      END

      DEF FNscandir(path$)
      LOCAL dir%,sh%,res%,n%,utf8%
      DIM dir% LOCAL 317,utf8% LOCAL 260
      SYS "FindFirstFileW",path$+"\"+CHR$0+"*"+CHR$0+CHR$0+CHR$0,dir% TO sh%
      IF sh%<>-1 THEN
        REPEAT
          IF dir%!44<>&0000002E AND dir%!44<>&002E002E THEN
            IF !dir% AND &10 THEN
              SYS "WideCharToMultiByte",CP_UTF8,0,dir%+44,-1,utf8%,260,0,0
              PRINT $$utf8%
              REM Now I have to somehow append the double byte string at dir%+44
              REM to path$ in order to do the recurse call to this function
              REM path$+=$$(dir%+44) won't work...
            ELSE
              n%+=1
            ENDIF
          ENDIF
          SYS "FindNextFileW",sh%,dir% TO res%
        UNTIL res%=0
        SYS "FindClose",sh%
      ENDIF
      =n%:REM Return number of files in the folder

      REM --------------------------------------------------------------------
      DEF FNANSItoWide(a$)
      LOCAL wide$,i%

      FOR i%=1 TO LENa$
        wide$+=MID$(a$,i%,1)+CHR$0
      NEXT
      =wide$

Also testing for "." and ".." had to change.
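
The REM lines in the listing above leave open how to append the UTF-16 name at dir%+44 to path$. The $$ operator stops at the first zero byte, so it cannot read a wide string. One possible approach, sketched here and not taken from the thread, is to copy byte pairs until a 16-bit zero is found. FNwidetostr and ptr% are hypothetical names chosen for illustration:
Code:

      REM Hypothetical helper: copy the UTF-16 string at address ptr% into a
      REM BBC BASIC string, two bytes per character, without the terminator.
      DEF FNwidetostr(ptr%)
      LOCAL w$
      WHILE (?ptr% OR ptr%?1) <> 0
        w$ += CHR$(?ptr%) + CHR$(ptr%?1)
        ptr% += 2
      ENDWHILE
      = w$

With such a helper the recursive call could be written as FNscandir(path$+"\"+CHR$0+FNwidetostr(dir%+44)), since FNscandir itself appends the wide "\*" pattern.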

I will manage appending the double-byte string to path$, but I ran into a strange error.
I gathered that the names returned at dir%+44 now occupy twice as many bytes, so I thought I would enlarge the memory area for dir% to be on the safe side, and changed only that line:
Code:

      DIM dir% LOCAL 511,utf8% LOCAL 260

After listing the folder names, the program stops with the error

Not in a function

and highlights the "=n%" line.

I'm using BB4W V5.95a.

Regards,

Mike

rtr2

Re: Unicode in Folder names
« Reply #7 on: Apr 2nd, 2015, 4:44pm »

on Apr 2nd, 2015, 4:22pm, hellomike wrote:

I thought I would enlarge the memory area for dir% to be on the safe side, and changed only that line
Code:

      DIM dir% LOCAL 511,utf8% LOCAL 260

You were quite right in thinking that it was necessary to increase the amount of memory allocated to dir%, but rather than erring on the "safe side" you in fact didn't increase it enough! If you look at the definition of WIN32_FIND_DATA on MSDN you'll find that the wide-character version occupies 592 bytes. DIM reserves one byte more than the number given, so in your program you require as a minimum:

Code:

      DIM dir% LOCAL 591,utf8% LOCAL 260
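
The 592-byte figure can be checked by adding up the fields of WIN32_FIND_DATAW in the order MSDN lists them. A minimal sketch of that arithmetic (size% is just an illustrative variable name):
Code:

      REM Field sizes of WIN32_FIND_DATAW, in bytes
      size% = 4         : REM dwFileAttributes
      size% += 3 * 8    : REM ftCreationTime, ftLastAccessTime, ftLastWriteTime
      size% += 4 * 4    : REM nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1
      PRINT "Offset of cFileName: "; size%
      size% += 260 * 2  : REM cFileName, MAX_PATH wide characters
      size% += 14 * 2   : REM cAlternateFileName, 14 wide characters
      PRINT "Total size: "; size%

The offset comes out as 44, matching the dir%+44 used in the listing, and the total as 592. With single-byte characters the same sum gives 318, which is why the ANSI version of the program used DIM dir% LOCAL 317.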

Strictly speaking the 260 should be increased as well, because the theoretical maximum path length when encoded as UTF-8 is longer than MAX_PATH bytes, but in practice you would be unlikely to exceed that.
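
The reason is that one UTF-16 code unit can need up to three bytes in UTF-8 (a surrogate pair needs four bytes for two code units, so three per unit is the worst case). A buffer of 3*260 bytes would therefore always be large enough for the name. The following is a small sketch, not taken from the thread, that shows the expansion for a single character; PROCdemo, wide% and len% are illustrative names, and it assumes a DIM ... LOCAL address that Windows accepts for the wide string (see the alignment remarks below):
Code:

      CP_UTF8 = &FDE9
      PROCdemo
      END

      DEF PROCdemo
      LOCAL wide%, utf8%, len%
      DIM wide% LOCAL 3, utf8% LOCAL 3
      REM U+20AC (euro sign) as one UTF-16 code unit plus a 16-bit zero terminator
      !wide% = &20AC
      REM The return value is the number of bytes written, terminator included
      SYS "WideCharToMultiByte", CP_UTF8, 0, wide%, -1, utf8%, 4, 0, 0 TO len%
      PRINT "UTF-16 bytes used: 4   UTF-8 bytes used: "; len%
      ENDPROC

In FNscandir the corresponding change would be to allocate utf8% with LOCAL 779 and pass 780 instead of 260 as the buffer size.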

Quote:

I'm using BB4W V5.95a.

I would not advise that if you are working with UTF-16 strings. Windows (particularly 64-bit Windows) sometimes requires that such strings are WORD-aligned, i.e. at an even memory address, and BB4W v6.00a guarantees that when using DIM ... LOCAL. However v5.95a does not. Try this program on both v5.95a and v6.00a to see what I mean:

Code:

      FOR N% = 1 TO 10
        PROC1(N%)
      NEXT
      END

      DEF PROC1(S%)
      DIM dir% LOCAL S%
      PRINT dir%
      ENDPROC
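
If staying on v5.95a were unavoidable, one possible workaround (an assumption, not something suggested in this thread) would be to reserve one spare byte and round the address up to an even value by hand. PROCdemo, raw% and the size passed to it are illustrative:
Code:

      PROCdemo(5)
      END

      DEF PROCdemo(S%)
      LOCAL raw%, dir%
      DIM raw% LOCAL S% + 1       : REM one spare byte for the adjustment
      dir% = (raw% + 1) AND -2    : REM round up to the next even address
      PRINT raw%, dir%
      ENDPROC

Whatever raw% turns out to be, dir% is even and still has S%+1 usable bytes after it, the same amount that DIM dir% LOCAL S% would have reserved.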

Richard.

« Last Edit: Apr 2nd, 2015, 5:07pm by rtr2 »

hellomike

Re: Unicode in Folder names
« Reply #8 on: Apr 4th, 2015, 2:55pm »

Yes, I can see the difference between v5 and v6 when running the code snippet.

I will continue development using BB4W v6.x.

Thanks for all the help and tips.

Mike
