Topic: UTF-8 editor
Ken Down
Guest
« Thread started on: Jun 5th, 2010, 04:34am »
Richard Russell's simple text editor is very useful, but I cannot get it to accept or output UTF-8 text. I've put in the recommended code to tell it to use UTF-8, but any "funny" characters just come out on screen as ||||| and are saved likewise.
Any ideas, please?
admin
Administrator
« Reply #1 on: Jun 5th, 2010, 09:20am »
on Jun 5th, 2010, 04:34am, Guest-Ken Down wrote: Richard Russell's simple text editor is very useful, but I cannot get it to accept or output UTF-8 text.
Are we talking about TEXTEDIT.BBC (or something else based on a Windows Edit Control)? If so, Windows Edit Controls do not directly support UTF-8 encoding. What you will have to do is to configure the Edit Control as Unicode (UTF-16, or more precisely UCS-2 encoding) then use MultiByteToWideChar and WideCharToMultiByte respectively to convert UTF-8 to UCS-2 and UCS-2 to UTF-8. It's not too difficult.
To convert the Edit Control to Unicode see this Wiki article, but use the "RichEdit20W" class rather than "RichEdit20A":
http://bb4w.wikispaces.com/Using+Rich+Edit+controls
Do be careful when allocating buffers to ensure they are big enough (UCS-2 requires two bytes per character).
Richard.
Ken Down
Guest
« Reply #2 on: Jun 6th, 2010, 4:10pm »
It's the TEXTEDIT.BBC example program, not an edit box. I presume, therefore, that the detailed instructions you have given do not apply.
admin
Administrator
« Reply #3 on: Jun 6th, 2010, 4:54pm »
on Jun 6th, 2010, 4:10pm, Guest-Ken Down wrote: It's the TEXTEDIT.BBC example program, not an edit box. I presume, therefore, that the detailed instructions you have given do not apply.
TEXTEDIT.BBC does use a Windows Edit Control (an "edit box", if you prefer):
Code:
Hedit% = FN_createwindow("EDIT", "", 0, 0, @vdu%!208, @vdu%!212, 0, &200044, 0)
Therefore the instructions I gave apply in full.
Richard.
Ken Down
Guest
« Reply #4 on: Jun 6th, 2010, 8:23pm »
Oh, OK. I didn't realise that a window was an edit box. I'll try working it out. Thanks.
Ken Down
Guest
« Reply #5 on: Jun 27th, 2010, 07:40am »
Hmmmm. I finally got around to trying this out. I loaded in the example program, TEXTEDIT.BBC. I copied the section from the instructions for Rich Text Edit Boxes which begins SYS "LoadLibrary", "RICHED20.DLL" and ends SCF_ALL = 4, and put it just before the call to FN_createwindow.
I then altered the call to FN_createwindow:
Hedit% = FN_createwindow("RichEdit20W","",0,0,@vdu%!208,@vdu%!212,0,WS_BORDER,0)
When I ran the program the edit box appeared with a nice border around it. It accepted and displayed correctly some Hebrew characters, but it would not accept RETURN, just beeped when I pressed that key.
I returned the style parameter to &200044 and the border disappeared but it would now accept RETURN.
However when I saved what I had entered the Hebrew characters appeared as ? (which isn't much of an improvement).
I have tried putting VDU23,22,800;600;8,16,16,8+128 (which is supposed to set the font to UTF-8) right at the start of the program, but it makes no difference.
I presume there is something simple which I have overlooked, but I can't see what it might be. Any help gratefully accepted.
admin
Administrator
« Reply #6 on: Jun 27th, 2010, 09:19am »
on Jun 27th, 2010, 07:40am, Guest-Ken Down wrote: it would not accept RETURN, just beeped when I pressed that key.
Check your style values. You probably need to include ES_MULTILINE (4) and possibly ES_WANTRETURN (&1000). See the list of RichEdit styles here:
http://msdn.microsoft.com/en-us/library/bb774367.aspx
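As a rough sketch of how the style value might be assembled (an illustration only, not code from TEXTEDIT.BBC): the original &200044 breaks down as WS_VSCROLL + ES_AUTOVSCROLL + ES_MULTILINE, so passing WS_BORDER on its own drops the multi-line bit, which would be consistent with RETURN being rejected. The constants are the standard Windows values; the variable name style% is hypothetical.

```bbcbasic
REM Standard Windows style bits
WS_BORDER = &800000
WS_VSCROLL = &200000
ES_MULTILINE = 4
ES_AUTOVSCROLL = &40
ES_WANTRETURN = &1000 : REM possibly needed, as noted above

REM &200044 (the original value) = WS_VSCROLL OR ES_AUTOVSCROLL OR ES_MULTILINE
REM style% is a hypothetical name: the original bits plus a border and ES_WANTRETURN
style% = WS_BORDER OR WS_VSCROLL OR ES_MULTILINE OR ES_AUTOVSCROLL OR ES_WANTRETURN
Hedit% = FN_createwindow("RichEdit20W", "", 0, 0, @vdu%!208, @vdu%!212, 0, style%, 0)
```

If the border is not wanted, leaving out WS_BORDER gives the original &200044 with ES_WANTRETURN added.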
Quote: However when I saved what I had entered the Hebrew characters appeared as ???? (which isn't much of an improvement).
It's a little hard to comment without seeing your code. I explained before that you would need to use SYS "WideCharToMultiByte" to convert the UCS-2 text returned from the RichEdit control to the UTF-8 text that you want to save to file. You must also use SYS "SendMessageW" (rather than the regular SYS "SendMessage") to get the UCS-2 data in the first place. My guess would be that you've used one or other of those calls incorrectly.
This is what I would expect your code to look like (or something very similar):
Code:
DEF FNsaveas : LOCAL F%, L%, N%, U%
SYS "GetSaveFileName", fs{} TO F%
IF F% PROCtitle ELSE = FALSE
DEF FNsave : LOCAL F%, L%, N%, U% : IF ?Fn% = 0 THEN = FNsaveas
REM Fetch the text from the control as UCS-2 (two bytes per character)
SYS "SendMessageW", Hedit%, WM_GETTEXTLENGTH, 0, 0 TO L%
SYS "GlobalAlloc", 0, 2*(L%+1) TO F%
SYS "SendMessageW", Hedit%, WM_GETTEXT, L%+1, F%
REM First call returns the number of UTF-8 bytes needed; second call converts
SYS "WideCharToMultiByte", CP_UTF8, 0, F%, L%, 0, 0, 0, 0 TO N%
SYS "GlobalAlloc", 0, N% TO U%
SYS "WideCharToMultiByte", CP_UTF8, 0, F%, L%, U%, N%, 0, 0
SYS "GlobalFree", F%
OSCLI "SAVE """+$$Fn%+""" "+STR$~U%+"+"+STR$~N%
SYS "GlobalFree", U%
= TRUE
Quote: I have tried putting VDU23,22,800;600;8,16,16,8+128 (which is supposed to set the font to UTF-8) right at the start of the program, but it makes no difference.
As I've explained before, TEXTEDIT.BBC does not use BBC BASIC's VDU emulator for its output, therefore that command is irrelevant and will have no effect.
Richard.
« Last Edit: Jun 27th, 2010, 09:46am by admin »
Ken Down
Guest
« Reply #7 on: Jun 28th, 2010, 9:18pm »
Ok, I'll play around with that over the next few days. Thanks for your patience and expertise.
Ken Down
Guest
« Reply #8 on: Jul 5th, 2010, 5:05pm »
I presume that when loading a file back in again, I would need to use the opposite call, "MultiByteToWideChar"? Would the parameters be the same as in the two calls to "WideCharToMultiByte" in the save routine?
admin
Administrator
« Reply #9 on: Jul 5th, 2010, 6:06pm »
on Jul 5th, 2010, 5:05pm, Guest-Ken Down wrote: I presume that when loading a file back in again, I would need to use the opposite call, "MultiByteToWideChar"?
That's correct.
Quote: Would the parameters be the same as in the two calls to "WideCharToMultiByte" in the save routine?
MultiByteToWideChar has fewer parameters (six rather than eight). Look it up in your preferred Windows API Reference. APIViewer will give you the declaration in BBC BASIC syntax, but not tell you what the parameters mean (I don't advise guessing)!
See Frequently Asked Question #8:
http://www.bbcbasic.co.uk/bbcwin/faq.html#q8
Richard.
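For the loading direction, here is a minimal, untested sketch, assuming the "RichEdit20W" control and the Hedit% handle from earlier in the thread. PROCloadutf8 and its local variable names are hypothetical and not part of TEXTEDIT.BBC. The six MultiByteToWideChar parameters follow the Windows API documentation: code page, flags, source pointer, source length in bytes, destination pointer, destination size in wide characters.

```bbcbasic
REM Hypothetical helper: load a UTF-8 file into the Unicode RichEdit control
DEF PROCloadutf8(file$)
LOCAL C%, N%, U%, W%, T%
CP_UTF8 = &FDE9 : REM code page 65001
WM_SETTEXT = &C
C% = OPENIN(file$) : IF C% = 0 THEN ENDPROC
N% = EXT#C% : REM file length in bytes
CLOSE #C%
SYS "GlobalAlloc", 0, N%+1 TO U%
OSCLI "LOAD """+file$+""" "+STR$~U%
REM First call (no output buffer) returns the number of wide characters needed
SYS "MultiByteToWideChar", CP_UTF8, 0, U%, N%, 0, 0 TO W%
REM UCS-2 needs two bytes per character, plus room for a terminating NUL
SYS "GlobalAlloc", 0, 2*(W%+1) TO T%
REM Second call performs the UTF-8 to UCS-2 conversion
SYS "MultiByteToWideChar", CP_UTF8, 0, U%, N%, T%, W%
T%?(2*W%) = 0 : T%?(2*W%+1) = 0 : REM NUL-terminate the wide string
SYS "SendMessageW", Hedit%, WM_SETTEXT, 0, T%
SYS "GlobalFree", T%
SYS "GlobalFree", U%
ENDPROC
```

It might be called as PROCloadutf8($$Fn%) once the filename buffer has been filled in. The sketch does no error checking on the allocations and does not strip a leading byte order mark, so both would need attention in real use.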