Chapter 6 - String Data Types
A string is a list of text characters. We tell BASIC that we are
dealing with text rather than variable names by enclosing the text in
double quotation marks. Applying this, you should be able to see why,
if we want to write 'Hello' on the screen, we use:
PRINT "Hello"
and not
PRINT Hello
The second example would send BASIC scurrying off to its variable list
trying to find one called Hello. If it just so happened that you had
one, it would print its value; most likely you won't, so BASIC will
complain.
We can think of the way a string variable holds its value as a series
of memory locations, each of which holds a character, like this:
H    e    l    l    o
The quotes are not kept as part of the text; they are just used as
delimiters during programming. Each position is one byte in size, which
means it can hold a number in the range 0-255. So, if a byte can
only hold a number, how does it store letters like those above? The answer is
that the operating system has a lookup table which it uses to translate
your text into numeric codes for storing in memory and then translate
them back again when we want to print them out. The table is called the
ASCII table (American Standard Code for Information Interchange; programmers
love acronyms) and it provides a table of corresponding letters,
numbers, punctuation marks and other assorted characters. When we store
the letters for "Hello", the computer actually represents them internally like
this:
H    e    l    l    o
72   101  108  108  111
You can see the full table in the online help under Reference
Information. Note that not all the codes have a visible representation
and codes less than 32 may cause strange things to happen if you try to
print them. These lower codes are often referred to as control
characters; they represent things like horizontal tab, form feed etc.
Also, the numbers above 127 are a non-standard standard (!) and so will
give different characters depending on the font that is being used.
Look at character 32, space. Space is normally filtered out by our
brains when we read text; it's there but we ignore it. To a computer,
space still needs a representation and so is given a value, just like
any of the other punctuation marks such as comma (code 44) or decimal
point / full stop (code 46). As BASIC is so picky about spaces, this
means that the two strings:
"Hello"
and
" Hello"
would be considered different. As stated, we tend to filter the space out,
but to the computer it's just another character code.
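As a quick check of this point, the following sketch compares the two strings directly. It assumes the IF ... THEN ... ELSE statement, which has not been covered at this point in the chapter, and the messages are purely illustrative.

```basic
REM Compare two strings that differ only by a leading space
IF "Hello" = " Hello" THEN PRINT "Same" ELSE PRINT "Different"
END
```

This should print Different, because the second string starts with character code 32 and the first does not.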
As numbers have limits, so too do strings. The limits are
the code of the character (0 to 255) and the number of characters the
string variable can hold, or the length of the string. In BBC BASIC the
length can range between 0 and 65535 characters. Zero because you can
have a string with nothing in it. In fact when you first declare a
string variable, BASIC creates it with zero length, i.e. with nothing
in it. This may seem a little odd but is a useful concept. If at any
time you wish to set a string to hold nothing, this is how you do it:
A$ = ""
That's double quotes with no gap between them. As a space is a
character, this is not the same as:
A$ = " "
If you were to print them out, you would not see any difference, but
that, of course, doesn't mean they are the same. A string with nothing
in it is variously called an empty string or null string. You'll come
across both.
Now we've got to grips with what a string is, what can we do with them?
With numeric variables, you can add, subtract, square root etc. Strings
are a little more limited. You can only add them:
REM Adding strings
S1$ = "Hello"
S2$ = ", "
S3$ = "world"
S4$ = S1$ + S2$
PRINT S4$
S4$ = S4$ + S3$
PRINT S4$
END
This is called concatenation, which is a fancy word meaning chain them
together. The program copies the contents of S1$, splices S2$ onto the
end of it and puts the result into S4$.
Line 7 adds S3$ onto the end of S4$. There would be nothing to stop you
doing all this in one line.
REM Adding strings
S1$ = "Hello"
S2$ = ", "
S3$ = "world"
S4$ = S1$ + S2$ + S3$
PRINT S4$
END
This is the only mathematical operation that is allowed on strings.
None of the others make much sense anyway: how do you find the square
root of "Hello"? Don't for one moment think that's it, though: BASIC
has a very comprehensive set of functions for manipulating string
variables. These are dealt with in the following section.
String Functions
LEN
One of the most useful things we can know about a string is its length.
The function LEN tells us exactly this. It must always have one
argument, though as is usual, this can be an expression. The result
is a number, so it must be assigned to a numeric variable or used in an
expression where a numeric value is expected. In immediate mode, try the
following:
PRINT LEN("Hello")
PRINT LEN("Hello, world")
LEN can be used to distinguish between empty strings and strings with
no visible characters:
REM LEN of an empty string
A$=""
B$=" "
PRINT LEN(A$)
PRINT LEN(B$)
END
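Because the argument to LEN can be any string expression, the length can be worked out on the fly rather than from a stored variable. A minimal sketch, in which First$ and Second$ are illustrative names not used elsewhere in this chapter:

```basic
REM LEN with an expression as its argument
First$ = "Hello"
Second$ = ", world"
REM Length of the joined string
PRINT LEN(First$ + Second$)
REM Sum of the two separate lengths
PRINT LEN(First$) + LEN(Second$)
END
```

Both PRINT statements should give 12, since joining two strings simply adds their lengths.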
STRING$
There are times when you want to be able to generate a repeating
pattern of text without typing it all in manually. STRING$ does just
this. It takes as its parameters a number of repetitions and a base
string. It returns a string which is the base string repeated the given
number of times:
PRINT STRING$(3,"+++===")
Here is a little program that will take a string then underline it.
REM Underline using LEN and STRING$
Title$ = "BBC BASIC"
L%=LEN(Title$)
PRINT Title$
PRINT STRING$(L%,"*")
END
Or, just to get carried away, we could put the title in a box:
REM Box using LEN and STRING$
Title$ = "BBC BASIC"
L%=LEN(Title$)
PRINT STRING$(L%+4,"*")
PRINT "* ";Title$;" *"
PRINT STRING$(L%+4,"*")
END
INSTR
Although it's easy to create strings, there are times when we want to
inspect their contents. The function INSTR allows us to search a string
for a character or pattern of characters. INSTR takes two or three
arguments. The first is the string we wish to search. The second is a
string containing the characters we wish to search for. The third is
optional; we'll get to it in a minute. When supplied with two
parameters, INSTR returns the position in the first string at which the
first match for the search string begins. This example will return the
position of the first letter C in the target string:
PRINT INSTR("BBC BASIC", "C")
The first character in a string is position 1. If INSTR returns 0, it
means no match was found:
PRINT INSTR("BBC BASIC", "Z")
The optional third parameter can force INSTR to start at a position
other than 1. This means we can search the entire string by remembering
the last position returned and starting one character after that.
REM INSTR Demo
Posn%=INSTR("BBC BASIC","C")
PRINT "C found in position: ";Posn%
Posn%=Posn%+1
Posn%=INSTR("BBC BASIC","C",Posn%)
PRINT "C found in position: ";Posn%
END
Notice how we have to increment Posn% to get it past the first C. If
we hadn't, we would have started from position 3 again. As position 3
is a C, the search would have returned the same value again. If the
start position is larger than the length of the string, you get 0 (not
found) in return.
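The idea of remembering the last position and restarting one character later extends naturally to a loop that finds every match. The following is only a sketch: it assumes the WHILE ... ENDWHILE loop, which has not been introduced in this chapter, and Target$ is an illustrative variable name.

```basic
REM Find every C in a string (sketch)
Target$ = "BBC BASIC"
Posn% = INSTR(Target$, "C")
REM INSTR returns 0 when there are no more matches
WHILE Posn% > 0
  PRINT "C found in position: ";Posn%
  REM Restart the search one character past the last match
  Posn% = INSTR(Target$, "C", Posn% + 1)
ENDWHILE
END
```

For "BBC BASIC" this should report positions 3 and 9. After the second match the search restarts at position 10, which is past the end of the string, so INSTR returns 0 and the loop stops.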
INSTR can also search for a sequence of characters in the target string.
PRINT INSTR("BBC BASIC","BBC")
The thing to be wary of here is how you specify the string to search for:
PRINT INSTR("BBC BASIC FOR WINDOWS","FOR")
PRINT INSTR("FORTUNE FAVOURS THE BOLD","FOR")
These will tell you that both strings contain the word "FOR", when clearly
the second one doesn't. This again is because BASIC has no concept of
language; it just looks for a pattern of characters and, when it finds a
match, stops. A more correct way would be to search for "FOR" followed
by a space:
PRINT INSTR("BBC BASIC FOR WINDOWS","FOR ")
PRINT INSTR("FORTUNE FAVOURS THE BOLD","FOR ")
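Searching for the word plus a trailing space still misses a word that sits at the very end of the target, where no space follows it. One possible workaround, offered as a sketch and not as the tutorial's own method, is to pad both the target and the search string with spaces. Text$ is an illustrative name.

```basic
REM Whole-word search by padding both strings with spaces (sketch)
Text$ = "THE BOLD GO FOR"
REM No space follows the final word, so this search fails
PRINT INSTR(Text$, "FOR ")
REM Padding the target gives every word a space on both sides
PRINT INSTR(" " + Text$ + " ", " FOR ")
END
```

The first PRINT should give 0 and the second a non-zero position. This only treats spaces as word boundaries, so a word followed by a comma or full stop would still be missed.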
LEFT$ and RIGHT$
The next two functions return a subsection of a string and are dealt
with together as they are functionally similar.
LEFT$ takes two parameters: a target string and a number of characters.
It returns a string containing that number of characters, taken from the
start of the target string.
PRINT LEFT$("Hello, world", 5)
If the number is greater than the total length of the string, you just
get the whole string.
PRINT LEFT$("Hello, world", 100)
LEFT$ will also accept one parameter only:
PRINT LEFT$("Hello")
This will return all the characters but the last one and is the same as:
PRINT LEFT$("Hello", LEN("Hello")-1)
It is also possible to use LEFT$ as an assignment. In this mode, LEFT$
will overwrite the characters in the string with the ones being
assigned, starting at the first character.
REM LEFT$ as an assignment
MyStr$="Hello, world"
LEFT$(MyStr$,6)="Byebye"
PRINT MyStr$
END
If you specify a number less than the length of the replacement, BASIC
will only overwrite the number of characters specified. Should you
specify more, BASIC will only overwrite up to the maximum characters in
the replacement string.
RIGHT$ takes the same arguments as LEFT$ but returns the rightmost
number of characters.
PRINT RIGHT$("Hello, world", 5)
Again, if the number is too big, you just get the whole string back.
With only one argument, RIGHT$ will return just the last character.
Predictably, when used in an assignment, RIGHT$ will overwrite the
characters at the end of the string.
REM RIGHT$ as an assignment
MyStr$="Hello, world"
RIGHT$(MyStr$,5)="mummy"
PRINT MyStr$
END
Exactly what happens if you specify fewer characters than the length of
the replacement string is probably best illustrated by example. Change
the 5 to 4 in line 3 above and see what happens. It starts 4 characters
from the end of the string and copies the first 4 characters from the
replacement string. If you tell BASIC to use more characters than are
contained in the string, our friendly computer will effectively derive
its own number. Substitute 8 in line 3 and see. The replacement doesn't
start 8 characters away from the end of the string; it merely works out
that the replacement has 5 characters, and starts 5 characters from the
end instead.
Please note that with both LEFT$ and RIGHT$, you cannot lengthen the
original string by giving more characters in the replacement than are
in the target. BASIC will just truncate the substitute string at the
length of the target.
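Used as ordinary functions, LEFT$ and RIGHT$ are handy for picking the ends off a string whose layout is known in advance. A small sketch, assuming a date always written as two digits, a slash, two digits, a slash and four digits; Date$, Day$ and Year$ are illustrative names.

```basic
REM Pick the ends off a fixed-format string (sketch)
Date$ = "17/05/2006"
REM The first two characters are the day
Day$ = LEFT$(Date$, 2)
REM The last four characters are the year
Year$ = RIGHT$(Date$, 4)
PRINT "Day:  ";Day$
PRINT "Year: ";Year$
END
```

This should print 17 for the day and 2006 for the year. The month sits in the middle of the string, which neither function can reach on its own.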
MID$
LEFT$ and RIGHT$ allow us to manipulate the start and end of a string,
but what happens if you want to extract from the middle? MID$ will do
this for us.
In its more common application, MID$ has three parameters: a string,
the start position and a number of characters. As with all strings, the
leftmost character is position 1. Try this:
PRINT MID$("Fortune favours the bold", 9, 7)
This returns 7 characters starting at position 9, i.e. "favours" in this
case. If the last number is bigger than the number of characters left in
the string, you just get everything up to the end.
PRINT MID$("Fortune favours the bold", 9, 1000)
This case is so common that BASIC allows us to omit the final
parameter. If you do this, BASIC assumes that you want all the
characters from the start position to the end.
PRINT MID$("Fortune favours the bold", 9)
OK, that was painless enough, but we're not finished. Like RIGHT$ and
LEFT$, MID$ can also be used on the other side of the equals sign. This
means that you can get BASIC to replace a section of a string:
REM MID$ demo
A$ = "Give me patience!!"
MID$(A$,9,8) = "strength"
PRINT A$
END
From the above description, you should be able to guess what it's
doing. For completeness: line 3 takes the string "strength" which is 8
characters long and, starting at position 9 in A$, replaces the
characters one for one with the characters in "strength".
There are several things to be aware of when dealing with the number of
characters. Usually, the number is the same as the length
of the replacement string. If the number of characters specified is
smaller than the length of the replacement, only that number
of characters is copied:
MID$(A$,9,4) = "strength"
Also, if the start position plus the number of characters would run past
the end of the target string, BASIC will only copy characters up to the
end of the target string and ignore anything after:
MID$(A$,9,13) = "all your cash"
To put it another way, BASIC will not extend the length of the target
string.
You can leave out the number of characters. In this case BASIC assumes
the length of the replacement string, but still obeys the rules given
above.
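As a sketch of the shorter form, here is the earlier demo with the number of characters left out. It reuses A$ and the strings from the MID$ demo above.

```basic
REM MID$ assignment without a number of characters (sketch)
A$ = "Give me patience!!"
REM BASIC assumes the length of the replacement, 8 characters here
MID$(A$,9) = "strength"
PRINT A$
END
```

Going by the rules described above, this should print Give me strength!! exactly as the three-parameter version does.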
Now for a little demo that uses INSTR, LEFT$ and MID$. Suppose we
have someone's full name and we want to separate it into first name and
surname. We know that the two names are separated by a space, so first
we use INSTR to locate the space. Then we copy all the characters up
to, but not including, the space into the string that keeps the first
name. Next we take all the letters starting after the space up to the
end and save them in the surname. Have a crack at this yourself first
before looking at my result if you want to; it's the only way to learn.
REM Separate names
FullName$ = "Joe Soap"
Posn% = INSTR(FullName$, " ")
FirstName$ = LEFT$(FullName$, Posn%-1)
Surname$ = MID$(FullName$, Posn%+1)
PRINT "Your first name is: ";FirstName$
PRINT "Your surname is: ";Surname$
END
How did you do? There are always as many ways to code the solution to a
program as there are people trying to code it, so if you got a
different solution that's fine. Also don't be upset if you didn't get
it completely right first go, I didn't: it's all part of the
programming process.
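One case the program above does not allow for is a name with no space in it, where INSTR returns 0. The following variation is one possible way to handle that, not the only one. It assumes the multi-line IF ... THEN ... ELSE ... ENDIF structure, which has not been covered in this chapter, and the single-word name is just an illustration.

```basic
REM Separate names, allowing for a missing space (sketch)
FullName$ = "Cher"
Posn% = INSTR(FullName$, " ")
IF Posn% = 0 THEN
  REM No space found: treat the whole string as the first name
  FirstName$ = FullName$
  Surname$ = ""
ELSE
  FirstName$ = LEFT$(FullName$, Posn%-1)
  Surname$ = MID$(FullName$, Posn%+1)
ENDIF
PRINT "Your first name is: ";FirstName$
PRINT "Your surname is: ";Surname$
END
```

With a single-word name the surname should come out as an empty string, and changing FullName$ back to "Joe Soap" should give the same result as the original program.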
ASC and CHR$
We have already made the acquaintance of the ASCII table. It is very
useful to be able to find the codes that correspond to the letters and
vice versa. That's the job of ASC and CHR$.
ASC returns an integer which is the ASCII code for the character passed
as a parameter:
PRINT ASC("A")
Gives 65, as expected.
Note also:
PRINT ASC("1")
Gives 49, which is the code for the character "1", NOT the value 1.
If the string is bigger than one character, ASC just returns the code
for the first character. To inspect other positions, we need to use
MID$:
PRINT ASC(MID$("BBC BASIC",2,1))
which gives the code for the second character, "B".
As you may expect, CHR$ does the reverse of ASC: give it a number and
it will return a single character string containing the character with
that ASCII code.
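A short sketch of CHR$ on its own and alongside ASC, using only the code for "A" quoted earlier:

```basic
REM CHR$ is the reverse of ASC
REM Code 65 is the letter A
PRINT CHR$(65)
REM One more than the code for A gives the next letter
PRINT CHR$(ASC("A") + 1)
REM Converting a code to a character and back returns the code
PRINT ASC(CHR$(65))
END
```

This should print A, then B, then 65.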
CHR$ is particularly useful for making strings out of the characters
you can't get on the standard keyboard:
PRINT "The temperature is 21.2"+CHR$(176)+"C"
This can be a useful technique for printing cursor control characters
or user defined characters, which are described in a later section.
If you give CHR$ a number which is bigger than 255, BASIC divides it by 256
and gives the character corresponding to the remainder.
Tip: Printing quotation marks
If you want to print a double quote in a string, you can do it in two
ways. The first involves building a string using CHR$(34), which is
the code for double quote.
Greeting$ = CHR$(34) + "Hello, world" + CHR$(34)
PRINT Greeting$
The other way is a little trick that BBC BASIC allows us. You can
actually put the quote in the string, but you use two double quotes
together so BASIC knows that we want to print the quote character and
not end the string.
Greeting$ = """Hello, world"""
PRINT Greeting$
As the quotes in this string are at the beginning and end, there are
three lots, which definitely looks odd. Take the beginning: the first
indicates the start of the string and the next two tell BASIC to store
a quote. The end is the same but in reverse.
VAL and STR$
The next two commands allow us to convert between numeric and string
data types.
VAL takes as its argument a string representation of a number and
returns the numeric equivalent of that number.
If the string contains non-numeric information, it will convert until
it fails:
PRINT VAL("21.6 degrees")
or if the non-numeric stuff comes first, you just get 0 back:
PRINT VAL("degrees 21.6")
The counterpart of VAL is STR$, which you probably guessed. You might
also have guessed that this takes a number or numeric variable and
converts it into a string representation. Now we can add a number to a
string:
REM STR$ demo
A$ = "The temperature outside is " + STR$(21.6)
PRINT A$
END
There are default settings which control the format of the string
produced. This is well documented in the online help and is changeable
at runtime if you require, but is a little beyond the scope of
this tutorial.
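The difference between the two data types shows up clearly when the same digits are added as strings and as numbers. A minimal sketch, with A$ and B$ holding illustrative values and the default number format assumed for STR$:

```basic
REM Strings of digits versus numbers (sketch)
A$ = "12"
B$ = "30"
REM Adding strings joins them end to end
PRINT A$ + B$
REM Converting with VAL first gives ordinary arithmetic
PRINT VAL(A$) + VAL(B$)
REM STR$ turns the total back into a string so text can be added to it
PRINT STR$(VAL(A$) + VAL(B$)) + " in total"
END
```

The first line printed should be 1230, the second 42 and the third 42 in total.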
EVAL
The last string command that must be mentioned is EVAL. I'll give a
flavour of what it can do rather than a full description because it is
such a powerful command. In essence, it allows you to evaluate the
contents of a string expression. Take the description of VAL, which
converts a string to a number. At some point programmers try,
inadvertently or otherwise, something like this:
PRINT VAL("22/7")
VAL returns 22 as described above. Now try:
PRINT EVAL("22/7")
Not impressed? Try:
PRINT EVAL("SIN(PI/2)")
Take it from me, that's not something you get with any old BASIC. You
can pass any string expression and EVAL will evaluate it and return a
numeric or string value, just as if you had entered the code into a
line of a program. As demonstrated above, you can use internal BBC
BASIC functions (though commands like CLS etc. will not work). You can
even use variables within the program:
REM EVAL demo
Side1 = 3
Side2 = 4
Hyp = EVAL("SQR(Side1^2+Side2^2)")
PRINT "Hypotenuse is: ";Hyp
END
The possibilities that this presents spiral off into infinity, so
that's all I'm going to say about it here.
Exercises
1) Set a string to hold the days of the week like this:
"Sun Mon TuesWed ThurFri Sat"
All names are 4 characters in length including a space if necessary. Given a number for a
day, use MID$ to extract the correct abbreviation for the day.
2) Set a string to hold your first name. Use MID$ and ASC to find the
ASCII codes of the letters in the name.
3) Set three strings to hold your first name, second name (if you
haven't got one, make it up) and surname. Use LEFT$ to find your
initials and concatenation to create a new string in the format
"R. T. Russell"
© Peter Nairn 2006