The UCR Standard Library for 80x86 Assembly Language Programmers provides a very rich set of string functions you may use. These routines, for the most part, are quite similar to the string functions provided in the C Standard Library. As such, these functions support zero terminated strings rather than the length prefixed strings supported by the functions in the previous sections.
Because there are so many different UCR StdLib string routines and the sources for all these routines are in the public domain (and are present on the companion CD-ROM for this text), the following sections will not discuss the implementation of each routine. Instead, the following sections will concentrate on how to use these library routines.
The UCR library often provides several variants of the same routine. Generally a suffix of "l", "m", or "ml" appears at the end of the name of these variant routines. The "l" suffix stands for "literal constant". Routines with the "l" (or "ml") suffix require two string operands. The first is generally pointed at by es:di and the second immediately follows the call in the code stream.
Most StdLib string routines operate directly on the specified string (or on one of the strings if the function has two operands). The "m" (or "ml") suffix instructs the string function to allocate storage on the heap (using malloc, hence the "m" suffix) for the new string and store the modified result there rather than changing the source string(s). These routines always return a pointer to the newly created string in the es:di registers. In the event of a memory allocation error (insufficient memory), the routines with the "m" or "ml" suffix return with the carry flag set. They return with the carry flag clear if the operation was successful.
StrBDel and StrBDelm delete leading spaces from a string. StrBDel removes any leading spaces from the string pointed at by es:di. It actually modifies the source string. StrBDelm makes a copy of the string on the heap with any leading spaces removed. If there are no leading spaces, then the StrBDel routines return the original string without modification. Note that these routines only affect leading spaces (those appearing at the beginning of the string). They do not remove trailing spaces or spaces in the middle of the string. See Strtrim if you want to remove trailing spaces. Examples:
    MyString        byte    " Hello there, this is my string",0
    MyStrPtr        dword   MyString
                    . . .
                    les     di, MyStrPtr
                    strbdelm                ;Creates a new string w/o leading spaces,
                    jc      error           ; pointer to string is in ES:DI on return.
                    puts                    ;Print the string pointed at by ES:DI.
                    free                    ;Deallocate storage allocated by strbdelm.
                    . . .
    ; Note that "MyString" still contains the leading spaces.
    ; The following printf call will print the string along with
    ; those leading spaces. "strbdelm" above did not change MyString.

                    printf
                    byte    "MyString = '%s'\n",0
                    dword   MyString
                    . . .
                    les     di, MyStrPtr
                    strbdel

    ; Now, we really have removed the leading spaces from "MyString"

                    printf
                    byte    "MyString = '%s'\n",0
                    dword   MyString
                    . . .
Output from this code fragment:
    Hello there, this is my string
    MyString = ' Hello there, this is my string'
    MyString = 'Hello there, this is my string'
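Because these routines parallel the C library, the difference between the in-place routine and its "m" variant can be modeled in a few lines of C. The following is only a sketch of the behavior described above, not the library's implementation. The names model_strbdel and model_strbdelm are hypothetical, and a NULL return stands in for the carry flag being set.

```c
#include <stdlib.h>
#include <string.h>

/* In-place model: shift the text left over any leading spaces. */
char *model_strbdel(char *s)
{
    size_t skip = 0;
    while (s[skip] == ' ')
        skip++;
    if (skip > 0)
        memmove(s, s + skip, strlen(s + skip) + 1);  /* +1 moves the terminator too */
    return s;
}

/* Heap-copy model: leave s untouched and return a new string without
   the leading spaces. Returns NULL if the allocation fails. */
char *model_strbdelm(const char *s)
{
    while (*s == ' ')
        s++;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

As with strbdelm, the caller owns the string the second function returns and must free it.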
The strcat(xx) routines perform string concatenation. On entry, es:di points at the first string and, for strcat and strcatm, dx:si points at the second string. For strcatl and strcatml, the second string follows the call in the code stream. These routines create a new string by appending the second string to the end of the first. In the case of strcat and strcatl, the second string is directly appended to the end of the first string (es:di) in memory. You must make sure there is sufficient memory at the end of the first string to hold the appended characters. Strcatm and strcatml create a new string on the heap (using malloc) holding the concatenated result. Examples:
    String1         byte    "Hello ",0
                    byte    16 dup (0)      ;Room for concatenation.

    String2         byte    "world",0

    ; The following macro loads ES:DI with the address of the
    ; specified operand.

    lesi            macro   operand
                    mov     di, seg operand
                    mov     es, di
                    mov     di, offset operand
                    endm

    ; The following macro loads DX:SI with the address of the
    ; specified operand.

    ldxi            macro   operand
                    mov     dx, seg operand
                    mov     si, offset operand
                    endm
                    . . .
                    lesi    String1
                    ldxi    String2
                    strcatm                 ;Create "Hello world"
                    jc      error           ;If insufficient memory.
                    print
                    byte    "strcatm: ",0
                    puts                    ;Print "Hello world"
                    putcr
                    free                    ;Deallocate string storage.
                    . . .
                    lesi    String1         ;Create the string
                    strcatml                ; "Hello there"
                    byte    "there",0       ;Literal must immediately follow the call.
                    jc      error           ;If insufficient memory.
                    print
                    byte    "strcatml: ",0
                    puts                    ;Print "Hello there"
                    putcr
                    free
                    . . .
                    lesi    String1
                    ldxi    String2
                    strcat                  ;Create "Hello world"
                    printf
                    byte    "strcat: %s\n",0
                    dword   String1
                    . . .

    ; Note: since strcat above has actually modified String1,
    ; the following call to strcatl appends "there" to the end
    ; of the string "Hello world".

                    lesi    String1
                    strcatl
                    byte    "there",0
                    printf
                    byte    "strcatl: %s\n",0
                    dword   String1
                    . . .
The code above produces the following output:
    strcatm: Hello world
    strcatml: Hello there
    strcat: Hello world
    strcatl: Hello worldthere
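The heap-allocating form of concatenation can be modeled in C as follows. This is a sketch under stated assumptions rather than the library's code: model_strcatm is a hypothetical name, and NULL plays the role of the carry flag on an allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Model of strcatm: build first+second in fresh heap storage and
   leave both operands unchanged. Returns NULL on allocation failure. */
char *model_strcatm(const char *first, const char *second)
{
    size_t n1 = strlen(first);
    size_t n2 = strlen(second);
    char *result = malloc(n1 + n2 + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, first, n1);
    memcpy(result + n1, second, n2 + 1);   /* +1 copies the terminator */
    return result;
}
```

The in-place routines (strcat, strcatl) skip the allocation entirely, which is why the example above reserves 16 extra bytes after String1.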
Strchr searches for the first occurrence of a single character within a string. In operation it is quite similar to the scasb instruction. However, you do not have to specify an explicit length when using this function as you would for scasb.
On entry, es:di points at the string you want to search through and al contains the value to search for. On return, the carry flag denotes success (C=1 means the character was not present in the string, C=0 means the character was present). If the character was found in the string, cx contains the index into the string where strchr located the character. Note that the first character of the string is at index zero, so strchr will return zero if al matches the first character of the string. If the carry flag is set, then the value in cx has no meaning. Example:
    ; Note that the following string has a period at location
    ; "HasPeriod+24".

    HasPeriod       byte    "This string has a period.",0
                    . . .
                    lesi    HasPeriod       ;See strcat for lesi definition.
                    mov     al, "."         ;Search for a period.
                    strchr
                    jnc     GotPeriod
                    print
                    byte    "No period in string",cr,lf,0
                    jmp     Done

    ; If we found the period, output the offset into the string:

    GotPeriod:      print
                    byte    "Found period at offset ",0
                    mov     ax, cx
                    puti
                    putcr
    Done:
This code fragment produces the output:
Found period at offset 24
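The calling convention just described (a success indication plus a zero-based index) can be mirrored in C. This is a hedged sketch, not the library routine: model_strchr is a hypothetical name, the return value stands in for the carry flag, and the index parameter stands in for cx.

```c
#include <stddef.h>

/* Model of strchr's result convention.
   Returns 0 ("carry clear") and stores the zero-based index on success;
   returns 1 ("carry set") and leaves *index untouched when ch is absent. */
int model_strchr(const char *s, char ch, size_t *index)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == ch) {
            *index = i;
            return 0;
        }
    }
    return 1;
}
```

Just as cx has no meaning when the carry flag is set, the caller should read the index only when the function returns 0.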
Strcmp, strcmpl, stricmp, and stricmpl compare strings using a lexicographical ordering. On entry to strcmp or stricmp, es:di points at the first string and dx:si points at the second string. Strcmp compares the first string to the second and returns the result of the comparison in the flags register. Strcmpl operates in a similar fashion, except the second string follows the call in the code stream. The stricmp and stricmpl routines differ from their counterparts in that they ignore case during the comparison. Whereas strcmp would return "not equal" when comparing "Strcmp" with "strcmp", the stricmp (and stricmpl) routines would return "equal" since the only differences are upper vs. lower case. The "i" in stricmp and stricmpl stands for "ignore case." Examples:
    String1         byte    "Hello world", 0
    String2         byte    "hello world", 0
    String3         byte    "Hello there", 0
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    ldxi    String2         ;See strcat for ldxi definition.
                    strcmp
                    jae     IsGtrEql
                    printf
                    byte    "%s is less than %s\n",0
                    dword   String1, String2
                    jmp     Tryl

    IsGtrEql:       printf
                    byte    "%s is greater or equal to %s\n",0
                    dword   String1, String2

    Tryl:           lesi    String2
                    strcmpl
                    byte    "hi world!",0
                    jne     NotEql
                    printf
                    byte    "Hmmm..., %s is equal to 'hi world!'\n",0
                    dword   String2
                    jmp     Tryi

    NotEql:         printf
                    byte    "%s is not equal to 'hi world!'\n",0
                    dword   String2

    Tryi:           lesi    String1
                    ldxi    String2
                    stricmp
                    jne     BadCmp
                    printf
                    byte    "Ignoring case, %s equals %s\n",0
                    dword   String1, String2
                    jmp     Tryil

    BadCmp:         printf
                    byte    "Wow, stricmp doesn't work! %s <> %s\n",0
                    dword   String1, String2

    Tryil:          lesi    String3         ;"Hello there" matches the literal below.
                    stricmpl
                    byte    "hELLO THERE",0
                    jne     BadCmp2
                    print
                    byte    "Stricmpl worked",cr,lf,0
                    jmp     Done

    BadCmp2:        print
                    byte    "Stricmpl did not work",cr,lf,0

    Done:
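To make the "ignore case" comparison concrete, here is a small C model. It is a sketch only: model_stricmp is a hypothetical name, and folding both characters to lower case before comparing is an assumption (the text does not say which way the library folds, which matters only for the ordering of the few punctuation characters that lie between "Z" and "a").

```c
#include <ctype.h>

/* Model of a case-insensitive lexicographical comparison.
   Returns a negative value, zero, or a positive value when a is
   less than, equal to, or greater than b, ignoring case. */
int model_stricmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)            /* both strings ended together */
            return 0;
    }
}
```

With this model, "Hello there" and "hELLO THERE" compare equal, while "hello world" and "hELLO THERE" do not.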
The strcpy and strdup routines copy one string to another. There are no strcpym or strcpyml routines; strdup and strdupl correspond to those operations. The UCR Standard Library uses the names strdup and strdupl rather than strcpym and strcpyml so that it uses the same names as the C standard library.
Strcpy copies the string pointed at by es:di to the memory locations beginning at the address in dx:si. There is no error checking; you must ensure that there is sufficient free space at location dx:si before calling strcpy. Strcpy returns with es:di pointing at the destination string (that is, the original dx:si value). Strcpyl works in a similar fashion, except the source string follows the call.
Strdup duplicates the string which es:di points at and returns a pointer to the new string on the heap. Strdupl works in a similar fashion, except the string follows the call. As usual, the carry flag is set if there is a memory allocation error when using strdup or strdupl. Examples:
    String1         byte    "Copy this string",0
    String2         byte    32 dup (0)
    String3         byte    32 dup (0)
    StrVar1         dword   0
    StrVar2         dword   0
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    ldxi    String2         ;See strcat for ldxi definition.
                    strcpy

                    ldxi    String3
                    strcpyl
                    byte    "This string, too!",0

                    lesi    String1
                    strdup
                    jc      error           ;If insufficient mem.
                    mov     word ptr StrVar1, di    ;Save away ptr to
                    mov     word ptr StrVar1+2, es  ; string.

                    strdupl
                    byte    "Also, this string",0   ;Literal must immediately follow the call.
                    jc      error
                    mov     word ptr StrVar2, di
                    mov     word ptr StrVar2+2, es

                    printf
                    byte    "strcpy: %s\n"
                    byte    "strcpyl: %s\n"
                    byte    "strdup: %^s\n"
                    byte    "strdupl: %^s\n",0
                    dword   String2, String3, StrVar1, StrVar2
Strdel and strdelm delete characters from a string. Strdel deletes the specified characters within the string; strdelm creates a new copy of the source string without the specified characters. On entry, es:di points at the string to manipulate, cx contains the index into the string where the deletion is to start, and ax contains the number of characters to delete from the string. On return, es:di points at the new string (which is on the heap if you call strdelm). For strdelm only, if the carry flag is set on return, there was a memory allocation error. As with all UCR StdLib string routines, the index values for the string are zero-based. That is, zero is the index of the first character in the source string. Example:
    String1         byte    "Hello there, how are you?",0
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    mov     cx, 5           ;Start at position five (" there")
                    mov     ax, 6           ;Delete six characters.
                    strdelm                 ;Create a new string.
                    jc      error           ;If insufficient memory.
                    print
                    byte    "New string: ",0
                    puts
                    putcr

                    lesi    String1
                    mov     cx, 11          ;Start at position 11 (the comma).
                    mov     ax, 14          ;Delete the remaining 14 characters.
                    strdel
                    printf
                    byte    "Modified string: %s\n",0
                    dword   String1
This code prints the following:
New string: Hello, how are you?
Modified string: Hello there
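The index and count arithmetic is easy to get wrong, so a C model may help when working out which characters survive. This is a sketch, not the library's implementation: model_strdelm is a hypothetical name, NULL stands in for the carry flag, and clamping an out-of-range start or count to the end of the string is an assumption the text does not confirm.

```c
#include <stdlib.h>
#include <string.h>

/* Model of strdelm: return a heap copy of s with `count` characters
   removed starting at zero-based index `start`. The source is unchanged.
   Out-of-range values are clamped to the string length (an assumption). */
char *model_strdelm(const char *s, size_t start, size_t count)
{
    size_t len = strlen(s);
    if (start > len)
        start = len;
    if (count > len - start)
        count = len - start;

    char *result = malloc(len - count + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, s, start);                       /* text before the gap */
    memcpy(result + start, s + start + count,
           len - start - count + 1);                /* tail plus terminator */
    return result;
}
```

For "Hello there, how are you?", a start of 5 with a count of 6 removes " there", while a start of 11 with a count of 14 removes everything from the comma onward.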
The strins(xx) functions insert one string within another. For all four routines es:di points at the source string into which you want to insert another string. Cx contains the insertion point (0..length of source string). For strins and strinsm, dx:si points at the string you wish to insert. For strinsl and strinsml, the string to insert appears as a literal constant in the code stream. Strins and strinsl insert the second string directly into the string pointed at by es:di. Strinsm and strinsml make a copy of the source string and insert the second string into that copy. They return a pointer to the new string in es:di. If there is a memory allocation error then strinsm and strinsml set the carry flag on return. For strins and strinsl, the first string must have sufficient storage allocated to hold the new string. Examples:
    InsertInMe      byte    "Insert >< Here",0
                    byte    16 dup (0)
    InsertStr       byte    "insert this",0
    StrPtr1         dword   0
    StrPtr2         dword   0
                    . . .
                    lesi    InsertInMe      ;See strcat for lesi definition.
                    ldxi    InsertStr       ;See strcat for ldxi definition.
                    mov     cx, 8           ;Insert before "<"
                    strinsm
                    mov     word ptr StrPtr1, di
                    mov     word ptr StrPtr1+2, es

                    lesi    InsertInMe
                    mov     cx, 8
                    strinsml
                    byte    "insert that",0
                    mov     word ptr StrPtr2, di
                    mov     word ptr StrPtr2+2, es

                    lesi    InsertInMe
                    mov     cx, 8
                    strinsl
                    byte    "  ",0          ;Two spaces

                    lesi    InsertInMe
                    ldxi    InsertStr
                    mov     cx, 9           ;In front of second space from above.
                    strins

                    printf
                    byte    "First string: %^s\n"
                    byte    "Second string: %^s\n"
                    byte    "Third string: %s\n",0
                    dword   StrPtr1, StrPtr2, InsertInMe
Note that the strins and strinsl operations above both insert strings into the same destination string. The output from the above code is:
    First string: Insert >insert this< Here
    Second string: Insert >insert that< Here
    Third string: Insert > insert this < Here
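The two chained in-place insertions above are easier to follow when replayed step by step. The following C model is only a sketch of the heap-copy variant: model_strinsm is a hypothetical name, NULL stands in for the carry flag, and clamping an insertion point beyond the end of the string is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Model of strinsm: return a heap copy of dest with ins inserted
   before the character at zero-based position `index`. */
char *model_strinsm(const char *dest, size_t index, const char *ins)
{
    size_t dlen = strlen(dest);
    size_t ilen = strlen(ins);
    if (index > dlen)
        index = dlen;               /* assumption: clamp to end of string */

    char *result = malloc(dlen + ilen + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, dest, index);                        /* head */
    memcpy(result + index, ins, ilen);                  /* inserted text */
    memcpy(result + index + ilen, dest + index,
           dlen - index + 1);                           /* tail + terminator */
    return result;
}
```

Inserting two spaces at position 8 and then "insert this" at position 9 places the new text between those two spaces, which is how the third string in the example ends up with a space on each side of the inserted words.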
Strlen computes the length of the string pointed at by es:di. It returns the number of characters up to, but not including, the zero terminating byte. It returns this length in the cx register. Example:
    GetLen          byte    "This string is 33 characters long",0
                    . . .
                    lesi    GetLen          ;See strcat for lesi definition.
                    strlen
                    print
                    byte    "The string is ",0
                    mov     ax, cx          ;Puti needs the length in AX!
                    puti
                    print
                    byte    " characters long",cr,lf,0
Strlwr and strlwrm convert any upper case characters in a string to lower case. Strupr and struprm convert any lower case characters in a string to upper case. These routines do not affect any other characters present in the string. For all four routines, es:di points at the source string to convert. Strlwr and strupr modify the characters directly in that string. Strlwrm and struprm make a copy of the string on the heap and then convert the characters in the new string. They also return a pointer to this new string in es:di. As usual for UCR StdLib routines, strlwrm and struprm return the carry flag set if there is a memory allocation error. Examples:
    String1         byte    "This string has lower case.",0
    String2         byte    "THIS STRING has Upper Case.",0
    StrPtr1         dword   0
    StrPtr2         dword   0
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    struprm                 ;Convert lower case to upper case.
                    jc      error
                    mov     word ptr StrPtr1, di
                    mov     word ptr StrPtr1+2, es

                    lesi    String2
                    strlwrm                 ;Convert upper case to lower case.
                    jc      error
                    mov     word ptr StrPtr2, di
                    mov     word ptr StrPtr2+2, es

                    lesi    String1
                    strlwr                  ;Convert to lower case, in place.

                    lesi    String2
                    strupr                  ;Convert to upper case, in place.

                    printf
                    byte    "struprm: %^s\n"
                    byte    "strlwrm: %^s\n"
                    byte    "strlwr: %s\n"
                    byte    "strupr: %s\n",0
                    dword   StrPtr1, StrPtr2, String1, String2
The above code fragment prints the following:
    struprm: THIS STRING HAS LOWER CASE.
    strlwrm: this string has upper case.
    strlwr: this string has lower case.
    strupr: THIS STRING HAS UPPER CASE.
Strrev and strrevm reverse the characters in a string. For example, if you pass strrev the string "ABCDEF" it will convert that string to "FEDCBA". As you'd expect by now, the strrev routine reverses the string whose address you pass in es:di; strrevm first makes a copy of the string on the heap and reverses those characters, leaving the original string unchanged. Of course strrevm will return the carry flag set if there was a memory allocation error. Example:
    Palindrome      byte    "radar",0
    NotPaldrm       byte    "x + y - z",0
    StrPtr1         dword   0
                    . . .
                    lesi    Palindrome      ;See strcat for lesi definition.
                    strrevm
                    jc      error
                    mov     word ptr StrPtr1, di
                    mov     word ptr StrPtr1+2, es

                    lesi    NotPaldrm
                    strrev

                    printf
                    byte    "First string: %^s\n"
                    byte    "Second string: %s\n",0
                    dword   StrPtr1, NotPaldrm
The above code produces the following output:
    First string: radar
    Second string: z - y + x
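An in-place reversal needs nothing more than a pair of indices walking toward each other. The C sketch below models that behavior; model_strrev is a hypothetical name, and the library's own routine may well be implemented differently.

```c
#include <string.h>

/* Model of strrev: reverse the characters of s in place by swapping
   the outermost pair and moving inward. */
char *model_strrev(char *s)
{
    size_t n = strlen(s);
    if (n < 2)
        return s;                   /* nothing to swap */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
    return s;
}
```

A palindrome such as "radar" comes back unchanged, which is why the first line of the output above looks identical to its input.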
Strset and strsetm replicate a single character through a string. Their behavior, however, is not quite the same. In particular, while strsetm is quite similar to the repeat function (see "Repeat" on page 840), strset is not. Both routines expect a single character value in the al register. They will replicate this character throughout some string. Strsetm also requires a count in the cx register. It creates a string on the heap consisting of cx characters and returns a pointer to this string in es:di (assuming no memory allocation error). Strset, on the other hand, expects you to pass it the address of an existing string in es:di. It will replace each character in that string with the character in al. Note that you do not specify a length when using the strset function; strset uses the length of the existing string. Example:
    String1         byte    "Hello there",0
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    mov     al, '*'
                    strset

                    mov     cx, 8
                    mov     al, '#'
                    strsetm

                    print
                    byte    "String2: ",0
                    puts
                    printf
                    byte    "\nString1: %s\n",0
                    dword   String1
The above code produces the output:
    String2: ########
    String1: ***********
Strspan, strspanl, strcspan, and strcspanl search through a string for a character which is either in some specified character set (strspan, strspanl) or not a member of some character set (strcspan, strcspanl). These routines appear in the UCR Standard Library only because of their appearance in the C standard library. You should rarely use these routines. The UCR Standard Library includes some other routines for manipulating character sets and performing character matching operations. Nonetheless, these routines are somewhat useful on occasion and are worth a mention here.
These routines expect you to pass them the addresses of two strings: a source string and a character set string. They expect the address of the source string in es:di. Strspan and strcspan want the address of the character set string in dx:si; the character set string follows the call with strspanl and strcspanl. On return, cx contains an index into the string, defined as follows:

    strspan, strspanl:      Index of first character in source found in the character set.
    strcspan, strcspanl:    Index of first character in source not found in the character set.

If no character in the source string satisfies the condition, then cx contains the index into the string of the zero terminating byte. Example:
    Source          byte    "ABCDEFG 0123456",0
    Set1            byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
    Set2            byte    "0123456789",0
    Index1          word    ?
    Index2          word    ?
    Index3          word    ?
    Index4          word    ?
                    . . .
                    lesi    Source          ;See strcat for lesi definition.
                    ldxi    Set1            ;See strcat for ldxi definition.
                    strspan                 ;Search for first ALPHA char.
                    mov     Index1, cx      ;Index of first alphabetic char.

                    lesi    Source
                    ldxi    Set2            ;Character set goes in DX:SI.
                    strspan                 ;Search for first numeric char.
                    mov     Index2, cx

                    lesi    Source
                    strcspanl
                    byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
                    mov     Index3, cx

                    lesi    Set2
                    strcspanl
                    byte    "0123456789",0
                    mov     Index4, cx

                    printf
                    byte    "First alpha char in Source is at offset %d\n"
                    byte    "First numeric char is at offset %d\n"
                    byte    "First non-alpha in Source is at offset %d\n"
                    byte    "First non-numeric in Set2 is at offset %d\n",0
                    dword   Index1, Index2, Index3, Index4
This code outputs the following:
    First alpha char in Source is at offset 0
    First numeric char is at offset 8
    First non-alpha in Source is at offset 7
    First non-numeric in Set2 is at offset 10
Strstr searches for the first occurrence of one string within another. es:di contains the address of the string in which you want to search for a second string. dx:si contains the address of the second string for the strstr routine; for strstrl the second string immediately follows the call in the code stream.
On return from strstr or strstrl, the carry flag will be set if the second string is not present in the source string. If the carry flag is clear, then the second string is present in the source string and cx will contain the (zero-based) index where the second string was found. Example:
    SourceStr       byte    "Search for 'this' in this string",0
    SearchStr       byte    "this",0
                    . . .
                    lesi    SourceStr       ;See strcat for lesi definition.
                    ldxi    SearchStr       ;See strcat for ldxi definition.
                    strstr
                    jc      NotPresent
                    print
                    byte    "Found string at offset ",0
                    mov     ax, cx          ;Need offset in AX for puti
                    puti
                    putcr

                    lesi    SourceStr
                    strstrl
                    byte    "for",0
                    jc      NotPresent
                    print
                    byte    "Found 'for' at offset ",0
                    mov     ax, cx
                    puti
                    putcr
    NotPresent:
The above code prints the following:
    Found string at offset 12
    Found 'for' at offset 7
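Unlike C's strstr, which returns a pointer, the routine described here reports a zero-based index plus a found/not-found indication. A thin C wrapper shows the relationship. This is a sketch only: model_strstr is a hypothetical name, its return value stands in for the carry flag, and its index parameter stands in for cx.

```c
#include <stddef.h>
#include <string.h>

/* Model of the index-returning search.
   Returns 0 ("carry clear") and stores the zero-based index of the first
   occurrence of sub in s; returns 1 ("carry set") if sub is not present. */
int model_strstr(const char *s, const char *sub, size_t *index)
{
    const char *hit = strstr(s, sub);   /* standard C search */
    if (hit == NULL)
        return 1;
    *index = (size_t)(hit - s);         /* pointer difference gives the index */
    return 0;
}
```

Applied to the example string, the first "this" lies inside the quotation marks at index 12, not at the later unquoted occurrence.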
Strtrim and strtrimm are quite similar to strbdel and strbdelm. Rather than removing leading spaces, however, they trim off any trailing spaces from a string. Strtrim trims off any trailing spaces directly on the specified string in memory. Strtrimm first copies the source string and then trims any trailing spaces off the copy. Both routines expect you to pass the address of the source string in es:di. Strtrimm returns a pointer to the new string (if it could allocate it) in es:di. It also returns the carry set or clear to denote error/no error. Example:
    String1         byte    "Spaces at the end ",0
    String2         byte    " Spaces on both sides ",0
    StrPtr1         dword   0
    StrPtr2         dword   0
                    . . .

    ; TrimSpcs trims the spaces off both ends of a string.
    ; Note that it is a little more efficient to perform the
    ; strbdel first, then the strtrim. This routine creates
    ; the new string on the heap and returns a pointer to this
    ; string in ES:DI.

    TrimSpcs        proc
                    strbdelm
                    jc      BadAlloc        ;Just return if error.
                    strtrim
                    clc
    BadAlloc:       ret
    TrimSpcs        endp
                    . . .
                    lesi    String1         ;See strcat for lesi definition.
                    strtrimm
                    jc      error
                    mov     word ptr StrPtr1, di
                    mov     word ptr StrPtr1+2, es

                    lesi    String2
                    call    TrimSpcs
                    jc      error
                    mov     word ptr StrPtr2, di
                    mov     word ptr StrPtr2+2, es

    ; StrPtr1 and StrPtr2 hold pointers to the strings, so use %^s.

                    printf
                    byte    "First string: '%^s'\n"
                    byte    "Second string: '%^s'\n",0
                    dword   StrPtr1, StrPtr2
This code fragment outputs the following:
    First string: 'Spaces at the end'
    Second string: 'Spaces on both sides'
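The TrimSpcs idea (copy without the leading spaces, then trim the copy in place) translates directly into C. The following is a sketch under stated assumptions, not the library's code: model_strtrim and model_trimspcs are hypothetical names, and NULL stands in for the carry flag.

```c
#include <stdlib.h>
#include <string.h>

/* Model of strtrim: cut trailing spaces by moving the terminator left. */
char *model_strtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        n--;
    s[n] = '\0';
    return s;
}

/* Model of TrimSpcs: heap copy of s with spaces removed from both ends.
   Returns NULL if the allocation fails; the source string is unchanged. */
char *model_trimspcs(const char *s)
{
    while (*s == ' ')               /* skip leading spaces before copying */
        s++;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    return model_strtrim(copy);     /* trim the copy in place */
}
```

Skipping the leading spaces first means fewer characters are copied, which matches the efficiency remark in the TrimSpcs comment.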
In addition to the "strxxx" routines listed in this section, there are many additional string routines available in the UCR Standard Library. These include routines to convert from numeric types (integer, hex, real, etc.) to a string or vice versa, pattern matching and character set routines, and many other conversion and string utilities. The routines described in this chapter are those whose definitions appear in the "strings.a" header file and are specifically targeted towards generic string manipulation. For more details on the other string routines, consult the UCR Standard Library reference section in the appendices.
Chapter Fifteen: Strings And
Character Sets (Part 5)
28 SEP 1996