CHAPTER FIFTEEN: STRINGS AND CHARACTER SETS (Part 5)

The Art of ASSEMBLY LANGUAGE PROGRAMMING


15.4 - String Functions in the UCR Standard Library
15.4.1 - StrBDel, StrBDelm
15.4.2 - Strcat, Strcatl, Strcatm, Strcatml
15.4.3 - Strchr
15.4.4 - Strcmp, Strcmpl, Stricmp, Stricmpl
15.4.5 - Strcpy, Strcpyl, Strdup, Strdupl
15.4.6 - Strdel, Strdelm
15.4.7 - Strins, Strinsl, Strinsm, Strinsml
15.4.8 - Strlen
15.4.9 - Strlwr, Strlwrm, Strupr, Struprm
15.4.10 - Strrev, Strrevm
15.4.11 - Strset, Strsetm
15.4.12 - Strspan, Strspanl, Strcspan, Strcspanl
15.4.13 - Strstr, Strstrl
15.4.14 - Strtrim, Strtrimm
15.4.15 - Other String Routines in the UCR Standard Library

15.4 String Functions in the UCR Standard Library

The UCR Standard Library for 80x86 Assembly Language Programmers provides a very rich set of string functions you may use. These routines, for the most part, are quite similar to the string functions provided in the C Standard Library. As such, these functions support zero terminated strings rather than the length prefixed strings supported by the functions in the previous sections.

Because there are so many different UCR StdLib string routines and the sources for all these routines are in the public domain (and are present on the companion CD-ROM for this text), the following sections will not discuss the implementation of each routine. Instead, the following sections will concentrate on how to use these library routines.

The UCR library often provides several variants of the same routine. Generally a suffix of "l", "m", or "ml" appears at the end of the name of these variant routines. The "l" suffix stands for "literal constant". Routines with the "l" (or "ml") suffix require two string operands. The first is generally pointed at by es:di and the second immediately follows the call in the code stream.

Most StdLib string routines operate on the specified string (or one of the strings if the function has two operands). The "m" (or "ml") suffix instructs the string function to allocate storage on the heap (using malloc, hence the "m" suffix) for the new string and store the modified result there rather than changing the source string(s). These routines always return a pointer to the newly created string in the es:di registers. In the event of a memory allocation error (insufficient memory), these routines with the "m" or "ml" suffix return the carry flag set. They return the carry clear if the operation was successful.
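
The difference between a plain routine and its "m" variant can be modeled in C. The sketch below is only an illustration of the convention, not the library's code: the names upper_in_place and upper_copy are hypothetical, and a NULL return stands in for "carry flag set".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Plain variant (compare strupr): modifies the caller's buffer. */
char *upper_in_place(char *s)
{
    assert(s != NULL);
    for (char *p = s; *p != '\0'; ++p)
        *p = (char)toupper((unsigned char)*p);
    return s;
}

/* "m" variant (compare struprm): leaves the source untouched and
   returns a converted copy on the heap. NULL models "carry set". */
char *upper_copy(const char *s)
{
    assert(s != NULL);
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;            /* allocation failed */
    strcpy(copy, s);
    return upper_in_place(copy);
}
```

As with the "m" routines, the caller owns the string upper_copy returns and must release it (with free) when done.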

15.4.1 StrBDel, StrBDelm

These two routines delete leading spaces from a string. StrBDel removes any leading spaces from the string pointed at by es:di. It actually modifies the source string. StrBDelm makes a copy of the string on the heap with any leading spaces removed. If there are no leading spaces, then the StrBDel routines return the original string without modification. Note that these routines only affect leading spaces (those appearing at the beginning of the string). They do not remove trailing spaces or spaces in the middle of the string. See Strtrim if you want to remove trailing spaces. Examples:

MyString        byte    "    Hello there, this is my string",0
MyStrPtr        dword   MyString
                 .
                 .
                 .
                les     di, MyStrPtr
                strbdelm            ;Creates a new string w/o leading spaces,
                jc      error       ; pointer to string is in ES:DI on return.
                puts                ;Print the string pointed at by ES:DI.
                free                ;Deallocate storage allocated by strbdelm.
                 .
                 .
                 .
; Note that "MyString" still contains the leading spaces.
; The following printf call will print the string along with
; those leading spaces. "strbdelm" above did not change MyString.

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                 .
                 .
                 .
                les     di, MyStrPtr
                strbdel

; Now, we really have removed the leading spaces from "MyString"

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                 .
                 .
                 .

Output from this code fragment:

Hello there, this is my string
MyString = '    Hello there, this is my string'
MyString = 'Hello there, this is my string'
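
For readers more familiar with C, the in-place behavior of strbdel can be approximated as follows. This is a sketch of the behavior described above, not the library's implementation; the function name del_leading_spaces is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Delete leading spaces in place and return s for convenience. */
char *del_leading_spaces(char *s)
{
    assert(s != NULL);
    size_t skip = strspn(s, " ");     /* number of leading spaces */
    /* Shift the remaining text, including its zero byte, to the front. */
    memmove(s, s + skip, strlen(s + skip) + 1);
    return s;
}
```

Spaces in the middle or at the end of the string are left alone, and a string with no leading spaces comes back unchanged.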

15.4.2 Strcat, Strcatl, Strcatm, Strcatml

The strcat(xx) routines perform string concatenation. On entry, es:di points at the first string, and for strcat/strcatm dx:si points at the second string. For strcatl and strcatml the second string follows the call in the code stream. These routines create a new string by appending the second string to the end of the first. In the case of strcat and strcatl, the second string is directly appended to the end of the first string (es:di) in memory. You must make sure there is sufficient memory at the end of the first string to hold the appended characters. Strcatm and strcatml create a new string on the heap (using malloc) holding the concatenated result. Examples:

String1         byte    "Hello ",0
                byte    16 dup (0)              ;Room for concatenation.

String2         byte    "world",0

; The following macro loads ES:DI with the address of the
; specified operand.

lesi            macro   operand
                mov     di, seg operand
                mov     es, di
                mov     di, offset operand
                endm

; The following macro loads DX:SI with the address of the
; specified operand.

ldxi            macro   operand
                mov     dx, seg operand
                mov     si, offset operand
                endm
                 .
                 .
                 .
                lesi    String1
                ldxi    String2
                strcatm                 ;Create "Hello world"
                jc      error           ;If insufficient memory.
                print
                byte    "strcatm: ",0
                puts                    ;Print "Hello world"
                putcr
                free                    ;Deallocate string storage.
                 .
                 .
                 .
                lesi    String1         ;Create the string
                strcatml                ; "Hello there"
                byte    "there",0       ;Literal must immediately follow the call.
                jc      error           ;If insufficient memory.
                print
                byte    "strcatml: ",0
                puts                    ;Print "Hello there"
                putcr
                free
                 .
                 .
                 .
                lesi    String1
                ldxi    String2
                strcat                  ;Create "Hello world"
                printf
                byte    "strcat: %s\n",0
                dword   String1
                 .
                 .
                 .
; Note: since strcat above has actually modified String1,
; the following call to strcatl appends "there" to the end
; of the string "Hello world".

                lesi    String1
                strcatl
                byte    "there",0
                printf
                byte    "strcatl: %s\n",0
                dword   String1
                 .
                 .
                 .

The code above produces the following output:

strcatm: Hello world
strcatml: Hello there
strcat: Hello world
strcatl: Hello worldthere

15.4.3 Strchr

Strchr searches for the first occurrence of a single character within a string. In operation it is quite similar to the scasb instruction. However, you do not have to specify an explicit length when using this function as you would for scasb.

On entry, es:di points at the string you want to search through, al contains the value to search for. On return, the carry flag denotes success (C=1 means the character was not present in the string, C=0 means the character was present). If the character was found in the string, cx contains the index into the string where strchr located the character. Note that the first character of the string is at index zero. So strchr will return zero if al matches the first character of the string. If the carry flag is set, then the value in cx has no meaning. Example:

; Note that the following string has a period at location
; "HasPeriod+24". 

HasPeriod       byte    "This string has a period.",0
                 .
                 .
                 .
                lesi    HasPeriod       ;See strcat for lesi definition.
                mov     al, "."         ;Search for a period.
                strchr
                jnc     GotPeriod
                print
                byte    "No period in string",cr,lf,0
                jmp     Done

; If we found the period, output the offset into the string:

GotPeriod:      print
                byte    "Found period at offset ",0
                mov     ax, cx
                puti
                putcr
Done:

This code fragment produces the output:

Found period at offset 24
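
The "flag plus index" result can be pictured in C as a function that returns a success indicator and delivers the index through a pointer. This is only a model of the interface described above; find_char is a hypothetical name, and the boolean result is the inverse of the carry flag (true corresponds to C=0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true and stores the zero-based index of the first c in s.
   Returns false (the "carry set" case) and leaves *index untouched
   if c does not occur. */
bool find_char(const char *s, char c, size_t *index)
{
    assert(s != NULL && index != NULL);
    const char *hit = strchr(s, c);
    if (hit == NULL)
        return false;
    *index = (size_t)(hit - s);
    return true;
}
```

Called with the HasPeriod string and '.', this model yields index 24, matching the output above.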

15.4.4 Strcmp, Strcmpl, Stricmp, Stricmpl

These routines compare strings using a lexicographical ordering. On entry to strcmp or stricmp, es:di points at the first string and dx:si points at the second string. Strcmp compares the first string to the second and returns the result of the comparison in the flags register. Strcmpl operates in a similar fashion, except the second string follows the call in the code stream. The stricmp and stricmpl routines differ from their counterparts in that they ignore case during the comparison. Whereas strcmp would return "not equal" when comparing "Strcmp" with "strcmp", the stricmp (and stricmpl) routines would return "equal" since the only differences are upper vs. lower case. The "i" in stricmp and stricmpl stands for "ignore case." Examples:

String1         byte    "Hello world", 0
String2         byte    "hello world", 0
String3         byte    "Hello there", 0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcmp
                jae     IsGtrEql
                printf
                byte    "%s is less than %s\n",0
                dword   String1, String2
                jmp     Tryl

IsGtrEql:       printf
                byte    "%s is greater or equal to %s\n",0
                dword   String1, String2

Tryl:           lesi    String2
                strcmpl
                byte    "hi world!",0
                jne     NotEql
                printf
                byte    "Hmmm..., %s is equal to 'hi world!'\n",0
                dword   String2
                jmp     Tryi

NotEql:         printf
                byte    "%s is not equal to 'hi world!'\n",0
                dword   String2

Tryi:           lesi    String1
                ldxi    String2
                stricmp
                jne     BadCmp
                printf
                byte    "Ignoring case, %s equals %s\n",0
                dword   String1, String2
                jmp     Tryil

BadCmp:         printf
                byte    "Wow, stricmp doesn't work! %s <> %s\n",0
                dword   String1, String2

Tryil:          lesi    String3
                stricmpl
                byte    "hELLO THERE",0
                jne     BadCmp2
                print
                byte    "Stricmpl worked",cr,lf,0
                jmp     Done

BadCmp2:        print
                byte    "Stricmpl did not work",cr,lf,0

Done:

15.4.5 Strcpy, Strcpyl, Strdup, Strdupl

The strcpy and strdup routines copy one string to another. There are no strcpym or strcpyml routines; strdup and strdupl provide those operations. The UCR Standard Library uses the names strdup and strdupl rather than strcpym and strcpyml so that it uses the same names as the C standard library.

Strcpy copies the string pointed at by es:di to the memory locations beginning at the address in dx:si. There is no error checking; you must ensure that there is sufficient free space at location dx:si before calling strcpy. Strcpy returns with es:di pointing at the destination string (that is, the original dx:si value). Strcpyl works in a similar fashion, except the source string follows the call.

Strdup duplicates the string which es:di points at and returns a pointer to the new string on the heap. Strdupl works in a similar fashion, except the string follows the call. As usual, the carry flag is set if there is a memory allocation error when using strdup or strdupl. Examples:

String1         byte            "Copy this string",0
String2         byte            32 dup (0)
String3         byte            32 dup (0)
StrVar1         dword           0
StrVar2         dword           0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcpy

                ldxi    String3
                strcpyl
                byte    "This string, too!",0

                lesi    String1
                strdup
                jc      error                   ;If insufficient mem.
                mov     word ptr StrVar1, di    ;Save away ptr to
                mov     word ptr StrVar1+2, es  ; string.

                strdupl
                byte    "Also, this string",0   ;Literal must follow the call.
                jc      error
                mov     word ptr StrVar2, di
                mov     word ptr StrVar2+2, es

                printf
                byte    "strcpy: %s\n"
                byte    "strcpyl: %s\n"
                byte    "strdup: %^s\n"
                byte    "strdupl: %^s\n",0
                dword   String2, String3, StrVar1, StrVar2

15.4.6 Strdel, Strdelm

Strdel and strdelm delete characters from a string. Strdel deletes the specified characters within the string; strdelm creates a new copy of the source string without the specified characters. On entry, es:di points at the string to manipulate, cx contains the index into the string where the deletion is to start, and ax contains the number of characters to delete from the string. On return, es:di points at the new string (which is on the heap if you call strdelm). For strdelm only, if the carry flag is set on return, there was a memory allocation error. As with all UCR StdLib string routines, the index values for the string are zero-based. That is, zero is the index of the first character in the source string. Example:

String1         byte    "Hello there, how are you?",0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                mov     cx, 5           ;Start at position five (" there")
                mov     ax, 6           ;Delete six characters.
                strdelm                 ;Create a new string.
                jc      error           ;If insufficient memory.
                print
                byte    "New string: ",0
                puts
                putcr

                lesi    String1
                mov     cx, 11          ;Start at position 11 (the comma).
                mov     ax, 14          ;Delete the remaining 14 characters.
                strdel
                printf
                byte    "Modified string: %s\n",0
                dword   String1

This code prints the following:

New string: Hello, how are you?
Modified string: Hello there
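
The index and count arithmetic is easy to get wrong, so a small C model may help. It is a sketch of the behavior described above rather than the library's code: delete_chars is a hypothetical name, and clamping a count that runs past the end of the string (or ignoring a start index beyond the end) is an assumption, since the text does not say what strdel does in those cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Delete up to count characters from s, starting at zero-based index
   start. The deletion happens in place; s is returned for convenience. */
char *delete_chars(char *s, size_t start, size_t count)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (start >= len)
        return s;                     /* nothing to delete (assumption) */
    if (count > len - start)
        count = len - start;          /* clamp to end of string (assumption) */
    /* Close the gap; the +1 carries the zero terminating byte along. */
    memmove(s + start, s + start + count, len - start - count + 1);
    return s;
}
```

Deleting six characters from index five of "Hello there, how are you?" removes " there" and leaves "Hello, how are you?", which is the first result printed above.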

15.4.7 Strins, Strinsl, Strinsm, Strinsml

The strins(xx) functions insert one string within another. For all four routines es:di points at the source string into which you want to insert another string. Cx contains the insertion point (0..length of source string). For strins and strinsm, dx:si points at the string you wish to insert. For strinsl and strinsml, the string to insert appears as a literal constant in the code stream. Strins and strinsl insert the second string directly into the string pointed at by es:di. Strinsm and strinsml make a copy of the source string and insert the second string into that copy. They return a pointer to the new string in es:di. If there is a memory allocation error then strinsm/strinsml sets the carry flag on return. For strins and strinsl, the first string must have sufficient storage allocated to hold the new string. Examples:

InsertInMe      byte    "Insert >< Here",0
                byte    16 dup (0)
InsertStr       byte    "insert this",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .
                lesi    InsertInMe      ;See strcat for lesi definition.
                ldxi    InsertStr       ;See strcat for ldxi definition.
                mov     cx, 8           ;Insert before "<"
                strinsm
                jc      error           ;If insufficient memory.
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsml
                byte    "insert that",0
                jc      error           ;If insufficient memory.
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsl
                byte    "  ",0          ;Two spaces

                lesi    InsertInMe
                ldxi    InsertStr
                mov     cx, 9           ;Between the two spaces inserted above.
                strins

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %^s\n"
                byte    "Third string: %s\n",0
                dword   StrPtr1, StrPtr2, InsertInMe

Note that the strins and strinsl operations above both insert strings into the same destination string. The output from the above code is

First string: Insert >insert this< Here
Second string: Insert >insert that< Here
Third string: Insert > insert this < Here
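
The in-place insertion, and the storage requirement that goes with it, can be modeled in C as follows. This is a sketch under stated assumptions, not the library's code: insert_string is a hypothetical name, and the cap parameter (the total number of bytes available at dst) is an addition for safety that strins and strinsl do not have; with those routines the caller alone is responsible for providing enough room.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Insert ins into dst at zero-based index pos. Returns false and
   leaves dst unchanged if pos is past the end of dst or the result
   (including its zero byte) would not fit in cap bytes. */
bool insert_string(char *dst, size_t cap, size_t pos, const char *ins)
{
    assert(dst != NULL && ins != NULL);
    size_t dlen = strlen(dst);
    size_t ilen = strlen(ins);
    if (pos > dlen || dlen + ilen + 1 > cap)
        return false;
    /* Open a gap of ilen bytes; the +1 moves the zero byte as well. */
    memmove(dst + pos + ilen, dst + pos, dlen - pos + 1);
    memcpy(dst + pos, ins, ilen);     /* drop the new text into the gap */
    return true;
}
```

Inserting two spaces at index 8 of "Insert >< Here" and then "insert this" at index 9 reproduces the third string shown above.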

15.4.8 Strlen

Strlen computes the length of the string pointed at by es:di. It returns the number of characters up to, but not including, the zero terminating byte. It returns this length in the cx register. Example:

GetLen          byte    "This string is 33 characters long",0
                 .
                 .
                 .
                lesi    GetLen          ;See strcat for lesi definition.
                strlen
                print
                byte    "The string is ",0
                mov     ax, cx          ;Puti needs the length in AX!
                puti
                print
                byte    " characters long",cr,lf,0

15.4.9 Strlwr, Strlwrm, Strupr, Struprm

Strlwr and Strlwrm convert any upper case characters in a string to lower case. Strupr and Struprm convert any lower case characters in a string to upper case. These routines do not affect any other characters present in the string. For all four routines, es:di points at the source string to convert. Strlwr and strupr modify the characters directly in that string. Strlwrm and struprm make a copy of the string to the heap and then convert the characters in the new string. They also return a pointer to this new string in es:di. As usual for UCR StdLib routines, strlwrm and struprm return the carry flag set if there is a memory allocation error. Examples:

String1         byte    "This string has lower case.",0
String2         byte    "THIS STRING has Upper Case.",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                struprm                 ;Convert lower case to upper case.
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                strlwrm                 ;Convert upper case to lower case.
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    String1
                strlwr                  ;Convert to lower case, in place.

                lesi    String2
                strupr                  ;Convert to upper case, in place.

                printf
                byte    "struprm: %^s\n"
                byte    "strlwrm: %^s\n"
                byte    "strlwr: %s\n"
                byte    "strupr: %s\n",0
                dword   StrPtr1, StrPtr2, String1, String2

The above code fragment prints the following:

struprm: THIS STRING HAS LOWER CASE.
strlwrm: this string has upper case.
strlwr: this string has lower case.
strupr: THIS STRING HAS UPPER CASE.

15.4.10 Strrev, Strrevm

These two routines reverse the characters in a string. For example, if you pass strrev the string "ABCDEF" it will convert that string to "FEDCBA". As you'd expect by now, the strrev routine reverses the string whose address you pass in es:di; strrevm first makes a copy of the string on the heap and reverses those characters, leaving the original string unchanged. Of course strrevm will return the carry flag set if there was a memory allocation error. Example:

Palindrome      byte    "radar",0
NotPaldrm       byte    "x + y - z",0
StrPtr1         dword   0
                 .
                 .
                 .
                lesi    Palindrome      ;See strcat for lesi definition.
                strrevm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    NotPaldrm
                strrev

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %s\n",0
                dword   StrPtr1, NotPaldrm

The above code produces the following output:

First string: radar
Second string: z - y + x

15.4.11 Strset, Strsetm

Strset and strsetm replicate a single character through a string. Their behavior, however, is not quite the same. In particular, while strsetm is quite similar to the repeat function (see "Repeat" on page 840), strset is not. Both routines expect a single character value in the al register. They will replicate this character throughout some string. Strsetm also requires a count in the cx register. It creates a string on the heap consisting of cx characters and returns a pointer to this string in es:di (assuming no memory allocation error). Strset, on the other hand, expects you to pass it the address of an existing string in es:di. It will replace each character in that string with the character in al. Note that you do not specify a length when using the strset function; strset uses the length of the existing string. Example:

String1         byte    "Hello there",0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                mov     al, '*'
                strset

                mov     cx, 8
                mov     al, '#'
                strsetm
                jc      error           ;If insufficient memory.

                print
                byte    "String2: ",0
                puts
                printf
                byte    "\nString1: %s\n",0
                dword   String1

The above code produces the output:

String2: ########
String1: ***********
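
Because the two routines take different inputs, it may help to see them side by side in C. The functions below only model the behavior described above; fill_existing and fill_new are hypothetical names, and a NULL return from fill_new stands in for "carry flag set".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* strset model: overwrite every character of an existing string with c.
   The length comes from the string itself, so none is passed in. */
char *fill_existing(char *s, char c)
{
    assert(s != NULL);
    memset(s, c, strlen(s));
    return s;
}

/* strsetm model: build a new heap string holding count copies of c.
   Returns NULL if the allocation fails. */
char *fill_new(char c, size_t count)
{
    char *s = malloc(count + 1);
    if (s == NULL)
        return NULL;
    memset(s, c, count);
    s[count] = '\0';                  /* add the zero terminating byte */
    return s;
}
```

Here fill_existing applied to "Hello there" with '*' gives eleven asterisks, and fill_new('#', 8) gives eight '#' characters, as in the output above.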

15.4.12 Strspan, Strspanl, Strcspan, Strcspanl

These four routines search through a string for a character which is either in some specified character set (strspan, strspanl) or not a member of some character set (strcspan, strcspanl). These routines appear in the UCR Standard Library only because of their appearance in the C standard library. You should rarely use these routines. The UCR Standard Library includes some other routines for manipulating character sets and performing character matching operations. Nonetheless, these routines are somewhat useful on occasion and are worth a mention here.

These routines expect you to pass them the addresses of two strings: a source string and a character set string. They expect the address of the source string in es:di. Strspan and strcspan want the address of the character set string in dx:si; the character set string follows the call with strspanl and strcspanl. On return, cx contains an index into the string, defined as follows:

strspan, strspanl: Index of first character in source found in the character set.

strcspan, strcspanl: Index of first character in source not found in the character set.

If all the characters are in the set (or are not in the set) then cx contains the index into the string of the zero terminating byte.

Example:

Source          byte    "ABCDEFG 0123456",0
Set1            byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
Set2            byte    "0123456789",0
Index1          word    ?
Index2          word    ?
Index3          word    ?
Index4          word    ?
                 .
                 .
                 .
                lesi    Source          ;See strcat for lesi definition.
                ldxi    Set1            ;See strcat for ldxi definition.
                strspan                 ;Search for first ALPHA char.
                mov     Index1, cx      ;Index of first alphabetic char.

                lesi    Source
                ldxi    Set2
                strspan                 ;Search for first numeric char.
                mov     Index2, cx

                lesi    Source
                strcspanl
                byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
                mov     Index3, cx

                lesi    Set2
                strcspanl
                byte    "0123456789",0
                mov     Index4, cx

                printf
                byte    "First alpha char in Source is at offset %d\n"
                byte    "First numeric char is at offset %d\n"
                byte    "First non-alpha in Source is at offset %d\n"
                byte    "First non-numeric in Set2 is at offset %d\n",0
                dword   Index1, Index2, Index3, Index4

This code outputs the following:

First alpha char in Source is at offset 0
First numeric char is at offset 8
First non-alpha in Source is at offset 7
First non-numeric in Set2 is at offset 10

15.4.13 Strstr, Strstrl

Strstr searches for the first occurrence of one string within another. es:di contains the address of the string in which you want to search for a second string. dx:si contains the address of the second string for the strstr routine; for strstrl the search string immediately follows the call in the code stream.

On return from strstr or strstrl, the carry flag will be set if the second string is not present in the source string. If the carry flag is clear, then the second string is present in the source string and cx will contain the (zero-based) index where the second string was found. Example:

SourceStr       byte    "Search for 'this' in this string",0
SearchStr       byte    "this",0
                 .
                 .
                 .
                lesi    SourceStr       ;See strcat for lesi definition.
                ldxi    SearchStr       ;See strcat for ldxi definition.
                strstr
                jc      NotPresent
                print
                byte    "Found string at offset ",0
                mov     ax, cx          ;Need offset in AX for puti
                puti
                putcr

                lesi    SourceStr
                strstrl
                byte    "for",0
                jc      NotPresent
                print
                byte    "Found 'for' at offset ",0
                mov     ax, cx
                puti
                putcr
NotPresent:

The above code prints the following:

Found string at offset 12
Found 'for' at offset 7
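
A straightforward way to picture the search is to try every starting position in turn. The C sketch below does exactly that; it is a naive illustration and makes no claim about how the library implements strstr. The name find_substring is hypothetical, and a false return models "carry flag set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true and stores the zero-based index of the first occurrence
   of sub within s. Returns false, leaving *index untouched, if sub
   does not occur in s. */
bool find_substring(const char *s, const char *sub, size_t *index)
{
    assert(s != NULL && sub != NULL && index != NULL);
    size_t n = strlen(s);
    size_t m = strlen(sub);
    if (m > n)
        return false;                 /* sub is longer than s */
    for (size_t i = 0; i + m <= n; ++i) {
        if (memcmp(s + i, sub, m) == 0) {
            *index = i;               /* first match wins */
            return true;
        }
    }
    return false;
}
```

Searching "Search for 'this' in this string" for "this" reports index 12 (the occurrence inside the quotes), and searching for "for" reports index 7.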

15.4.14 Strtrim, Strtrimm

These two routines are quite similar to strbdel and strbdelm. Rather than removing leading spaces, however, they trim off any trailing spaces from a string. Strtrim trims off any trailing spaces directly on the specified string in memory. Strtrimm first copies the source string and then trims any trailing spaces off the copy. Both routines expect you to pass the address of the source string in es:di. Strtrimm returns a pointer to the new string (if it could allocate it) in es:di. It also returns the carry set or clear to denote error/no error. Example:

String1         byte    "Spaces at the end      ",0
String2         byte    "    Spaces on both sides     ",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .

; TrimSpcs trims the spaces off both ends of a string.
; Note that it is a little more efficient to perform the
; strbdel first, then the strtrim. This routine creates
; the new string on the heap and returns a pointer to this
; string in ES:DI.

TrimSpcs        proc
                strbdelm
                jc      BadAlloc        ;Just return if error.
                strtrim
                clc
BadAlloc:       ret
TrimSpcs        endp
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                strtrimm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                call    TrimSpcs
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                printf
                byte    "First string: '%^s'\n"
                byte    "Second string: '%^s'\n",0
                dword   StrPtr1, StrPtr2

This code fragment outputs the following:

First string: 'Spaces at the end'
Second string: 'Spaces on both sides'
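
The TrimSpcs idea (remove leading spaces while copying, then trim the copy) translates to C roughly as follows. This is a sketch of the same two steps, not a translation of the library routines; trim_both_copy is a hypothetical name, and a NULL return stands in for "carry flag set".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of s with leading and trailing spaces removed,
   or NULL if the allocation fails. The source string is not modified. */
char *trim_both_copy(const char *s)
{
    assert(s != NULL);
    s += strspn(s, " ");                   /* skip leading spaces */
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')   /* back up over trailing spaces */
        --len;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

As with the strings TrimSpcs and strtrimm return, the caller is responsible for freeing the result.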

15.4.15 Other String Routines in the UCR Standard Library

In addition to the "strxxx" routines listed in this section, there are many additional string routines available in the UCR Standard Library. These include routines to convert from numeric types (integer, hex, real, etc.) to a string and vice versa, pattern matching and character set routines, and many other conversion and string utilities. The routines described in this chapter are those whose definitions appear in the "strings.a" header file and are specifically targeted towards generic string manipulation. For more details on the other string routines, consult the UCR Standard Library reference section in the appendices.



Chapter Fifteen: Strings And Character Sets (Part 5)
28 SEP 1996