
Chapter: Internet & World Wide Web HOW TO PROGRAM - Rich Internet Application Server Technologies - PHP

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

String Processing and Regular Expressions - PHP


String Processing and Regular Expressions

 

PHP can process text easily and efficiently, enabling straightforward searching, substitution, extraction and concatenation of strings. Text manipulation is usually done with regular expressions: series of characters that serve as pattern-matching templates (or search criteria) in strings, text files and databases.

 

1. Comparing Strings

 

Many string-processing tasks can be accomplished by using the equality and comparison operators, demonstrated in Fig. 23.7. Line 14 declares and initializes array $fruits. Lines 17–36 iterate through each element in the $fruits array.

 

1   <?php print( '<?xml version = "1.0" encoding = "utf-8"?>' ) ?>
2   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4
5   <!-- Fig. 23.7: compare.php -->
6   <!-- Using the string-comparison operators. -->
7   <html xmlns = "http://www.w3.org/1999/xhtml">
8      <head>
9         <title>String Comparison</title>
10     </head>
11     <body>
12        <?php
13           // create array fruits
14           $fruits = array( "apple", "orange", "banana" );
15
16           // iterate through each array element
17           for ( $i = 0; $i < count( $fruits ); $i++ )
18           {
19              // call function strcmp to compare the array element
20              // to string "banana"
21              if ( strcmp( $fruits[ $i ], "banana" ) < 0 )
22                 print( $fruits[ $i ] . " is less than banana " );
23              elseif ( strcmp( $fruits[ $i ], "banana" ) > 0 )
24                 print( $fruits[ $i ] . " is greater than banana " );
25              else
26                 print( $fruits[ $i ] . " is equal to banana " );
27
28              // use relational operators to compare each element
29              // to string "apple"
30              if ( $fruits[ $i ] < "apple" )
31                 print( "and less than apple! <br />" );
32              elseif ( $fruits[ $i ] > "apple" )
33                 print( "and greater than apple! <br />" );
34              elseif ( $fruits[ $i ] == "apple" )
35                 print( "and equal to apple! <br />" );
36           } // end for
37        ?><!-- end PHP script -->
38     </body>
39  </html>


Fig. 23.7 | Using the string-comparison operators.

 

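Lines 21 and 23 call function strcmp to compare two strings. The function returns a negative value if the first string precedes the second string, 0 if the strings are equal, and a positive value if the first string follows the second. Depending on the PHP version, the nonzero results may be -1 and 1 or values of larger magnitude, so it is safest to test only the sign, as lines 21 and 23 do. The comparison is case sensitive and based on character codes, so every uppercase letter precedes every lowercase letter. Lines 21–26 compare each element in the $fruits array to the string "banana", printing whether each is greater than, less than or equal to the string.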
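The sign-based test can be illustrated with a short sketch. The helper compareSign is hypothetical (it does not appear in Fig. 23.7), and the spaceship operator it uses assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of Fig. 23.7): reduce strcmp's result
// to -1, 0 or 1 so callers never depend on its magnitude.
function compareSign( $first, $second )
{
   // the spaceship operator (PHP 7+) maps any integer to -1, 0 or 1
   return strcmp( $first, $second ) <=> 0;
}

$fruits = array( "apple", "orange", "banana" );

// print the sign of each comparison against "banana"
foreach ( $fruits as $fruit )
   print( $fruit . " vs. banana: " . compareSign( $fruit, "banana" ) . "\n" );
```

Because strcmp compares character codes, compareSign( "Banana", "banana" ) is negative: the uppercase B sorts before the lowercase b.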

 

Relational operators (==, !=, <, <=, > and >=) can also be used to compare strings. Lines 30–35 use relational operators to compare each element of the array to the string "apple".
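As a rough sketch of the same comparison outside the XHTML page, the loop below records how each element relates to "apple". The $relation array and the numeric-string check at the end are illustrative additions, not part of Fig. 23.7; the check shows one caveat worth knowing, namely that == compares two numeric strings as numbers.

```php
<?php
$fruits = array( "apple", "orange", "banana" );
$relation = array(); // maps each fruit to its relation to "apple"

foreach ( $fruits as $fruit )
{
   if ( $fruit < "apple" )
      $relation[ $fruit ] = "less than";
   elseif ( $fruit > "apple" )
      $relation[ $fruit ] = "greater than";
   else
      $relation[ $fruit ] = "equal to";
}

// caveat: when both operands are numeric strings, == compares numbers
$looseEqual = ( "10" == "1e1" );                // true: both read as 10
$strictEqual = ( "10" === "1e1" );              // false: the strings differ
$sameByStrcmp = ( strcmp( "10", "1e1" ) == 0 ); // false: strcmp compares text
```

When the operands might look like numbers, strcmp or the identity operator === is the safer choice.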

 

2. Regular Expressions

 

Functions ereg and preg_match use regular expressions to search a string for a specified pattern. Function ereg recognizes Portable Operating System Interface (POSIX) extended regular expressions, while function preg_match provides Perl-compatible regular expressions (PCRE). To use preg_match, you must install the PCRE library on your web server and add support for the library to PHP. More information on PCRE can be found at www.pcre.org. PHP 5 supports POSIX regular expressions, so we use function ereg in this section. Note that the POSIX functions (ereg, eregi and ereg_replace) were deprecated in PHP 5.3 and removed in PHP 7, so the listing in this section runs only on PHP 5; in PHP 7 and later, the PCRE functions are part of the core and always available. Figure 23.8 demonstrates regular expressions.

 

1   <?php print( '<?xml version = "1.0" encoding = "utf-8"?>' ) ?>
2   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4
5   <!-- Fig. 23.8: expression.php -->
6   <!-- Regular expressions. -->
7   <html xmlns = "http://www.w3.org/1999/xhtml">
8      <head>
9         <title>Regular expressions</title>
10     </head>
11     <body>
12        <?php
13           $search = "Now is the time";
14           print( "Test string is: '$search'<br /><br />" );
15
16           // call ereg to search for pattern 'Now' in variable search
17           if ( ereg( "Now", $search ) )
18              print( "String 'Now' was found.<br />" );
19
20           // search for pattern 'Now' in the beginning of the string
21           if ( ereg( "^Now", $search ) )
22              print( "String 'Now' found at beginning
23                 of the line.<br />" );
24
25           // search for pattern 'Now' at the end of the string
26           if ( ereg( "Now$", $search ) )
27              print( "String 'Now' was found at the end
28                 of the line.<br />" );
29
30           // search for any word ending in 'ow'
31           if ( ereg( "[[:<:]]([a-zA-Z]*ow)[[:>:]]", $search, $match ) )
32              print( "Word found ending in 'ow': " .
33                 $match[ 1 ] . "<br />" );
34
35           // search for any words beginning with 't'
36           print( "Words beginning with 't' found: ");
37
38           while ( eregi( "[[:<:]](t[[:alpha:]]+)[[:>:]]",
39              $search, $match ) )
40           {
41              print( $match[ 1 ] . " " );
42
43              // remove the first occurrence of a word beginning
44              // with 't' to find other instances in the string
45              $search = ereg_replace( $match[ 1 ], "", $search );
46           } // end while
47        ?><!-- end PHP script -->
48     </body>
49  </html>


Fig. 23.8 | Regular expressions

 

Searching for Expressions

 

Line 13 assigns the string "Now is the time" to variable $search. The condition in line 17 calls function ereg to search for the literal characters "Now" inside variable $search. If the pattern is found, ereg returns a nonzero integer (the length of the matched string when the optional third argument is supplied, and 1 otherwise), which evaluates to true in a boolean context, and line 18 prints a message indicating that the pattern was found. If the pattern is not found, ereg returns false. We use single quotes (' ') inside the string in the print statement to emphasize the search pattern. Text in a single-quoted string literal is not interpolated; single quotes nested inside a double-quoted string literal, as in line 14, are ordinary characters and do not prevent interpolation. For example, '$name' in a print statement would output $name, not variable $name's value.
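The interpolation rules can be checked with a minimal sketch. The variable $name and its value are made up for illustration.

```php
<?php
$name = "Harvey"; // hypothetical value, used only for this illustration

// single-quoted literal: the characters $name are kept as written
$single = 'Hello $name';

// double-quoted literal: $name is replaced by the variable's value
$double = "Hello $name";

// single quotes nested in a double-quoted literal are ordinary
// characters, so interpolation still takes place (as in line 14)
$nested = "Test string is: '$name'";

print( $single . "\n" . $double . "\n" . $nested . "\n" );
```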

 

Function ereg takes two required arguments: a regular expression pattern to search for and the string to search. Matching with ereg is case sensitive; PHP provides function eregi for specifying case-insensitive pattern matches.
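Since ereg and eregi are unavailable in PHP 7 and later, the sketch below swaps in preg_match to show the same two kinds of search. This is a substitute technique, not the chapter's own code: PCRE patterns are wrapped in delimiters (here /), and the i modifier after the closing delimiter plays the role of eregi.

```php
<?php
$search = "Now is the time";

// case-sensitive search: preg_match returns 1 on a match, 0 otherwise
$foundExact = preg_match( '/Now/', $search );

// the lowercase pattern does not match the capitalized word
$foundLower = preg_match( '/now/', $search );

// the i modifier makes the match case insensitive, like eregi
$foundAnyCase = preg_match( '/now/i', $search );

if ( $foundExact )
   print( "String 'Now' was found.\n" );
```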

 

Representing Patterns

 

In addition to literal characters, regular expressions can include metacharacters that specify patterns. Examples of metacharacters include the ^, $ and . characters. The caret (^) metacharacter matches the beginning of a string (line 21), while the dollar sign ($) matches the end of a string (line 26). The period (.) metacharacter matches any single character. Line 21 searches for the pattern "Now" at the beginning of $search. Line 26 searches for "Now" at the end of the string. Since the pattern is not found in this case, the body of the if statement (lines 27–28) does not execute. Note that Now$ is not a variable; it is a pattern that uses $ to search for the characters "Now" at the end of a string.
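A small sketch of the three metacharacters follows, written with preg_match as a stand-in for ereg (an assumption made so that the code runs on current PHP). The patterns are single-quoted so that PHP never attempts to treat $ as the start of a variable name.

```php
<?php
$search = "Now is the time";

// ^ anchors the pattern to the beginning of the string
$nowAtStart = preg_match( '/^Now/', $search );

// $ anchors the pattern to the end of the string
$nowAtEnd = preg_match( '/Now$/', $search );
$timeAtEnd = preg_match( '/time$/', $search );

// . matches any single character, so t.me matches "time"
$anyCharacter = preg_match( '/t.me/', $search );
```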

 

Line 31 searches (from left to right) for the first word ending with the letters ow. Bracket expressions are lists of characters enclosed in square brackets ([]) that match any single character from the list. Ranges can be specified by supplying the beginning and the end of the range separated by a dash (-). For instance, the bracket expression [a-z] matches any lowercase letter and [A-Z] matches any uppercase letter. In this example, we combine the two to create an expression that matches any letter. The special bracket expressions [[:<:]] and [[:>:]] match the beginning and end of a word, respectively.
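Bracket expressions behave the same way in PCRE, so they can be tried with preg_match. One assumption in this sketch: it uses the PCRE word-boundary assertion \b in place of the POSIX [[:<:]] and [[:>:]] expressions.

```php
<?php
$search = "Now is the time";

// [a-z] matches one lowercase letter, so "Q" fails the test
$lowerMatchesLower = preg_match( '/^[a-z]$/', "q" );
$lowerMatchesUpper = preg_match( '/^[a-z]$/', "Q" );

// combining both ranges matches any single letter
$letterMatchesUpper = preg_match( '/^[a-zA-Z]$/', "Q" );

// first word ending in "ow"; \b marks a word boundary in PCRE
$word = null;

if ( preg_match( '/\b[a-zA-Z]*ow\b/', $search, $match ) )
   $word = $match[ 0 ];
```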

 

The expression [a-zA-Z]*ow inside the parentheses represents any word ending in ow. The quantifier * matches the preceding pattern zero or more times. Thus, [a-zA-Z]*ow matches any number of letters followed by the literal characters ow. Quantifiers are used in regular expressions to denote how often a particular character or set of characters can appear in a match. Some PHP quantifiers are listed in Fig. 23.9.

 

Quantifier : Matches

 

{n} : Exactly n times.

{m,n} : Between m and n times, inclusive.

{n,} : n or more times.

+ : One or more times (same as {1,}).

* : Zero or more times (same as {0,}).

? : Zero or one time (same as {0,1}).

 

Fig. 23.9 | Some PHP quantifiers.
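Each quantifier in Fig. 23.9 can be exercised with a one-line test. The sketch uses preg_match (the quantifier syntax is the same in PCRE) and anchors every pattern with ^ and $ so that the whole test string must match; the test strings are arbitrary examples.

```php
<?php
// {n}: exactly three a's
$exactly = preg_match( '/^a{3}$/', "aaa" );

// {m,n}: four a's is outside the range two to three
$between = preg_match( '/^a{2,3}$/', "aaaa" );

// {n,}: two or more a's
$atLeast = preg_match( '/^a{2,}$/', "aaaa" );

// +: at least one b is required, so "ac" does not match
$oneOrMore = preg_match( '/^ab+c$/', "ac" );

// *: zero b's are allowed, so "ac" matches
$zeroOrMore = preg_match( '/^ab*c$/', "ac" );

// ?: the u is optional
$optional = preg_match( '/^colou?r$/', "color" );
```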

Finding Matches

 

The optional third argument to function ereg is an array that stores matches to the regular expression. When the expression is broken down into parenthetical sub-expressions, function ereg stores the first encountered instance of each expression in this array, starting from the leftmost parenthesis. The first element (i.e., index 0) stores the string matched for the entire pattern. The match to the first parenthetical pattern is stored in the second array element, the second in the third array element and so on. If the parenthetical pattern is not encountered, the value of the array element remains uninitialized. Because the sub-expression in line 31 is the first parenthetical pattern, Now is stored in variable $match[ 1 ]. The word-boundary expressions outside the parentheses match positions rather than characters, so the entire pattern also matches just Now, and the same string is stored in $match[ 0 ].
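The layout of the match array can be inspected with preg_match, whose third argument is filled in the same order (entire match first, then each parenthetical sub-expression from left to right). This is a sketch using PCRE in place of ereg; the second pattern is an invented example with two sub-expressions.

```php
<?php
$search = "Now is the time";

// one sub-expression; \b matches a position, not a character,
// so the entire match and the sub-expression are the same text
preg_match( '/\b([a-zA-Z]*ow)\b/', $search, $match );

// two sub-expressions: the words on either side of " is "
preg_match( '/([a-zA-Z]+) is ([a-zA-Z]+)/', $search, $parts );

print( "Entire match: " . $parts[ 0 ] . "\n" );
print( "First group: " . $parts[ 1 ] . "\n" );
print( "Second group: " . $parts[ 2 ] . "\n" );
```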

 

Searching for multiple instances of a single pattern in a string is slightly more complicated, because the ereg function returns only the first instance it encounters. To find multiple instances of a given pattern, we must make multiple calls to ereg, and remove any matched instances before calling the function again. Lines 38–46 use a while statement and the ereg_replace function to find all the words in the string that begin with t. We'll say more about this function momentarily.

 

Character Classes

The pattern in line 38, [[:<:]](t[[:alpha:]]+)[[:>:]], matches any word beginning with the character t followed by one or more letters. Because line 38 calls eregi, the match is case insensitive. The pattern uses the character class [[:alpha:]] to recognize any letter; this is equivalent to the bracket expression [a-zA-Z]. Figure 23.10 lists some character classes that can be matched with regular expressions.

 

Character classes are enclosed by the delimiters [: and :]. When this expression is placed in another set of brackets, such as [[:alpha:]] in line 38, it is a regular expression matching a single character that is a member of the class. A bracket expression containing two or more adjacent character classes represents those character sets combined. For example, the expression [[:upper:][:lower:]]* represents all strings of uppercase and lowercase letters in any order, while [[:upper:]][[:lower:]]* matches strings with a single uppercase letter followed by any number of lowercase characters. Also, note that ([[:upper:]][[:lower:]])* is an expression for all strings that alternate between uppercase and lowercase characters (starting with uppercase and ending with lowercase).

 

Character class : Description

 

alnum : Alphanumeric characters (i.e., letters [a-zA-Z] or digits [0-9]).

alpha : Word characters (i.e., letters [a-zA-Z]).

digit : Digits.

space : White space.

lower : Lowercase letters.

upper : Uppercase letters.

 

Fig. 23.10 | Some PHP character classes
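PCRE accepts the same character classes inside bracket expressions, so the combinations described above can be tried with preg_match. The test strings below are arbitrary examples, and each pattern is anchored with ^ and $ so that it must describe the whole string.

```php
<?php
// any mix of uppercase and lowercase letters
$mixed = preg_match( '/^[[:upper:][:lower:]]*$/', "aBcD" );

// one uppercase letter followed by lowercase letters only
$capitalized = preg_match( '/^[[:upper:]][[:lower:]]*$/', "Hello" );
$notCapitalized = preg_match( '/^[[:upper:]][[:lower:]]*$/', "HeLLo" );

// strictly alternating uppercase-lowercase pairs
$alternating = preg_match( '/^([[:upper:]][[:lower:]])*$/', "AbCd" );
$unfinishedPair = preg_match( '/^([[:upper:]][[:lower:]])*$/', "AbC" );

// one or more digits
$digits = preg_match( '/^[[:digit:]]+$/', "2024" );
```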

Finding Multiple Instances of a Pattern

 

The quantifier + matches one or more consecutive instances of the preceding expression. The result of the match is stored in $match[ 1 ]. Once a match is found, we print it in line 41. We then remove it from the string in line 45, using function ereg_replace. This function takes three arguments: the pattern to match, a string to replace the matched string and the string to search. The modified string is returned. Here, we search for the word that we matched with the regular expression, replace the word with an empty string, then assign the result back to $search. This allows us to match any other words beginning with the character t in the string and print them to the screen. Note that ereg_replace replaces every match of its pattern, not only the first, so a word that occurs more than once in the string would be reported only once.
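The match-and-remove loop can be reproduced on current PHP with the PCRE functions. The sketch below is an adaptation under stated assumptions, not the chapter's code: preg_match and preg_replace stand in for eregi and ereg_replace, \b stands in for the POSIX word-boundary expressions, preg_quote escapes the matched word before it is reused as a pattern, and the limit argument of 1 removes only the first occurrence. The final call shows preg_match_all, which collects every match in one step.

```php
<?php
$search = "Now is the time";
$remaining = $search; // working copy that shrinks as words are removed
$found = array();     // words beginning with 't', in order of discovery

while ( preg_match( '/\b(t[[:alpha:]]+)\b/i', $remaining, $match ) )
{
   $found[] = $match[ 1 ];

   // remove only the first occurrence of the matched word
   $pattern = '/\b' . preg_quote( $match[ 1 ], '/' ) . '\b/';
   $remaining = preg_replace( $pattern, '', $remaining, 1 );
}

// alternative: collect all matches with a single call
preg_match_all( '/\b(t[[:alpha:]]+)\b/i', $search, $all );

print( "Words beginning with 't' found: " .
   implode( " ", $found ) . "\n" );
```

With preg_match_all, $all[ 1 ] holds the text captured by the first sub-expression for every match, which avoids modifying the string being searched.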




Copyright © 2018-2020 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.