CodeByAkram: Regular Expression
Showing posts with label Regular Expression. Show all posts

How to use regular expressions with String methods in Java?



Strings in Java have built-in support for regular expressions. 
Strings have four built-in methods for regular expressions: matches(), split(), replaceFirst() and replaceAll().

Method                                   Description
s.split("regex")                         Creates an array of the substrings of s, divided at each occurrence of "regex". The matched "regex" is not included in the result.
s.replaceFirst("regex", "replacement")   Replaces the first occurrence of "regex" with "replacement".
s.matches("regex")                       Evaluates whether "regex" matches s. Returns true only if the WHOLE string can be matched.
s.replaceAll("regex", "replacement")     Replaces all occurrences of "regex" with "replacement".

Now let's see how these regex methods are used on a Java String.

package com.codebyakram.regex;
public class RegexTest {
    public static final String DATA = "This is my small example "
            + "string which I'm going to " + "use for pattern matching.";

    public static void main(String[] args) {
        // matches() succeeds only if the pattern covers the whole string
        System.out.println(DATA.matches("\\w.*"));
        // split on runs of whitespace
        String[] splitString = DATA.split("\\s+");
        System.out.println(splitString.length); // should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // replace all whitespace with tabs
        System.out.println(DATA.replaceAll("\\s+", "\t"));
    }
}
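The example above does not call replaceFirst(). As a minimal sketch of how it differs from replaceAll(), and of the whole-string rule of matches(), consider the following. The class name and sample text are hypothetical and chosen only for illustration.

```java
// Hypothetical demo class; the name and sample text are illustrative only.
final class StringRegexDemo {
    static final String TEXT = "one  two   three";

    // Replaces only the first run of whitespace with a single dash.
    static String replaceFirstGap(String s) {
        return s.replaceFirst("\\s+", "-");
    }

    // Replaces every run of whitespace with a single dash.
    static String replaceAllGaps(String s) {
        return s.replaceAll("\\s+", "-");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstGap(TEXT));  // one-two   three
        System.out.println(replaceAllGaps(TEXT));   // one-two-three
        // matches() must cover the whole string, so a partial pattern fails.
        System.out.println(TEXT.matches("one"));    // false
        System.out.println(TEXT.matches("one.*"));  // true
    }
}
```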
Examples
  Now let’s see another example for regex in Java.


package com.codebyakram.regex;

public class StringMatcher {
    // returns true if the string matches exactly "true"
    public boolean isTrue(String s){
        return s.matches("true");
    }
    // returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s){
        return s.matches("[tT]rue");
    }

    // returns true if the string matches exactly "true" or "True"
    // or "yes" or "Yes"
    public boolean isTrueOrYes(String s){
        return s.matches("[tT]rue|[yY]es");
    }

    // returns true if the string contains the substring "true"
    public boolean containsTrue(String s){
        return s.matches(".*true.*");
    }


    // returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s){
        return s.matches("[a-zA-Z]{3}");
        // equivalent, more verbose form:
//      return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }



    // returns true if the string does not start with a digit (an empty string returns false)
    public boolean isNoNumberAtBeginning(String s){
        return s.matches("^[^\\d].*");
    }
    // returns true if the string consists of an arbitrary number of word characters, none of which is b
    public boolean isIntersection(String s){
        return s.matches("([\\w&&[^b]])*");
    }
    // returns true if the string contains exactly one number and that number is less than 300
    public boolean isLessThenThreeHundred(String s){
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }

}
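To see how some of these patterns behave on concrete inputs, here is a small sketch. It repeats three of the patterns in a hypothetical StringMatcherCheck class so that it can run on its own; the sample inputs and the method name hasNoB are assumptions made for illustration.

```java
// Hypothetical check class; it repeats three patterns from the example above
// so that it can be compiled and run without the StringMatcher class.
final class StringMatcherCheck {
    static boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    static boolean hasNoB(String s) {
        // Character class intersection: word characters that are not 'b'.
        return s.matches("([\\w&&[^b]])*");
    }

    static boolean isLessThanThreeHundred(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(isTrueOrYes("Yes"));                  // true
        System.out.println(isTrueOrYes("yes please"));           // false
        System.out.println(hasNoB("cat"));                       // true
        System.out.println(hasNoB("cab"));                       // false
        System.out.println(isLessThanThreeHundred("room 288"));  // true
        System.out.println(isLessThanThreeHundred("room 300"));  // false
    }
}
```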

What are the rules of writing regular expressions in Java?



There are some rules for writing a regular expression (regex) in Java. Let's discuss those rules. But first, have a look at "What are regular expressions or regex?".

Common matching symbols used in regex

Regular Expression   Description
.                    Matches any character.
^regex               Finds regex that must match at the beginning of the line.
regex$               Finds regex that must match at the end of the line.
[abc]                Set definition, can match the letter a or b or c.
[abc][vz]            Set definition, can match a or b or c followed by either v or z.
[^abc]               When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.
[a-d1-7]             Ranges: matches a single letter between a and d or a digit from 1 to 7, but not the sequence d1.
X|Z                  Finds X or Z.
XZ                   Finds X directly followed by Z.
$                    Checks if a line end follows.
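The symbols above can be tried out with java.util.regex.Pattern. The following is a minimal sketch; the class name SymbolDemo, the helper found and the sample inputs are hypothetical. It uses Matcher.find(), which looks for the pattern anywhere in the input, rather than String.matches(), which requires the whole string to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for anchors, sets, ranges and alternation.
final class SymbolDemo {
    // True if the pattern is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Hello", "Hello World"));  // true
        System.out.println(found("World$", "Hello World"));  // true
        System.out.println(found("^World", "Hello World"));  // false
        System.out.println(found("[abc][vz]", "xbz"));       // true, b followed by z
        System.out.println(found("[^abc]", "abc"));          // false, only a, b, c present
        System.out.println(found("[a-d1-7]", "e8"));         // false, both out of range
        System.out.println(found("cat|dog", "hotdog"));      // true
    }
}
```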

 

Meta characters

There are some predefined meta characters that make certain common patterns easier to write. Let's have a look at these characters.
Regular Expression   Description
\d                   Any digit, short for [0-9]
\D                   A non-digit, short for [^0-9]
\s                   A whitespace character, short for [ \t\n\x0b\r\f]
\S                   A non-whitespace character, short for [^\s]
\w                   A word character, short for [a-zA-Z_0-9]
\W                   A non-word character, short for [^\w]
\S+                  Several non-whitespace characters
\b                   Matches a word boundary where a word character is [a-zA-Z0-9_]
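A short sketch of the meta characters in use follows. The class name MetaCharDemo, its method names and the sample strings are hypothetical; note that each backslash is doubled because the patterns are written as Java string literals.

```java
// Hypothetical demo class for \D, \s and \b.
final class MetaCharDemo {
    // Removes every non-digit character.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Collapses each run of whitespace into a single space.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Replaces "cat" only where it stands as a whole word.
    static String replaceWholeWord(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));         // 5550199
        System.out.println(collapseWhitespace("a \t b\n\nc"));   // a b c
        System.out.println(replaceWholeWord("cat concat cat.")); // dog concat dog.
    }
}
```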


Quantifier

A quantifier defines how often an element can occur. The symbols ?, *, + and {} are quantifiers.

Regular Expression   Description                                                      Examples
*                    Occurs zero or more times; short for {0,}.                       X* finds zero or more letters X; .* finds any character sequence.
+                    Occurs one or more times; short for {1,}.                        X+ finds one or more letters X.
?                    Occurs zero or one time; short for {0,1}.                        X? finds zero or exactly one letter X.
{X}                  Occurs X number of times; {} applies to the preceding element.   \d{3} searches for three digits, .{10} for any character sequence of length 10.
{X,Y}                Occurs between X and Y times.                                    \d{1,4} means \d must occur at least once and at most four times.
*?                   ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match, so the match stops as soon as the rest of the pattern can succeed.
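The difference between a greedy and a reluctant quantifier is easiest to see on a concrete input. The following is a minimal sketch; the class name QuantifierDemo, the helper firstMatch and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy and reluctant quantifiers.
final class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ grabs as much as possible, up to the last '>'.
        System.out.println(firstMatch("<.+>", html));   // <b>bold</b> and <i>italic</i>
        // Reluctant: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", html));  // <b>
        System.out.println(firstMatch("\\d{3}", "id 12345"));    // 123
        System.out.println(firstMatch("\\d{1,4}", "id 12345"));  // 1234
    }
}
```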


Grouping and back reference
We can group parts of a regular expression. In a pattern, elements are grouped with round brackets, e.g., (). This allows us to apply a repetition operator to a complete group.
In addition, these groups capture: each group creates a back reference that stores the part of the String which matched the group. This allows you to use that part in the replacement.
Via $ you can refer to a group in the replacement string. $1 is the first group, $2 the second, etc.
Let's, for example, assume we want to remove all whitespace between a word character and a following period or comma. The period or comma has to be part of the pattern, yet it should still be included in the result.
// Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(DATA.replaceAll(pattern, "$1$3"));
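The snippet above relies on a DATA constant defined elsewhere. As a self-contained sketch of the same replacement, the following uses a hypothetical class GroupReplaceDemo and a made-up sample sentence.

```java
// Hypothetical demo class; the sample sentence is illustrative only.
final class GroupReplaceDemo {
    // Keeps group 1 (the word character) and group 3 (the punctuation),
    // and drops group 2 (the whitespace between them).
    static String tidyPunctuation(String s) {
        return s.replaceAll("(\\w)(\\s+)([\\.,])", "$1$3");
    }

    public static void main(String[] args) {
        System.out.println(tidyPunctuation("Hello , world ."));  // Hello, world.
    }
}
```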

This example extracts the text between the tags of a title element.
// Extract the text between the two title elements
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
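replaceAll() swaps the whole title element for its text but keeps the rest of the input. If only the title text is wanted, a Matcher can return the second group directly. This is a sketch under the assumption that the third group is the closing </title> tag; the class name TitleDemo and the sample HTML are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; assumes group 3 is the closing title tag.
final class TitleDemo {
    private static final Pattern TITLE =
            Pattern.compile("(?i)(<title.*?>)(.+?)(</title>)");

    // Returns the text inside the first title element, or null if none is found.
    static String titleOf(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<html><TITLE>My page</TITLE><body/></html>";
        System.out.println(titleOf(html));  // My page
    }
}
```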
Negative look ahead
A negative lookahead makes it possible to exclude a pattern. With it we can say that a string must not be followed by another string.
Negative lookaheads are defined via (?!pattern). For example, the following will match "a" only if "a" is not followed by "b".
a(?!b)
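A minimal sketch of this lookahead in a replacement follows; the class name LookaheadDemo and the sample strings are hypothetical. The lookahead itself consumes no characters, so only the "a" is replaced.

```java
// Hypothetical demo class for the negative lookahead a(?!b).
final class LookaheadDemo {
    // Replaces each "a" that is not directly followed by "b" with "X".
    static String markLoneA(String s) {
        return s.replaceAll("a(?!b)", "X");
    }

    public static void main(String[] args) {
        System.out.println(markLoneA("abac"));  // abXc
        System.out.println(markLoneA("aba"));   // abX, an "a" at the end has no "b" after it
    }
}
```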

Specifying modes inside the regular expression
We can add mode modifiers to the start of the regex. To specify multiple modes, simply put them together, as in (?ismx).
· (?i) makes the regex case insensitive.
· (?s) for "single line mode" makes the dot match all characters, including line breaks.
· (?m) for "multi-line mode" makes the caret and dollar match at the start and end of each line in the subject string.
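The effect of the three modes listed above can be checked with a short sketch. The class name ModeDemo, the helper found and the two-line sample text are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the inline modes (?i), (?s) and (?m).
final class ModeDemo {
    // True if the pattern is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";

        // (?i): case-insensitive matching.
        System.out.println("HELLO".matches("hello"));      // false
        System.out.println("HELLO".matches("(?i)hello"));  // true

        // (?s): the dot also matches the line break.
        System.out.println(text.matches("first.*line"));      // false
        System.out.println(text.matches("(?s)first.*line"));  // true

        // (?m): ^ also matches at the start of the second line.
        System.out.println(found("^second", text));      // false
        System.out.println(found("(?m)^second", text));  // true
    }
}
```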
Backslashes in Java
The backslash \ is an escape character in Java Strings, which means it has a predefined meaning in Java source code. You have to write a double backslash \\ to define a single backslash. If you want to define \w, you must write \\w in your regex string. If you want to match a literal backslash, you have to type \\\\, because \ is also an escape character in regular expressions.
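The two levels of escaping can be seen in a small sketch. The class name BackslashDemo and the Windows-style sample path are hypothetical.

```java
// Hypothetical demo class showing Java-level and regex-level escaping.
final class BackslashDemo {
    // In source code "\\\\" is the two-character regex \\ ,
    // which matches one literal backslash in the input.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        // The literal "\\w" holds two characters: a backslash and a w.
        System.out.println("\\w".length());          // 2
        System.out.println("a_1".matches("\\w+"));   // true

        // The literal below holds the text C:\temp\file.txt
        String path = "C:\\temp\\file.txt";
        System.out.println(toForwardSlashes(path));  // C:/temp/file.txt
    }
}
```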


What are regular expressions or regex?


Regular expressions define a search pattern for strings in Java. The abbreviation for regular expression is regex. The search pattern can be anything from a simple character or a fixed string to a complex expression containing special characters that describe the pattern. The pattern defined by the regex may match once, several times, or not at all for a given string.
Regular expressions can be used to search, edit and manipulate text.
Analyzing or modifying a text with a regex is described as applying the regular expression to the text/string. The pattern defined by the regex is applied to the text from left to right. Once a source character has been used in a match, it cannot be reused. For example, the regex aba will match ababababa only two times (aba_aba__).
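The count of two can be verified with a short sketch that loops over Matcher.find(). The class name MatchCountDemo and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that counts non-overlapping matches.
final class MatchCountDemo {
    // Each find() resumes after the end of the previous match,
    // so characters already consumed are not reused.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("aba", "ababababa"));  // 2
    }
}
```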

Regex examples
A simple example of a regular expression is a (literal) string. For example, the regex Hello World matches the string "Hello World". The . (dot) is another example of a regular expression. A dot matches any single character; it would match, for example, "a" or "1".
The following table lists several regular expressions and describes which pattern they would match.

Table 1. Regex example
Regex                Matches
this is text         Matches exactly "this is text".
this\s+is\s+text     Matches the word "this" followed by one or more whitespace characters followed by the word "is" followed by one or more whitespace characters followed by the word "text".
^\d+(\.\d+)?         ^ defines that the pattern must start at the beginning of a new line. \d+ matches one or several digits. The ? makes the statement in brackets optional. \. matches ".", parentheses are used for grouping. Matches for example "5", "1.5" and "2.21".
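The regexes in Table 1 can be checked with String.matches(). The following is a sketch; the class name TableOneDemo, its method names and the sample inputs are hypothetical. Because matches() tests the whole string, inputs with trailing characters are rejected here.

```java
// Hypothetical demo class for the regexes listed in Table 1.
final class TableOneDemo {
    static boolean isSpacedPhrase(String s) {
        return s.matches("this\\s+is\\s+text");
    }

    static boolean isDecimalNumber(String s) {
        return s.matches("^\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(isSpacedPhrase("this   is\ttext"));  // true
        System.out.println(isSpacedPhrase("thisistext"));       // false
        System.out.println(isDecimalNumber("5"));               // true
        System.out.println(isDecimalNumber("1.5"));             // true
        System.out.println(isDecimalNumber("2.21"));            // true
        System.out.println(isDecimalNumber("1."));              // false
    }
}
```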