Scala Regular Expressions

Scala supports regular expressions through the Regex class in the scala.util.matching package. The following example uses a regular expression to search for the word Scala:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = "Scala".r
      val str = "Scala is Scalable and cool"
      
      println(pattern findFirstIn str)
   }
}

Compiling and running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Some(Scala)

The example uses the r method of the String class to construct a Regex object.

It then uses the findFirstIn method to find the first match, which is returned as an Option.

To find all matches, use the findAllIn method.
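As a minimal sketch of how these two methods behave (the object name FindDemo and the helpers describeFirst and allMatches are illustrative, not part of the original example): findFirstIn returns an Option[String], so the no-match case has to be handled explicitly, while findAllIn returns an iterator over every match.

```scala
object FindDemo {
  private val pattern = "Scala".r

  // findFirstIn returns Option[String]: Some(match) or None.
  def describeFirst(text: String): String =
    pattern.findFirstIn(text) match {
      case Some(m) => s"found: $m"
      case None    => "no match"
    }

  // findAllIn returns an iterator over all matches; toList materializes it.
  def allMatches(text: String): List[String] =
    pattern.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(describeFirst("Scala is Scalable and cool")) // found: Scala
    println(describeFirst("Java is cool"))               // no match
    println(allMatches("Scala is Scalable and cool"))    // List(Scala, Scala)
  }
}
```

Pattern matching on the Option keeps the "nothing found" branch visible instead of relying on the printed Some(...) or None.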

You can use the mkString method to join the matched strings, and the pipe (|) to specify alternative patterns:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = new Regex("(S|s)cala")  // the first letter may be uppercase S or lowercase s
      val str = "Scala is scalable and cool"
      
      println((pattern findAllIn str).mkString(","))   // join the matches with a comma
   }
}

Compiling and running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Scala,scala

To replace text that matches a pattern, use the replaceFirstIn method to replace the first match, or the replaceAllIn method to replace all matches. For example:

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = "(S|s)cala".r
      val str = "Scala is scalable and cool"
      
      println(pattern replaceFirstIn(str, "Java"))
   }
}

Compiling and running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
Java is scalable and cool
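The text mentions replaceAllIn, so here is a small sketch that contrasts it with replaceFirstIn on the same input. The object name ReplaceDemo and its two helper methods are illustrative only.

```scala
object ReplaceDemo {
  private val pattern = "(S|s)cala".r

  // Replaces only the first match.
  def replaceFirst(text: String): String =
    pattern.replaceFirstIn(text, "Java")

  // Replaces every match, including the "scala" inside "scalable".
  def replaceAll(text: String): String =
    pattern.replaceAllIn(text, "Java")

  def main(args: Array[String]): Unit = {
    val str = "Scala is scalable and cool"
    println(replaceFirst(str)) // Java is scalable and cool
    println(replaceAll(str))   // Java is Javable and cool
  }
}
```

Note that the pattern has no word boundaries, so replaceAllIn also rewrites the "scala" prefix of "scalable".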

Regular Expressions

Scala inherits its regular expression syntax from Java, which in turn adopts most of the rules of the Perl language.

The following table lists some common regular expression rules:

Expression  Matching rule
^           Matches the start of the input string.
$           Matches the end of the input string.
.           Matches any single character except line terminators such as "\n" and "\r".
[...]       Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^...]      Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p", "l", "i", "n" in "plain".
\\A         Matches the start of the input string (not affected by multiline mode).
\\z         Matches the end of the input string (like $, but not affected by multiline mode).
\\Z         Matches the end of the input string, or just before a final line terminator (not affected by multiline mode).
re*         Repeats re zero or more times.
re+         Repeats re one or more times.
re?         Repeats re zero or one time.
re{n}       Repeats re exactly n times.
re{n,}      Repeats re n or more times.
re{n,m}     Repeats re n to m times.
a|b         Matches a or b.
(re)        Matches re and captures the matched text in an automatically numbered group.
(?:re)      Matches re without capturing the matched text; no group number is assigned.
(?>re)      Matches re as an independent (atomic) group: once matched, it is not backtracked into.
\\w         Matches a word character: a letter, a digit, or an underscore.
\\W         Matches any character that is not a letter, digit, or underscore.
\\s         Matches any whitespace character, equivalent to [ \t\n\x0B\f\r].
\\S         Matches any non-whitespace character.
\\d         Matches a digit, equivalent to [0-9].
\\D         Matches any non-digit character.
\\G         Matches the position where the current search begins (the end of the previous match).
\\n         Newline.
\\b         Matches a word boundary (the backspace meaning inside a character class, found in some other engines, is not supported by Java).
\\B         Matches a position that is not a word boundary.
\\t         Tab.
\\Q         Start of quoting: \Q(a+b)*3\E matches the literal text "(a+b)*3".
\\E         End of quoting: \Q(a+b)*3\E matches the literal text "(a+b)*3".
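To illustrate the difference between the capturing group (re) and the non-capturing group (?:re) from the table, the following sketch uses a Regex as an extractor in a match expression. The object name GroupDemo, the date format, and the helper names are assumptions made for this example. A Regex used this way must match the whole input, and each capturing group binds one variable.

```scala
object GroupDemo {
  // Three capturing groups: year, month, day.
  private val date = "(\\d{4})-(\\d{2})-(\\d{2})".r

  // The year is in a non-capturing group, so only two groups are bound.
  private val monthDay = "(?:\\d{4})-(\\d{2})-(\\d{2})".r

  def parts(s: String): Option[(String, String, String)] = s match {
    case date(y, m, d) => Some((y, m, d))
    case _             => None
  }

  def monthAndDay(s: String): Option[(String, String)] = s match {
    case monthDay(m, d) => Some((m, d))
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    println(parts("2024-01-15"))       // Some((2024,01,15))
    println(monthAndDay("2024-01-15")) // Some((01,15))
    println(parts("not a date"))       // None
  }
}
```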

Examples of regular expressions

Example            Description
.                  Matches any single character except line terminators such as "\n" and "\r".
[Rr]uby            Matches "Ruby" or "ruby".
rub[ye]            Matches "ruby" or "rube".
[aeiou]            Matches any one lowercase vowel: a, e, i, o, u.
[0-9]              Matches any digit, same as [0123456789].
[a-z]              Matches any ASCII lowercase letter.
[A-Z]              Matches any ASCII uppercase letter.
[a-zA-Z0-9]        Matches any digit or any uppercase or lowercase ASCII letter.
[^aeiou]           Matches any character other than a, e, i, o, u.
[^0-9]             Matches any character other than a digit.
\\d                Matches a digit, same as [0-9].
\\D                Matches a non-digit, same as [^0-9].
\\s                Matches whitespace, similar to [ \t\r\n\f].
\\S                Matches non-whitespace, similar to [^ \t\r\n\f].
\\w                Matches a letter, digit, or underscore, same as [A-Za-z0-9_].
\\W                Matches anything but a letter, digit, or underscore, same as [^A-Za-z0-9_].
ruby?              Matches "rub" or "ruby": the y is optional.
ruby*              Matches "rub" followed by zero or more y.
ruby+              Matches "rub" followed by one or more y.
\\d{3}             Matches exactly three digits.
\\d{3,}            Matches three or more digits.
\\d{3,5}           Matches three, four, or five digits.
\\D\\d+            No group: + repeats only \d.
(\\D\\d)+          Group: + repeats the \D\d pair.
([Rr]uby(, )?)+    Matches "Ruby", "Ruby, ruby, ruby", etc.
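A few rows of the table can be checked with a short sketch. The helper fullMatch is hypothetical; it uses the underlying java.util.regex.Pattern to test whether the whole input matches, which makes the effect of grouping on the + quantifier easy to see.

```scala
object ExamplesDemo {
  // True only when the entire input matches the pattern.
  def fullMatch(regex: String, input: String): Boolean =
    regex.r.pattern.matcher(input).matches()

  def main(args: Array[String]): Unit = {
    println(fullMatch("ruby?", "rub"))       // true: y is optional
    println(fullMatch("ruby+", "rub"))       // false: at least one y required
    println(fullMatch("\\d{3,5}", "1234"))   // true
    println(fullMatch("\\d{3,5}", "12"))     // false: fewer than three digits
    println(fullMatch("\\D\\d+", "a123"))    // true: + repeats only \d
    println(fullMatch("\\D\\d+", "a1b2"))    // false
    println(fullMatch("(\\D\\d)+", "a1b2"))  // true: + repeats the \D\d pair
    println(fullMatch("([Rr]uby(, )?)+", "Ruby, ruby, ruby")) // true
  }
}
```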

Note that each escape sequence in the tables above is written with two backslashes. This is because the backslash is the escape character in Java and Scala string literals. So to get a single backslash (\) into the pattern, you need to write two (\\) in the string. See the following example:

import scala.util.matching.Regex

object Test {
   def main(args: Array[String]): Unit = {
      val pattern = new Regex("abl[ae]\\d+")
      val str = "ablaw is able1 and cool"
      
      println((pattern findAllIn str).mkString(","))
   }
}

Compiling and running the code above produces the following output:

$ scalac Test.scala 
$ scala Test
able1
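As an alternative to doubling every backslash, Scala's triple-quoted string literals do not process backslash escapes, so the pattern can be written as it would appear in a regex reference. The sketch below (the object name RawStringDemo and its members are illustrative) shows that both spellings produce the same pattern.

```scala
object RawStringDemo {
  // Ordinary string literal: each backslash must be doubled.
  val escaped = "abl[ae]\\d+".r

  // Triple-quoted literal: backslashes are taken literally.
  val raw = """abl[ae]\d+""".r

  // Both regexes are built from the same pattern text.
  def samePattern: Boolean = escaped.regex == raw.regex

  def find(text: String): String =
    raw.findAllIn(text).mkString(",")

  def main(args: Array[String]): Unit = {
    println(samePattern)                     // true
    println(find("ablaw is able1 and cool")) // able1
  }
}
```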