Predefined Character Classes

The Pattern API contains a number of useful predefined character classes, which offer convenient shorthands for commonly used regular expressions:

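Construct    Description
.            Any character (may or may not match line terminators)
\d           A digit: [0-9]
\D           A non-digit: [^0-9]
\s           A whitespace character: [ \t\n\x0B\f\r]
\S           A non-whitespace character: [^\s]
\w           A word character: [a-zA-Z_0-9]
\W           A non-word character: [^\w]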
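
The note that . "may or may not" match line terminators depends on the flags the pattern is compiled with. The following is a minimal sketch of that behavior; the class name DotLineTerminatorDemo and the helper dotMatches are hypothetical names chosen for illustration, while Pattern.DOTALL is the standard java.util.regex flag.

```java
import java.util.regex.Pattern;

public class DotLineTerminatorDemo {
    // Hypothetical helper: true if the whole input is matched by a single "."
    // when the pattern is compiled with the given flags.
    static boolean dotMatches(String input, int flags) {
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("a", 0));               // true: ordinary character
        System.out.println(dotMatches("\n", 0));              // false: line terminator, default mode
        System.out.println(dotMatches("\n", Pattern.DOTALL)); // true: line terminator, DOTALL mode
    }
}
```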

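In the table above, each construct in the left-hand column is shorthand for the character class in the right-hand column. For example, \d means a range of digits (0-9), and \w means a word character (any lowercase letter, any uppercase letter, the underscore character, or any digit). Use the predefined classes whenever possible. They make your code easier to read and eliminate errors introduced by malformed character classes.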
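
One way to convince yourself that a shorthand and its explicit character class are interchangeable is to compare them character by character. The sketch below does this for the ASCII range only, under the default compilation flags; the class name ShorthandEquivalenceDemo and the helper sameOnAscii are hypothetical and not part of the Pattern API.

```java
import java.util.regex.Pattern;

public class ShorthandEquivalenceDemo {
    // Hypothetical helper: true if both patterns give the same whole-input
    // result for every single-character ASCII string.
    static boolean sameOnAscii(String shorthand, String explicit) {
        Pattern a = Pattern.compile(shorthand);
        Pattern b = Pattern.compile(explicit);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnAscii("\\d", "[0-9]"));                 // true
        System.out.println(sameOnAscii("\\w", "[a-zA-Z_0-9]"));          // true
        System.out.println(sameOnAscii("\\s", "[ \\t\\n\\x0B\\f\\r]"));  // true
    }
}
```

The check is limited to ASCII on purpose: it illustrates the table, not the behavior of these classes under other flags.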

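Constructs beginning with a backslash are called escaped constructs. We previewed escaped constructs in the String Literals section, where we mentioned the use of the backslash and of \Q and \E for quotation. If you use an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. For example: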

private final String REGEX = "\\d"; // a single digit

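In this example, \d is the regular expression; the extra backslash is required for the code to compile. The test harness, however, reads expressions directly from the Console, so the extra backslash is unnecessary there.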
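
To make the double backslash concrete, here is a small sketch that compiles the literal and uses it. The class name EscapedConstructDemo and the helper firstDigit are hypothetical; the field is declared static here (an assumption, unlike the instance field shown earlier) so that the example is self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedConstructDemo {
    // Written "\\d" in Java source; the compiled string holds just the two
    // characters '\' and 'd', which is the regex \d.
    static final String REGEX = "\\d";

    // Hypothetical helper: returns the first digit in the input, or null if none.
    static String firstDigit(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());       // 2
        System.out.println(firstDigit("room 7")); // 7
        System.out.println(firstDigit("none"));   // null
    }
}
```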

The following examples demonstrate the use of predefined character classes.

Enter your regex: .
Enter input string to search: @
I found the text "@" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: a
No match found.

Enter your regex: \D
Enter input string to search: 1
No match found.

Enter your regex: \D
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search:  
I found the text " " starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search: a
No match found.

Enter your regex: \S
Enter input string to search:  
No match found.

Enter your regex: \S
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: !
No match found.

Enter your regex: \W
Enter input string to search: a
No match found.

Enter your regex: \W
Enter input string to search: !
I found the text "!" starting at index 0 and ending at index 1.

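In the first three examples, the regular expression is simply . (the "dot" metacharacter), which indicates "any character." Therefore, the match succeeds in all three cases (a randomly selected @ character, a digit, and a letter). The remaining examples each use a single regular expression construct from the Predefined Character Classes table. You can refer to this table to work out the logic behind each match:

  • \d matches all digits

  • \s matches whitespace

  • \w matches word characters

Alternatively, a capital letter means the opposite:

  • \D matches non-digits

  • \S matches non-whitespace

  • \W matches non-word characters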
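
The transcripts above can also be reproduced without typing into the Console. The following is a rough, non-interactive sketch, not the tutorial's actual test harness: the class name PredefinedClassHarness and the helper search are hypothetical, and the message format is copied from the transcripts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassHarness {
    // Hypothetical helper: builds the report lines for one regex and one input,
    // using the same wording as the transcripts.
    static List<String> search(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add(String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                m.group(), m.start(), m.end()));
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Each lowercase class is paired with its uppercase opposite on the same inputs.
        String[][] cases = {
            {"\\d", "1"}, {"\\D", "1"},
            {"\\s", " "}, {"\\S", " "},
            {"\\w", "a"}, {"\\W", "a"},
        };
        for (String[] c : cases) {
            System.out.println("regex: " + c[0] + "  input: \"" + c[1] + "\"");
            for (String line : search(c[0], c[1])) {
                System.out.println(line);
            }
        }
    }
}
```

Because each pattern here is a string literal, the backslashes are doubled, unlike in the Console transcripts.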