Bash String Processing (Compared with Java) - 21. Regular Expression Matching on Strings

ligaoyuan00

2011-10-24


In Java

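Searching with regular expressions

The String.matches method

boolean matches(String regex)

Tells whether or not this string matches the given regular expression. The whole string must match, not just a part of it.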

String str = "123456";
String re = "\\d+";
if (str.matches(re)) {
    // do something
}

The Pattern and Matcher classes

String str = "abc efg ABC";
String re = "a|f"; // matches a or f
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(str);
boolean rs = m.find();

If str contains a match for re, rs is true; otherwise it is false. To ignore case while searching, compile the pattern with Pattern p = Pattern.compile(re, Pattern.CASE_INSENSITIVE);

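Extracting with regular expressions

// The regex is .+\\(.+)$ : a literal backslash followed by a captured group.
String re = ".+\\\\(.+)$";
String str = "c:\\dir1\\dir2\\name.txt";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(str);
boolean rs = m.find();
for (int i = 1; i <= m.groupCount(); i++) {
    System.out.println(m.group(i));
}

The code above prints name.txt. The extracted strings are available as m.group(i), where the largest valid i is m.groupCount().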

Splitting with regular expressions

String re = "::";
Pattern p = Pattern.compile(re);
String[] r = p.split("xd::abc::cde");

After this runs, r is {"xd","abc","cde"}. There is also a simpler way to split:

String str="xd::abc::cde";
String[] r = str.split("::");

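Replacing (deleting) with regular expressions

String re = "a+"; // one or more a
Pattern p = Pattern.compile(re);
Matcher m = p.matcher("aaabbced a ccdeaa");
String s = m.replaceAll("A");

The result is "Abbced A ccdeA".

Replacing with an empty string deletes the matches, for example:

String re = "a+"; // one or more a
Pattern p = Pattern.compile(re);
Matcher m = p.matcher("aaabbced a ccdeaa");
String s = m.replaceAll("");

The result is "bbced  ccde" (two spaces remain where the standalone "a" was removed).

String.replaceAll and String.replaceFirst are shortcuts for regular expression replacement (deletion). String.replace, however, does not use regular expressions.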

From the JavaDoc for class String:

String replace(char oldChar, char newChar)

Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.

String replace(CharSequence target, CharSequence replacement)

Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.

String replaceAll(String regex, String replacement)

Replaces each substring of this string that matches the given regular expression with the given replacement.

String replaceFirst(String regex, String replacement)

Replaces the first substring of this string that matches the given regular expression with the given replacement.

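Common regular expression metacharacters in Java

. matches any character (except line terminators, unless DOTALL is set)

? the preceding element occurs 0 or 1 times

+ the preceding element occurs 1 or more times

* the preceding element occurs 0 or more times

{n} the preceding element occurs exactly n times

{n,} the preceding element occurs n or more times

{n,m} the preceding element occurs between n and m times

\d is equivalent to [0-9], a digit

\D is equivalent to [^0-9], a non-digit

\s is equivalent to [ \t\n\x0B\f\r], a whitespace character

\S is equivalent to [^\s], a non-whitespace character

\w is equivalent to [a-zA-Z_0-9], a word character (letter, digit, or underscore)

\W is equivalent to [^\w], a non-word character

^ matches the beginning of the input (of each line in MULTILINE mode)

$ matches the end of the input (of each line in MULTILINE mode)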
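For comparison with the Bash section that follows: POSIX extended regular expressions, which Bash's =~ uses, do not specify \d, \s, or \w. A minimal sketch of the usual substitutes, the POSIX bracket classes, is shown here. The variable names and the is_match helper are made up for illustration.

```bash
#!/usr/bin/env bash
# POSIX character classes as stand-ins for Java's \d, \s and \w.
digits='^[[:digit:]]+$'                                # Java: ^\d+$
blank_sep='^[^[:space:]]+[[:space:]]+[^[:space:]]+$'   # Java: ^\S+\s+\S+$
word='^[[:alnum:]_]+$'                                 # Java: ^\w+$

# is_match STRING REGEX: hypothetical helper, succeeds if STRING matches REGEX.
is_match() { [[ $1 =~ $2 ]]; }

is_match 123456     "$digits"    && echo "digits ok"
is_match "abc  efg" "$blank_sep" && echo "space ok"
is_match "var_1"    "$word"      && echo "word ok"
```

Storing each pattern in a variable and expanding it unquoted keeps the shell from interpreting the brackets before the regex engine sees them.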

In Bash

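Bash support for regular expressions

Bash has had built-in regular expression matching since version 3, through the =~ operator.

[[ "$STR" =~ $REGEX ]]

Leave $REGEX unquoted: since Bash 3.2, a quoted right-hand side is matched as a literal string rather than as a regular expression.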

From man bash:

[[ expression ]]

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.

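In Bash, the binary operator =~ performs extended regular expression matching. The return value is 0 if the string matches, 1 if it does not, and 2 if the regular expression is invalid. Matching is case-sensitive unless the shell option nocasematch is enabled. The substrings matched by parenthesized subexpressions are saved in the array BASH_REMATCH: ${BASH_REMATCH[0]} is the part of the string that matched the whole expression, ${BASH_REMATCH[1]} is the part that matched the first subexpression, and so on.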
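The return values and the nocasematch option described above can be tried with a minimal sketch like the following, which assumes Bash 3.2 or later (where a quoted right-hand side is matched literally). The helper names re_match and re_match_literal and the sample strings are made up for illustration.

```bash
#!/usr/bin/env bash
# re_match STRING REGEX: hypothetical helper; succeeds when STRING contains a
# match for the extended regular expression REGEX (unanchored search).
re_match() {
    local str=$1 regex=$2
    [[ $str =~ $regex ]]      # unquoted: treated as a regular expression
}

# re_match_literal STRING TEXT: same test with a quoted right-hand side,
# which Bash 3.2+ matches as a plain substring.
re_match_literal() {
    local str=$1 regex=$2
    [[ $str =~ "$regex" ]]
}

re_match         "abc123" '[0-9]+' && echo "unquoted: match"
re_match_literal "abc123" '[0-9]+' || echo "quoted: no match"

# Case-insensitive matching through the nocasematch shell option.
shopt -s nocasematch
re_match "ABC" '^abc$' && echo "nocasematch: match"
shopt -u nocasematch
```

Turning nocasematch off again right after the test keeps the option from affecting later comparisons in the same script.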

The following script, from http://www.linuxjournal.com/content/bash-regular-expressions, is a good demonstration of the built-in regular expression matching in Bash 3.0.

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done

[root@jfht ~]# ./bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
[root@jfht ~]#
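As a counterpart to the Java extraction example, here is a minimal sketch that uses BASH_REMATCH to split a path into its directory and file name. The helper name split_path and the sample Unix-style path are assumptions made for illustration; they do not come from the original script.

```bash
#!/usr/bin/env bash
# split_path PATH: hypothetical helper that prints the directory part and the
# file name of PATH on two lines, or fails if PATH contains no slash.
split_path() {
    local path=$1
    local regex='^(.*)/([^/]+)$'   # group 1: directory, group 2: file name
    if [[ $path =~ $regex ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

split_path '/home/user/docs/name.txt'
# prints:
# /home/user/docs
# name.txt
```

Because the first group is greedy, it runs up to the last slash, so the second group always holds the final path component.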

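Regular expression matching with the grep/egrep commands

Using Basic RE

Form 1: echo "$STR" | grep -q "$REGEX"

Form 2: grep -q "$REGEX" <<<"$STR"

Using Extended RE

Form 3: echo "$STR" | egrep -q "$REGEX"

Form 4: egrep -q "$REGEX" <<<"$STR"

Note: the -q option makes grep/egrep suppress its output so that the match is judged by the exit status alone; an exit status of 0 means the string matched.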
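The forms above can be wrapped in a small function so the exit status reads like a predicate. This is a minimal sketch; the name matches_ere and the sample strings are made up for illustration, and grep -E is used in place of egrep.

```bash
#!/usr/bin/env bash
# matches_ere STRING REGEX: hypothetical helper; exit status 0 if STRING
# contains a match for the extended regular expression REGEX.
matches_ere() {
    # -E: extended RE, -q: no output, -e: protects patterns starting with "-"
    printf '%s\n' "$1" | grep -Eq -e "$2"
}

if matches_ere "release-1.2.3" '[0-9]+\.[0-9]+\.[0-9]+'; then
    echo "matched"
else
    echo "not matched"
fi
```

Using printf rather than echo avoids surprises when the string itself starts with a dash or contains backslashes.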

From man grep:

Egrep is the same as grep -E.

-E, --extended-regexp
    Interpret PATTERN as an extended regular expression (see below).

-e PATTERN, --regexp=PATTERN
    Use PATTERN as the pattern; useful to protect patterns beginning with -.

-q, --quiet, --silent
    Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.

Matching a mobile phone number. The pattern is 1[3458][0-9]{9} in Extended RE, or 1[3458][0-9]\{9\} in Basic RE.

[root@jfht ~]# echo "13012345678" | egrep '1[3458][0-9]{9}'
13012345678
[root@jfht ~]# echo "13012345678" | grep '1[3458][0-9]{9}'
[root@jfht ~]# echo "13012345678" | grep '1[3458][0-9]\{9\}'
13012345678
[root@jfht ~]#

STR="13024184301"
REGEX="1[3458][0-9]{9}"
if echo "$STR" | egrep -q "$REGEX"; then
    echo "matched"
else
    echo "not matched"
fi

[root@jfht ~]# STR="13024184301"
[root@jfht ~]# REGEX="1[3458][0-9]{9}"
[root@jfht ~]# if echo "$STR" | egrep -q "$REGEX"; then
> echo "matched"
> else
> echo "not matched"
> fi
matched
[root@jfht ~]#
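The pattern used above is unanchored, so it also accepts longer strings that merely contain a phone number. A minimal sketch of a whole-string check with the built-in =~ operator follows; the helper name is_mobile and the test inputs are made up for illustration.

```bash
#!/usr/bin/env bash
# is_mobile STRING: hypothetical helper; succeeds only if the whole string is
# 11 digits of the form 1, then one of 3/4/5/8, then nine more digits.
is_mobile() {
    local regex='^1[3458][0-9]{9}$'   # ^ and $ anchor the match to the whole string
    [[ $1 =~ $regex ]]
}

for s in 13024184301 x13024184301 1302418430; do
    if is_mobile "$s"; then
        echo "$s: matched"
    else
        echo "$s: not matched"
    fi
done
```

Only the first input matches: the second has a leading extra character and the third is one digit short.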

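Regular expression matching with expr match

expr match "$STR" "$REGEX"

expr "$STR" : "$REGEX"

Both forms print the length of the match. The match is anchored at the beginning of the string, and the pattern is a Basic RE.

From man expr:

STRING : REGEXP
    anchored pattern match of REGEXP in STRING

match STRING REGEXP
    same as STRING : REGEXP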

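[root@jfht ~]# STR=Hello
[root@jfht ~]# REGEX=He
[root@jfht ~]# expr "$STR" : "$REGEX"
2
[root@jfht ~]# REGEX=".*[aeiou]"
[root@jfht ~]# expr "$STR" : "$REGEX"
5

Note: the match is greedy!

[root@jfht ~]# REGEX=ll
[root@jfht ~]# expr "$STR" : "$REGEX"
0

The last result is 0 because the match is anchored at the start of the string, and Hello does not begin with ll.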
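The anchoring and greediness shown in this session can be reproduced with a minimal sketch like the one below. The helper name match_len is made up for illustration; it only wraps the expr STRING : REGEXP form.

```bash
#!/usr/bin/env bash
# match_len STRING REGEX: hypothetical helper that prints the length of the
# match of REGEX (a basic RE) anchored at the start of STRING, or 0 if none.
match_len() {
    # expr exits with status 1 when it prints 0, so mask that with "|| true"
    expr "$1" : "$2" || true
}

match_len Hello 'He'          # prints 2: "He" matches at the start
match_len Hello '.*[aeiou]'   # prints 5: greedy .* runs to the last vowel
match_len Hello 'll'          # prints 0: "ll" is not at the start
match_len Hello '.*ll'        # prints 4: a leading .* lets the match start later
```

Prefixing the pattern with .* is the usual way to get an unanchored search out of expr, at the cost of counting the skipped characters in the reported length.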

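expr match can also extract a substring by regular expression.

expr match "$STR" ".*\($SUB\).*"

expr "$STR" : ".*\($SUB\).*"

Note that, unlike above, the result is the captured substring rather than the length of the match.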

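[root@jfht ~]# STR="Someone joined the company in 2009"

The goal is to extract the number from this string. The attempts below show the process.

[root@jfht ~]# SUB="[0-9]+"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

[root@jfht ~]# SUB="[0-9]\+"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"
9
[root@jfht ~]# SUB="[0-9]*"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

[root@jfht ~]# SUB="[0-9]\*"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

None of these extracts the complete year: matching is greedy, so the leading .* consumes as much as it can. With [0-9]\+ only the final digit 9 is left for the group, and the other patterns leave the result empty.

[root@jfht ~]# expr "$STR" : "[^0-9]*\([0-9]\+\).*"
2009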
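With the built-in =~ operator the greedy-prefix problem does not arise, because the search is unanchored and no leading .* is needed. A minimal sketch, with the helper name first_number and the sample sentence made up for illustration:

```bash
#!/usr/bin/env bash
# first_number STRING: hypothetical helper that prints the first run of
# digits found in STRING, or fails if there is none.
first_number() {
    local regex='[0-9]+'
    # =~ finds the leftmost match and extends it as far as possible, so the
    # whole first number is captured without any prefix pattern.
    [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

first_number "joined in 2009, promoted in 2012"   # prints 2009
```

Only the first number is returned; later ones are ignored because matching stops at the leftmost match.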

A question from the web: file names look like "someletters_12345_moreleters.ext", that is, some letters, an underscore, five digits, another underscore, then more letters and an extension. The digits need to be extracted and stored in a variable.

[root@jfht ~]# echo someletters_12345_moreleters.ext | cut -d'_' -f 2
12345
[root@jfht ~]# expr match 'someletters_12345_moreleters.ext' '.\+_\(.\+\)_.*'
12345
[root@jfht ~]# FILE=someletters_12345_moreleters.ext
[root@jfht ~]# NUM=$(expr match "$FILE" '.\+_\(.\+\)_.*')
[root@jfht ~]# echo $NUM
12345
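The same extraction can be sketched with =~ and BASH_REMATCH, which avoids starting an external process and lets the pattern insist on exactly five digits. The helper name extract_id and the stricter pattern are assumptions for illustration, not part of the original question.

```bash
#!/usr/bin/env bash
# extract_id FILE: hypothetical helper that prints the five digits enclosed
# by underscores in a name such as someletters_12345_moreleters.ext.
extract_id() {
    local regex='^[A-Za-z]+_([0-9]{5})_[A-Za-z]+\.[A-Za-z0-9]+$'
    [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

FILE=someletters_12345_moreleters.ext
NUM=$(extract_id "$FILE")
echo "$NUM"   # prints 12345
```

Unlike the cut and expr versions, this one rejects names that do not follow the described layout instead of returning whatever sits between the underscores.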

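Back to contents: A Practical Bash Guide for Java Programmers: String Processing (Contents)

Previous: Bash String Processing (Compared with Java) - 20. Finding the Position of a Substring

Next: Bash String Processing (Compared with Java) - 22. Checking Whether a String Is Numeric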
