Python Regular Expressions: Sub-pattern Extension Syntax and Applications

Date: 12-25

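Regular expression syntax is largely independent of any particular programming language, and most languages support the same core syntax, although some extensions (such as the (?P<name>...) form shown here) are written differently in other languages. For the common regular expression syntax, see the article "Python使用正则表达式处理字符串" (Processing Strings with Regular Expressions in Python).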

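A regular expression uses parentheses "()" to mark a sub-pattern, and everything inside the parentheses is treated as a single unit. For example, '(red)+' matches 'redred', 'redredred', and any other run of one or more consecutive copies of 'red'. The sub-pattern extension syntax makes more complex string processing possible.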
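As a minimal sketch of the grouping behaviour described above, the following compares '(red)+' with the ungrouped 'red+'. The variable names are illustrative only.

```python
import re

# With parentheses, '+' repeats the whole unit 'red'.
grouped = re.fullmatch(r'(red)+', 'redredred')

# Without parentheses, '+' applies only to the final 'd'.
ungrouped = re.fullmatch(r'red+', 'redredred')
ungrouped_d = re.fullmatch(r'red+', 'reddd')

print(grouped is not None)    # True
print(ungrouped is not None)  # False
print(ungrouped_d.group(0))   # reddd
print(grouped.group(1))       # red (a repeated group keeps its last repetition)
```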

Common sub-pattern extension syntax

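Syntax	Description
(?P<groupname>...)	Gives the sub-pattern a name
(?iLmsux)	Sets matching flags; several letters may be combined, and each letter has the same meaning as the corresponding compile flag. It must appear at the start of the pattern
(?:...)	Matches the sub-expression but does not capture it
(?P=groupname)	Refers back to the text matched by the earlier sub-pattern named groupname
(?#...)	A comment
(?<=...)	Placed before a pattern: matches only if the text immediately before the current position matches the content after <=; that content is not part of the returned match
(?=...)	Placed after a pattern: matches only if the text immediately after the current position matches the content after =; that content is not part of the returned match
(?<!...)	Placed before a pattern: matches only if the text immediately before the current position does not match the content after <!; that content is not part of the returned match
(?!...)	Placed after a pattern: matches only if the text immediately after the current position does not match the content after !; that content is not part of the returned match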
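Two entries in the table, (?!...) and (?#...), can be illustrated with a short sketch. The sample strings below are made up for illustration and are not taken from the article.

```python
import re

text = 'foobar foobaz foo'

# (?!...) negative lookahead: 'foo' that is not immediately followed by 'bar'.
not_bar = [m.span() for m in re.finditer(r'foo(?!bar)', text)]
print(not_bar)    # [(7, 10), (14, 17)]

# (?#...) is an inline comment; the regex engine ignores it.
commented = re.findall(r'\d+(?#one or more digits)px', '12px 7em 300px')
print(commented)  # ['12px', '300px']
```

Note that the lookahead only tests what follows; the spans above cover 'foo' alone, not the text that was inspected.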

>>> import re
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')  # access a sub-pattern by its name
'Malcolm'
>>> m.group('last_name')
'Reynolds'
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()  # all matched sub-patterns (group 0 is not included)
('24', '1632')
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()  # the named sub-patterns as a dictionary
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}

>>> exampleString = '''There should be one-- and preferably only one --obvious way to do it.
... Although that way may not be obvious at first unless you're Dutch.
... Now is better than never.
... Although never is often better than right now.'''
>>> pattern = re.compile(r'(?<=\w\s)never(?=\s\w)')  # 'never' that is neither the first nor the last word of a sentence
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(172, 177)
>>> pattern = re.compile(r'(?<=\w\s)never')  # 'never' preceded by another word; the first hit is at the end of a sentence
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(156, 161)

>>> pattern = re.compile(r'(?:is\s)better(\sthan)')  # 'better than' preceded by 'is'
>>> matchResult = pattern.search(exampleString)
>>> matchResult.span()
(141, 155)
>>> matchResult.group(0)  # group 0 is the whole match
'is better than'
>>> matchResult.group(1)  # (?:is\s) is not captured, so group 1 is ' than'
' than'
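The effect of (?:...) is easiest to see with re.findall, which returns the captured groups when a pattern has any. The sketch below uses one line of the sample text and compares three variants of the same pattern.

```python
import re

text = 'Now is better than never.'

# Two capturing groups: findall returns a tuple per match.
capturing = re.findall(r'(is\s)better(\sthan)', text)
print(capturing)       # [('is ', ' than')]

# The first group is non-capturing: only the remaining group is returned.
non_capturing = re.findall(r'(?:is\s)better(\sthan)', text)
print(non_capturing)   # [' than']

# No capturing groups at all: findall returns the whole match.
no_groups = re.findall(r'(?:is\s)better(?:\sthan)', text)
print(no_groups)       # ['is better than']
```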

>>> pattern = re.compile(r'(?i)\bn\w+\b')  # all words starting with n or N; (?i) must come first (an error since Python 3.11 otherwise)
>>> index = 0
>>> while True:
...     matchResult = pattern.search(exampleString, index)
...     if not matchResult:
...         break
...     print(matchResult.group(0), ':', matchResult.span(0))
...     index = matchResult.end(0)
...
not : (92, 95)
Now : (137, 140)
never : (156, 161)
never : (172, 177)
now : (205, 208)
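The manual search loop can also be written with finditer, which yields successive non-overlapping matches. This is a sketch of an alternative, not the article's own code; `sample` is a shortened stand-in for exampleString, so the offsets differ from those above.

```python
import re

# Last two lines of the sample text, used here so the block is self-contained.
sample = 'Now is better than never.\nAlthough never is often better than right now.'

pattern = re.compile(r'(?i)\bn\w+\b')  # words starting with n or N

# finditer walks the string once and returns a match object per hit.
found = [(m.group(0), m.span()) for m in pattern.finditer(sample)]
for word, span in found:
    print(word, ':', span)
```

Passing re.IGNORECASE as the second argument to re.compile is an equivalent way to set the flag without embedding (?i) in the pattern.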

>>> pattern = re.compile(r'(?<!not\s)be\b')  # the word 'be' when it is not preceded by 'not'
>>> index = 0
>>> while True:
...     matchResult = pattern.search(exampleString, index)
...     if not matchResult:
...         break
...     print(matchResult.group(0), ':', matchResult.span(0))
...     index = matchResult.end(0)
...
be : (13, 15)
>>> exampleString[13:20]  # check that the result is correct
'be one-'

>>> pattern = re.compile(r'(\b\w*(?P<f>\w+)(?P=f)\w*\b)')  # words containing consecutive identical letters
>>> index = 0
>>> while True:
...     matchResult = pattern.search(exampleString, index)
...     if not matchResult:
...         break
...     print(matchResult.group(0), ':', matchResult.group(2))
...     index = matchResult.end(0) + 1
...
unless : s
better : t
better : t

>>> s = 'aabc abcd abbcd abccd abcdd'
>>> p = re.compile(r'(\b\w*(?P<f>\w+)(?P=f)\w*\b)')
>>> p.findall(s)
[('aabc', 'a'), ('abbcd', 'b'), ('abccd', 'c'), ('abcdd', 'd')]
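Because the named group is written as \w+, the pattern above also accepts a repeated chunk of several letters, not only a doubled letter. The sketch below contrasts it with a stricter single-letter variant; the word list and the variable names are invented for illustration.

```python
import re

words = 'banana letter abcabc plain'

# Original form: any chunk of one or more letters that is immediately repeated.
repeated_chunk = re.compile(r'\b\w*(?P<f>\w+)(?P=f)\w*\b')

# Stricter variant (an assumption about intent): exactly one letter, doubled.
doubled_letter = re.compile(r'\b\w*(?P<f>\w)(?P=f)\w*\b')

chunk_hits = [m.group(0) for m in repeated_chunk.finditer(words)]
letter_hits = [m.group(0) for m in doubled_letter.finditer(words)]

print(chunk_hits)   # ['banana', 'letter', 'abcabc']
print(letter_hits)  # ['letter']
```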

>>> s = "It's a very good good idea"
>>> re.sub(r'(\b\w+) \1', r'\1', s)  # collapse a word that is repeated consecutively
"It's a very good idea"
>>> re.sub(r'((\w+) )\1', r'\2', s)  # group 1 includes the trailing space, so the space is lost
"It's a very goodidea"

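The first substitution above has no word boundary after the backreference, so it would also turn 'the then' into 'then'. A slightly more defensive sketch, using a named group and \g<word> in the replacement, might look like this; `collapse_repeats` is a hypothetical helper name, not part of the re module.

```python
import re

def collapse_repeats(text):
    """Collapse a run of the same whole word, separated by whitespace, to one copy."""
    # (?P=word) must be followed by a word boundary, so 'the then' is left alone.
    return re.sub(r'\b(?P<word>\w+)(?:\s+(?P=word))+\b', r'\g<word>', text)

print(collapse_repeats("It's a very good good idea"))  # It's a very good idea
print(collapse_repeats('good good good idea'))         # good idea
print(collapse_repeats('the then'))                    # the then
```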