Finish section 2.5.

XiongNeng
2014-09-03 18:03:56 +08:00
parent 7cbce58af9
commit 11ca56304a
2 changed files with 104 additions and 3 deletions


@@ -0,0 +1,38 @@
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
Topic: Searching and replacing text in strings
Desc :
"""
import re
from calendar import month_abbr


def change_date(m):
    # m is a match object; group(1) is the month, group(2) the day, group(3) the year
    mon_name = month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2), mon_name, m.group(3))


def search_replace():
    text = 'yeah, but no, but yeah, but no, but yeah'
    print(text.replace('yeah', 'yep'))

    # Use sub() for more complex patterns
    text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
    print(re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text))

    # Compile the pattern first
    datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
    print(datepat.sub(r'\3-\1-\2', text))

    # For more complex substitutions, use a callback function
    print(datepat.sub(change_date, text))

    # Also return the number of substitutions made
    newtext, n = datepat.subn(r'\3-\1-\2', text)
    print(newtext, n)


if __name__ == '__main__':
    search_replace()


@@ -5,14 +5,77 @@
----------
Problem
----------
todo...
You want to search for and replace a text pattern in a string.
|
----------
解决方案
----------
todo...
For simple literal patterns, use the str.replace() method directly. For example:

.. code-block:: python

    >>> text = 'yeah, but no, but yeah, but no, but yeah'
    >>> text.replace('yeah', 'yep')
    'yep, but no, but yep, but no, but yep'
    >>>

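As a small supplementary sketch (not part of the original recipe), str.replace() also accepts an optional third argument that caps the number of replacements. The variable name ``limited`` is only illustrative.

```python
text = 'yeah, but no, but yeah, but no, but yeah'

# The optional third argument limits how many occurrences are replaced,
# counting from the left.
limited = text.replace('yeah', 'yep', 2)
print(limited)

# str.replace() returns a new string; the original text is unchanged.
print(text)
```

Because strings are immutable, the result has to be assigned to a name if you want to keep it.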
For more complicated patterns, use the sub() function in the re module.
To illustrate, suppose you want to rewrite dates of the form "11/27/2012" as "2012-11-27". For example:

.. code-block:: python

    >>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
    >>> import re
    >>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
    'Today is 2012-11-27. PyCon starts 2013-3-13.'
    >>>

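The first argument to sub() is the pattern to match, and the second argument is the replacement pattern. Backslashed digits such as ``\3`` refer to capture group numbers in the pattern.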
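The group references described above can also be written by name. The following is a minimal sketch, assuming you are free to rewrite the pattern with named groups; the group names ``month``, ``day`` and ``year`` are chosen here for illustration and do not appear in the original recipe.

```python
import re

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# The same date pattern, with named groups instead of numbered ones.
named_pat = r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)'

# \g<name> in the replacement refers to the named group.
by_name = re.sub(named_pat, r'\g<year>-\g<month>-\g<day>', text)

# \g<3> is an explicit spelling of \3; it avoids ambiguity when the
# reference is immediately followed by a literal digit.
by_number = re.sub(r'(\d+)/(\d+)/(\d+)', r'\g<3>-\g<1>-\g<2>', text)

print(by_name)
print(by_number)
```

Both calls produce the same text as the numbered ``\3-\1-\2`` form; named groups mainly help readability when a pattern has many groups.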
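If you are going to perform repeated substitutions with the same pattern, consider compiling it first for better performance. For example: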

.. code-block:: python

    >>> import re
    >>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
    >>> datepat.sub(r'\3-\1-\2', text)
    'Today is 2012-11-27. PyCon starts 2013-3-13.'
    >>>

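To show where a compiled pattern is typically reused, here is a short sketch that applies one pattern object to several strings. The ``lines`` data is made up for this example.

```python
import re

# Compile once, then reuse the pattern object for every line.
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

# Hypothetical input lines, invented for illustration.
lines = [
    'Invoice dated 1/5/2014',
    'No date on this line',
    'Shipped 2/14/2014, delivered 2/20/2014',
]

converted = [datepat.sub(r'\3-\1-\2', line) for line in lines]
for line in converted:
    print(line)
```

Lines without a match come back unchanged. The re module also caches recently used patterns internally, so the speedup from compiling may be modest; a named pattern object is often just as valuable for readability.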
For more complicated substitutions, you can pass a substitution callback function instead. For example:

.. code-block:: python

    >>> from calendar import month_abbr
    >>> def change_date(m):
    ...     mon_name = month_abbr[int(m.group(1))]
    ...     return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
    ...
    >>> datepat.sub(change_date, text)
    'Today is 27 Nov 2012. PyCon starts 13 Mar 2013.'
    >>>

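The argument to a substitution callback is a match object, the same kind of object returned by re.match() or re.search().
Use the .group() method to extract specific parts of the match. The callback should return the replacement string.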
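A callback is also handy when the replacement needs computation that a template string cannot express. The sketch below zero-pads the month and day, which the plain ``\3-\1-\2`` template cannot do. The function name ``to_iso`` is hypothetical and not part of the original recipe.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

def to_iso(m):
    """Return the matched month/day/year date as zero-padded YYYY-MM-DD."""
    # m.groups() returns all captured groups as a tuple of strings.
    month, day, year = (int(g) for g in m.groups())
    return '{:04d}-{:02d}-{:02d}'.format(year, month, day)

text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
iso_text = datepat.sub(to_iso, text)
print(iso_text)
```

Note that this sketch does not validate the date; a string such as ``99/99/2013`` would still be rewritten.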
If you want to know how many substitutions were made in addition to getting the replacement text, use re.subn() instead. For example:

.. code-block:: python

    >>> newtext, n = datepat.subn(r'\3-\1-\2', text)
    >>> newtext
    'Today is 2012-11-27. PyCon starts 2013-3-13.'
    >>> n
    2
    >>>

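As a supplementary sketch, the count returned by subn() can be combined with the optional ``count`` argument, which limits how many matches are replaced. The variable names below are illustrative only.

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# count=1 limits the substitution to the first match.
first_only, n = datepat.subn(r'\3-\1-\2', text, count=1)
print(first_only, n)

# With no matches, the text comes back unchanged and the count is 0,
# which makes it easy to detect that nothing was replaced.
unchanged, n_none = datepat.subn(r'\3-\1-\2', 'no dates here')
print(unchanged, n_none)
```

Checking the returned count is a simple way to tell "nothing matched" apart from "the replacement happened to equal the original text".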
|
----------
讨论
----------
todo...
There isn't much more to regular expression search and replace than the sub() method
shown. The trickiest part is specifying the regular expression pattern, something that's
best left as an exercise to the reader.