Python regex matching across multiple lines (re.DOTALL)

lvel, posted 2019-03-09, tag: multilinestring, last updated 2019-03-09 14:32, 8 views

I am trying to parse a string that spans multiple lines. Suppose it is:

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''
I want to use the re module's finditer method to get dictionaries like:
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}
I tried the following:
import re
re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+)", re.DOTALL)
sections_it = re_sections.finditer(text)
for m in sections_it:
    print(m.groupdict())
But this results in:
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\nSection2\nstuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}
So section_data also swallows Section2. I also tried telling the second group to match everything except the first group, but that produced no output at all:
re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL)
I know I could use the regex below, but I am looking for a version where I don't have to spell out what the second group looks like.
re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>[a-z12\s]+)", re.DOTALL)
Thank you very much!

tet

Use a lookahead so the data group matches everything up to the next section header or the end of the string:

re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)
Note that this also requires the non-greedy .+?; otherwise it would still match all the way to the end. Demo:
>>> re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)
>>> for m in re_sections.finditer(text): print(m.groupdict())
... 
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2'}
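
For anyone on Python 3, here is a self-contained sketch of the same idea. The first pattern is the lookahead from above. The second is an alternative that is not part of the answer: a negative lookahead checked before every character, which is probably the closest regex equivalent of "match anything except the next header". The names LOOKAHEAD, TEMPERED and parse_sections are made up for this illustration. As far as I can tell, the ^(?P=section) attempt in the question prints nothing because ^ outside a character class is an anchor, not a negation.

```python
import re

text = """
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
"""

# Pattern from the answer: lazy .+? stops as soon as the lookahead sees
# the next header or the end of the string.
LOOKAHEAD = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))",
    re.DOTALL,
)

# Alternative (an assumption, not from the answer): take one character at a
# time, but only while no "Section<digit>" header starts at that position.
TEMPERED = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>(?:(?!Section\d).)+)",
    re.DOTALL,
)


def parse_sections(pattern, s):
    """Return one dict of named groups per match, in order of appearance."""
    return [m.groupdict() for m in pattern.finditer(s)]


for d in parse_sections(TEMPERED, text):
    print(d)
```

One difference worth knowing: with the lookahead pattern the last section loses its final newline, because without re.MULTILINE the $ also matches just before a newline at the very end of the string. The second pattern keeps that newline, so both dictionaries come out in the shape asked for in the question.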