[ python ] Regular Expressions [4]

by fiasco 2022. 10. 30.
Mastering Python Regular Expressions

http://www.packtpub.com/


Authors : Félix López 

Chapter 4: Look Around

 

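Look around is a more powerful kind of zero-width assertion: it checks whether a position in the input satisfies a condition, without consuming or matching any characters.

Because no characters are consumed, it only reports whether the check at that position succeeded (positive) or failed (negative).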
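
Since a look around consumes nothing, several of them can be tested at the same position. The sketch below is only an illustration: the rule "contains at least one digit and at least one letter" and the name has_digit_and_letter are made up for this note and do not come from the book.

```python
import re

# Hypothetical rule for illustration: the text must contain
# at least one digit AND at least one ASCII letter.
# Both look aheads are evaluated at position 0 because neither consumes input.
HAS_DIGIT_AND_LETTER = re.compile(r'(?=.*\d)(?=.*[A-Za-z])')

def has_digit_and_letter(text):
    """Return True when text has both a digit and an ASCII letter."""
    return HAS_DIGIT_AND_LETTER.match(text) is not None

match = HAS_DIGIT_AND_LETTER.match("abc123")
print(match.span())                     # (0, 0): a position, not content
print(has_digit_and_letter("abcdef"))   # False: no digit
print(has_digit_and_letter("123456"))   # False: no letter
```

Note that . does not match a newline by default, so this sketch only looks at the first line of a multi-line string.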


Look ahead

Positive look ahead

import re
expr1 = r'fox'
expr2 = r'(?=fox)'
data = "The quick brown fox jumps over the lazy dog"
pattern = re.compile(expr1)
result = pattern.search(data)
print(result.start(), result.end())  # 16 19        # 'fox' is consumed, so the end position advances

pattern = re.compile(expr2)
result = pattern.search(data)
print(result.start(), result.end())  # 16 16        # nothing is consumed, so the end position does not move
print(result)        # <re.Match object; span=(16, 16), match=''>  => reports a position only, no content

import re
expr1 = r'\w+,'
expr2 = r'\w+(?=,)'
expr3 = r'\w+(?=\,|\.)'
data = "They were three: Felix, Victor, and Carlos."
pattern = re.compile(expr1)
result1 = pattern.findall(data)
print(result1)                       # ['Felix,', 'Victor,']

pattern = re.compile(expr2)
result2 = pattern.findall(data)
print(result2)                       # ['Felix', 'Victor']

pattern = re.compile(expr3)
result3 = pattern.findall(data)
print(result3)                       # ['Felix', 'Victor', 'Carlos']

Negative look ahead

import re
expr = r'John(?!\sSmith)'
data = "I would rather go out with John McLane than with John Smith or John Bon Jovi"
pattern = re.compile(expr)
result = pattern.finditer(data)
for i in result:
    print(i.start(), i.end())
'''
27 31    # the John in "John McLane"
63 67    # the John in "John Bon Jovi"
'''

Look around and substitutions

import re
expr = r'\d{1,3}'
expr2 = r'\d{1,3}(?=(\d{3})+(?!\d))'
data = "The number is: 1234567890"
pattern = re.compile(expr)
result = pattern.findall(data)
print(result)                       # ['123', '456', '789', '0']

pattern = re.compile(expr2)
results = pattern.finditer(data)
for result in results:
    print(result.group())
'''
1
234
567
'''
pattern = re.compile(expr2)
result = pattern.sub(r'\g<0>,', "1234567890")
print(result)                       # 1,234,567,890
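
The substitution above can be wrapped in a small helper. This is a sketch only; the function name add_thousands_separator is made up for this note. The pattern reads as: take one to three digits, but only where the remaining digits form one or more complete groups of three followed by a non-digit or the end of the string.

```python
import re

# Same pattern as expr2 above: 1-3 digits followed (without consuming)
# by whole groups of three digits and then no further digit.
THOUSANDS = re.compile(r'\d{1,3}(?=(\d{3})+(?!\d))')

def add_thousands_separator(text):
    """Insert a comma after every digit group matched by THOUSANDS."""
    # \g<0> is the whole match, so each matched group gets a trailing comma.
    return THOUSANDS.sub(r'\g<0>,', text)

print(add_thousands_separator("The number is: 1234567890"))
print(add_thousands_separator("12345"))
print(add_thousands_separator("123"))
```

The pattern knows nothing about decimal points, so digits after a "." would be grouped as well. For real numeric values the built-in format(1234567890, ',') gives the same grouping without a regex.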

Look behind

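In Python's re module, look behind supports only fixed-width patterns. For variable-width patterns (unbounded quantifiers such as + or *, or back references to variable-length groups), use the third-party regex module instead: https://pypi.python.org/pypi/regex. Alternation inside a look behind is allowed only when every alternative has the same length.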
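
The fixed-width restriction can be checked directly, because the standard re module raises re.error when it compiles an unsupported look behind. The helper name compiles below is made up for this note; the last line shows one common workaround (a capturing group instead of a look behind), which is a suggestion of this note rather than something from the book.

```python
import re

def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r'(?<=John\s)McLane'))    # True: the look behind is always 5 characters wide
print(compiles(r'(?<=John\s+)McLane'))   # False: \s+ has no fixed width
print(compiles(r'(?<=ab|cd)e'))          # True: both alternatives are 2 characters wide
print(compiles(r'(?<=a|bc)d'))           # False: the alternatives differ in length

# Workaround without look behind: match the prefix and capture only the part you want.
print(re.findall(r'John\s+(McLane)', "John   McLane"))   # ['McLane']
```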

Positive look behind

import re
expr = r'(?<=John\s)McLane'
data = "I would rather go out with John McLane than with John Smith or John Bon Jovi"
pattern = re.compile(expr)
result = pattern.finditer(data)
for i in result:
    print (i.start(), i.end())    # 32 38
    print(i)                      # <re.Match object; span=(32, 38), match='McLane'>

Negative look behind

import re
expr = r'(?<!John\s)Doe'
data = "John Doe, Calvin Doe, Hobbes Doe"
pattern = re.compile(expr)
result = pattern.finditer(data)
for i in result:
    print (i.start(), i.end())
    print(i)

'''
17 20
<re.Match object; span=(17, 20), match='Doe'>
29 32
<re.Match object; span=(29, 32), match='Doe'>
'''

# Twitter scanning: extract the user name that follows an @ mention
import re
pattern = re.compile(r'(?<=\B@)\w+')   # \w already includes the underscore
result = pattern.findall("Know your Big Data = 5 for $50 on eBooks and 40% off all eBooks until Friday #bigdata #hadoop @HadoopNews packtpub.com/ bigdataoffers")
print(result) # ['HadoopNews']
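
The \B inside the look behind is what keeps e-mail addresses out of the result: in "john@example.com" the @ follows a word character, so there is a word boundary in front of it and \B fails. A small sketch, where the function name find_mentions, the address, and the handles are made up for this note:

```python
import re

# \B@ requires that the @ is NOT preceded by a word character,
# so the @ inside an e-mail address does not qualify.
MENTION = re.compile(r'(?<=\B@)\w+')

def find_mentions(text):
    """Return the user names that follow a standalone @."""
    return MENTION.findall(text)

print(find_mentions("mail john@example.com or ping @some_user"))   # ['some_user']
print(find_mentions("@first_user said hi to @second_user"))        # ['first_user', 'second_user']
```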

Look around and groups

import re

pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')
a=pattern.findall("INFO 2013-09-17 12:13:44,487 authentication failed")
print(a)      # []

b=pattern.findall("INFO 2013-09-17 12:13:44,487 something else failed")
print(b)      # ['something else failed']
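
The same pattern can be used to filter several log lines at once. In the sketch below the first two lines are the ones from the example above; the third line and the function name non_auth_failures are made up for this note.

```python
import re

# Group 1 captures the message, but only when "failed" is not
# directly preceded by "authentication ".
LOG_PATTERN = re.compile(
    r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)'
)

def non_auth_failures(lines):
    """Return the failure messages that are not authentication failures."""
    messages = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            messages.append(match.group(1))
    return messages

lines = [
    "INFO 2013-09-17 12:13:44,487 authentication failed",
    "INFO 2013-09-17 12:13:44,487 something else failed",
    "INFO 2013-09-17 12:13:45,001 login succeeded",   # hypothetical line
]
print(non_auth_failures(lines))   # ['something else failed']
```

The look behind sits inside the capturing group but adds nothing to it: the group contains only the characters matched by .* and failed.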
