![]() |
Mastering Python Regular Expressions http://www.packtpub.com/ Authors : Félix López |
제5장 Performance of Regular Expressions
Benchmarking regular expressions with Python
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
# alternation함수
def alternation(text):
pat = re.compile('spa(in|niard)')
pat.search(text)
if __name__ == "__main__":
test(alternation, "spain") # The function alternation lasted: 0.000000000000 ms
cProfile.run("alternation('spaniard')")
The RegexBuddy tool
RegexBuddy Home : https://www.regexbuddy.com/buynow.html
Regular-Expressions.info : https://www.regular-expressions.info/
무료 다운로드 : https://download.cnet.com/RegexBuddy/3000-2352_4-10309844.html
Understanding the Python regex engine
Backtracking 구현 가능
※ Backtracking : 필요시 뒤로 돌아가기 위해 마지막 성공 위치를 기억하는 기능
( alternation, quantifiers 적용 가능 )
Catastrophic backtracking 문제 :
경로, 대상string이 증가함에 따라 backtracking은 stack overflow 문제가 발생한다
특히, greedy character로 구성되어 마지막 character matching에 실패하는 regex
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
def catastrophic(n): # catastrophic backtracking 발생 - text에 c 부재
print ("Testing with %d characters" %n)
pat = re.compile('(a+)+c')
text = "%s" %('a' * n)
pat.search(text)
def catastrophic1(n):
print ("Testing with %d characters" %n)
pat = re.compile('(a+)+(b+)+c')
text = 'a'*n
text += 'b'*n
text += 'c'
pat.search(text)
if __name__ == "__main__":
for n in range(20, 30):
test(catastrophic, n) # 46sec
test(catastrophic1, n) # 1e-12msec 이하
Optimization recommendations
1) Reuse compiled patterns
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
def dontreuse():
pattern = re.compile(r'\bfoo\b')
pattern.match("foo bar")
def callonethousandtimes():
for _ in range(1000):
dontreuse()
if __name__ == "__main__":
test(callonethousandtimes) # 0.998735427856ms
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
pattern = re.compile(r'\bfoo\b')
def reuse():
pattern.match("foo bar")
def callonethousandtimes():
for _ in range(1000):
reuse()
if __name__ == "__main__":
test(callonethousandtimes) # e-12ms 이하
2) Extract common parts in alternation
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
pattern = re.compile(r'/(Hello\sWorld|Hello\sContinent|Hello\sCountry)')
def nonoptimized():
pattern.match("Hello\sCountry")
def callonethousandtimes():
for _ in range(1000):
nonoptimized()
if __name__ == "__main__":
test(callonethousandtimes)
import re
from time import time as now
import cProfile # 함수 처리 과정 및 속도 측정
# 실행 속도 촉정 (ms)
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print("The function %s lasted: %.12fms" % (f.__name__, (now() - start)*10**3))
pattern = re.compile(r'/Hello\s(World|Continent|Country)')
def optimized():
pattern.match("Hello\sCountry")
def callonethousandtimes():
for _ in range(1000):
optimized()
if __name__ == "__main__":
test(callonethousandtimes)
3) Shortcut to alternation
앞쪽에 가능성이 더 높은 옵션을 배치하며, 나머지 요소에 대해 나타날 확률이 같으면 길이가 짧은 요소를 긴 요소보다 먼저 배치한다.
import re
from time import time as now
def test(f, *args, **kargs):
start = now()
f(*args, **kargs)
print ("The function %s lasted: %f" %(f.__name__, now() - start))
pattern = re.compile(r'(white|black|red|blue|green)')
def optimized():
pattern.match("white")
def callonethousandtimes():
for _ in range(1000):
optimized()
test(callonethousandtimes)
4) Use non-capturing groups when appropriate
5) Be specific : /\w{15}/
6) Don't be greedy
'정규표현식' 카테고리의 다른 글
[ Python ] 정규표현식 Table 및 우선순위 (0) | 2022.11.01 |
---|---|
[ python ] 정규표현식[4] (0) | 2022.10.30 |
[ Python ] 정규표현식[3] (0) | 2022.10.27 |
[ python ] 정규표현식[2] (0) | 2022.10.27 |
[ python ] 정규표현식[1] (0) | 2022.10.27 |