Python: Speed For "in" Vs Regular Expression
When determining whether a substring occurs in a larger string, I am considering two options: (1) if 'aaaa' in 'bbbaaaaaabbb': dosomething() (2) pattern = re.com
Solution 1:
Regex will be slower.
$ python -m timeit '"aaaa" in "bbbaaaaaabbb"'
10000000 loops, best of 3: 0.0767 usec per loop
$ python -m timeit -s 'import re; pattern = re.compile("aaaa")' 'pattern.search("bbbaaaaaabbb")'
1000000 loops, best of 3: 0.356 usec per loop
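The same comparison can also be run from a script with the standard-library timeit module. The following is a minimal sketch in Python 3, not part of the original answer; the helper name time_both and the loop counts are hypothetical choices made for illustration. Both statements are wrapped in a lambda, which adds the same call overhead to each, so the absolute numbers will differ from the command-line results above.

```python
import re
import timeit

HAYSTACK = "bbbaaaaaabbb"
NEEDLE = "aaaa"
PATTERN = re.compile(NEEDLE)  # compiled once, outside the timed statement


def time_both(number=200_000, repeat=3):
    """Return the best per-call time in seconds for each approach.

    `time_both` is a hypothetical helper, not something from the thread.
    """
    in_total = min(
        timeit.repeat(lambda: NEEDLE in HAYSTACK, number=number, repeat=repeat)
    )
    regex_total = min(
        timeit.repeat(lambda: PATTERN.search(HAYSTACK), number=number, repeat=repeat)
    )
    return {"in": in_total / number, "regex": regex_total / number}


if __name__ == "__main__":
    for name, seconds in time_both().items():
        print(f"{name}: {seconds * 1e6:.4f} usec per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that the command-line tool reports.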
Solution 2:
Option (1) definitely is faster. For the future, do something like this to test it:
>>> import time, re
>>> if True:
...     s = time.time()
...     "aaaa" in "bbbaaaaaabbb"
...     print time.time() - s
...
True
1.78813934326e-05
>>> if True:
...     s = time.time()
...     pattern = re.compile("aaaa")
...     pattern.search("bbbaaaaaabbb")
...     print time.time() - s
...
<_sre.SRE_Match object at 0xb74a91e0>
0.0143280029297
gnibbler's way of doing this is better; I had never really played around with interpreter options, so I didn't know about that one.
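Note that the manual measurement above times a single execution, and the regex figure also includes the call to re.compile. A rough Python 3 sketch of the same idea, averaged over many iterations with the pattern compiled before timing starts, might look like this. The helper time_loop and the iteration count are assumptions for illustration, not something from the answer.

```python
import re
import time


def time_loop(func, iterations=100_000):
    """Average wall-clock seconds per call of `func` (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) / iterations


haystack = "bbbaaaaaabbb"
pattern = re.compile("aaaa")  # compiled before any timing starts

in_seconds = time_loop(lambda: "aaaa" in haystack)
regex_seconds = time_loop(lambda: pattern.search(haystack))

print(f"in:    {in_seconds:.3e} s per call")
print(f"regex: {regex_seconds:.3e} s per call")
```

time.perf_counter is used here instead of time.time because it is intended for measuring short intervals.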
Solution 3:
I happen to have the E. coli genome at hand, so I tested the two options. Looking for "AAAA" in the E. coli genome 10,000,000 times (just to get measurable times) takes about 3.7 seconds with option (1). With option (2), with pattern = re.compile("AAAA") kept outside the loop, of course, it took about 8.4 seconds. In my case, "dosomething()" was adding 1 to an arbitrary variable. The E. coli genome I used is 4,639,675 nucleotides (letters) long.
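For readers without a genome file at hand, the experiment can be approximated with a synthetic sequence. The sketch below (Python 3) assumes a random string over "ACGT" as a stand-in for the real E. coli genome, so it will not reproduce the timings quoted above. The function names, the seed, and the reduced iteration count are all hypothetical. A short motif such as "AAAA" is likely to appear near the start of a long sequence, in which case both searches return early and the loop mostly measures per-call overhead rather than a full scan.

```python
import random
import re
import timeit


def make_sequence(length, seed=0):
    """Build a random nucleotide string; an assumed stand-in for a real genome."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


def count_with_in(sequence, needle, iterations):
    """Option (1): substring test with the `in` operator."""
    counter = 0
    for _ in range(iterations):
        if needle in sequence:
            counter += 1  # plays the role of "dosomething()"
    return counter


def count_with_regex(sequence, needle, iterations):
    """Option (2): regex search, with the pattern compiled outside the loop."""
    pattern = re.compile(re.escape(needle))
    counter = 0
    for _ in range(iterations):
        if pattern.search(sequence):
            counter += 1
    return counter


if __name__ == "__main__":
    genome = make_sequence(4_639_675)  # same length as the genome in the answer
    for func in (count_with_in, count_with_regex):
        seconds = timeit.timeit(lambda: func(genome, "AAAA", 100_000), number=1)
        print(f"{func.__name__}: {seconds:.2f} s")
```

Swapping make_sequence for code that reads an actual sequence file would bring the test closer to the one described in the answer.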