Python: Speed For "in" Vs Regular Expression

August 13, 2023

When determining whether a substring occurs in a larger string, I am considering two options:

(1) if 'aaaa' in 'bbbaaaaaabbb': dosomething()
(2) pattern = re.com

Solution 1:

Regex will be slower.

$ python -m timeit '"aaaa" in "bbbaaaaaabbb"'
10000000 loops, best of 3: 0.0767 usec per loop
$ python -m timeit -s 'import re; pattern = re.compile("aaaa")' 'pattern.search("bbbaaaaaabbb")'
1000000 loops, best of 3: 0.356 usec per loop
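The same comparison can also be run from a script rather than the command line. The following is a minimal sketch for Python 3, assuming the standard timeit module; the helper names (best_usec_per_loop, with_in, with_regex) are made up for illustration and are not from the original answer. Wrapping each statement in a function adds call overhead to both measurements, so the absolute numbers will be higher than the command-line figures above.

```python
import re
import timeit

HAYSTACK = "bbbaaaaaabbb"
NEEDLE = "aaaa"
PATTERN = re.compile(NEEDLE)  # compiled once, outside the timed code


def best_usec_per_loop(func, number=100_000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6


def with_in():
    # Option (1): plain substring test
    return NEEDLE in HAYSTACK


def with_regex():
    # Option (2): precompiled regular expression
    return PATTERN.search(HAYSTACK) is not None


in_usec = best_usec_per_loop(with_in)
regex_usec = best_usec_per_loop(with_regex)
print(f"in:    {in_usec:.4f} usec per loop")
print(f"regex: {regex_usec:.4f} usec per loop")
```

Taking the minimum of several runs mirrors the "best of 3" that the command-line tool reports.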

Solution 2:

Option (1) is definitely faster. In the future, you can test this kind of thing yourself with something like this:

>>> import time, re
>>> if True:
...     s = time.time()
...     "aaaa" in "bbbaaaaaabbb"
...     print(time.time() - s)
...
True
1.78813934326e-05

>>> if True:
...     s = time.time()
...     pattern = re.compile("aaaa")
...     pattern.search("bbbaaaaaabbb")
...     print(time.time() - s)
...
<_sre.SRE_Match object at 0xb74a91e0>
0.0143280029297

gnibbler's way of doing this (the timeit approach in Solution 1) is better; I had never really played around with interpreter options, so I didn't know about that one.
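A single time.time() measurement is noisy, and the second session in this answer also counts the cost of re.compile in its total. The sketch below is one way to tighten a hand-rolled measurement, assuming Python 3: it times the compile step on its own and averages each search over many calls using time.perf_counter. The elapsed helper and the loop count are illustrative choices, not part of the original answer.

```python
import re
import time

HAYSTACK = "bbbaaaaaabbb"
NEEDLE = "aaaa"


def elapsed(func, loops=100_000):
    """Return total seconds taken by `loops` calls of func."""
    start = time.perf_counter()
    for _ in range(loops):
        func()
    return time.perf_counter() - start


# Time the one-off compile step separately from the searches.
start = time.perf_counter()
pattern = re.compile(NEEDLE)
compile_seconds = time.perf_counter() - start

in_seconds = elapsed(lambda: NEEDLE in HAYSTACK)
search_seconds = elapsed(lambda: pattern.search(HAYSTACK))

print(f"compile once:  {compile_seconds:.6f} s")
print(f"in, total:     {in_seconds:.6f} s")
print(f"search, total: {search_seconds:.6f} s")
```

The lambda wrappers add the same call overhead to both options, so the totals are best read as a relative comparison.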

Solution 3:

I happen to have the E. coli genome at hand, so I tested the two options. Looking for "AAAA" in the E. coli genome 10,000,000 times (just to get measurable times) takes about 3.7 seconds with option (1). With option (2), keeping pattern = re.compile("AAAA") outside the loop, it took about 8.4 seconds. In my case, "dosomething()" was adding 1 to an arbitrary variable. The E. coli genome I used is 4,639,675 nucleotides (letters) long.
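A rough version of this experiment can be run without the real genome. The sketch below, assuming Python 3, substitutes a randomly generated nucleotide string for the E. coli sequence and uses a much smaller length and loop count so it finishes quickly; the names GENOME_LENGTH, LOOPS, count_with_in and count_with_regex are hypothetical, and the timings will not match the figures quoted above.

```python
import random
import re
import timeit

random.seed(0)  # fixed seed so the stand-in sequence is reproducible
GENOME_LENGTH = 100_000  # stand-in size, far smaller than the real genome
genome = "".join(random.choices("ACGT", k=GENOME_LENGTH))

pattern = re.compile("AAAA")  # compiled once, outside the loops


def count_with_in(loops):
    """Option (1): substring test; add 1 to a counter on each hit."""
    hits = 0
    for _ in range(loops):
        if "AAAA" in genome:
            hits += 1
    return hits


def count_with_regex(loops):
    """Option (2): precompiled regex; add 1 to a counter on each hit."""
    hits = 0
    for _ in range(loops):
        if pattern.search(genome):
            hits += 1
    return hits


LOOPS = 100_000
in_seconds = timeit.timeit(lambda: count_with_in(LOOPS), number=1)
regex_seconds = timeit.timeit(lambda: count_with_regex(LOOPS), number=1)
print(f"in:    {in_seconds:.3f} s for {LOOPS} searches")
print(f"regex: {regex_seconds:.3f} s for {LOOPS} searches")
```

Both approaches stop at the first match, so when the pattern occurs early in the string the timing mostly reflects per-call overhead rather than string length. Searching for a pattern that is absent forces a full scan on every call and gives a different picture.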
