Regex Expression To Exclude Lines Based On Beginning Or Ending Patterns

I am searching a file for lines that do not match any of three possible regex patterns in Python. If I were to search for each individually, the patterns are: pattern1 = '_[AB]_[0-9]+$' pa

Solution 1:

In essence, you are combining three smaller regexes into one and telling the matcher that a match on any one of them counts. The general tool for this is the alternation operator (|), as @TallChuck has commented. So, in keeping with his example and your variables, I might do this:

import re

pattern1 = '_[AB]_[0-9]+$'
pattern2 = '^>uce.+'
pattern3 = '^>ENSOFAS.+'
re_pattern = '(?:{}|{}|{})'.format(pattern1, pattern2, pattern3)
your_re = re.compile(re_pattern)

There's no reason you cannot include the beginning-of-line anchor ^ in each subpattern, so I've done that. Meanwhile, your example used the non-capturing grouping operator, (?:...), so I've mimicked that here as well.

The above is exactly the same as if you had written it all at once:

your_re = re.compile('(?:_[AB]_[0-9]+$|^>uce.+|^>ENSOFAS.+)')

Take your pick as to which is more readable and maintainable by you or your team.
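Since the original goal was to find lines that do not match any of the patterns, here is a minimal sketch of how the combined regex might be used as a filter. The function name keep_lines and the sample lines are made up for illustration; they are not from the question.

```python
import re

# Combined pattern from the answer: a line is excluded if it ends in
# _A_<digits> or _B_<digits>, or starts with >uce or >ENSOFAS.
exclude_re = re.compile(r'(?:_[AB]_[0-9]+$|^>uce.+|^>ENSOFAS.+)')

def keep_lines(lines):
    """Return the lines that match none of the three subpatterns."""
    kept = []
    for line in lines:
        text = line.rstrip('\n')  # drop the trailing newline before testing
        # search() scans the whole line, so the $-anchored alternative
        # can match at the end while the ^ ones match at the start.
        if not exclude_re.search(text):
            kept.append(text)
    return kept

# Hypothetical sample data, invented for this example only.
sample = [
    ">uce-101_sample\n",
    ">ENSOFAS000123\n",
    "contig_A_42\n",
    ">keep_me\n",
    "ACGTACGT\n",
]
print(keep_lines(sample))  # ['>keep_me', 'ACGTACGT']
```

With a real file, the same function could be fed the open file object directly, since iterating over a file yields one line at a time.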

Finally, note that it may be more efficient to pull out the beginning-of-line anchor (^), as the last paragraph of your question suggested, or the regex engine may be smart enough to do that on its own. I suggest getting it working first, then optimizing only if you need to.
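If you do get to the optimizing stage, measuring is more reliable than guessing. The sketch below times two spellings of the same regex with the standard timeit module. The factored form, which shares one ^> between the two prefix alternatives, is my own assumption about what "pulling out the anchor" could look like, and the test lines are invented. Whether either form is faster will depend on your data and Python version, so no expected timings are given.

```python
import re
import timeit

# Anchor repeated in each alternative (as in the answer).
per_alt = re.compile(r'(?:_[AB]_[0-9]+$|^>uce.+|^>ENSOFAS.+)')
# Assumed alternative spelling: the two prefix patterns share one ^>.
factored = re.compile(r'(?:_[AB]_[0-9]+$|^>(?:uce|ENSOFAS).+)')

# Hypothetical test lines, repeated to get a measurable workload.
lines = [">uce-101_sample", ">ENSOFAS000123", "contig_A_42", "ACGTACGT" * 20] * 250

def count_hits(regex):
    """Count how many lines the given compiled regex matches."""
    return sum(1 for line in lines if regex.search(line))

# Both spellings should agree on what matches before timing means anything.
assert count_hits(per_alt) == count_hits(factored)

t_per_alt = timeit.timeit(lambda: count_hits(per_alt), number=20)
t_factored = timeit.timeit(lambda: count_hits(factored), number=20)
print(f"anchor in each alternative: {t_per_alt:.4f}s")
print(f"shared anchor:              {t_factored:.4f}s")
```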

Another option is to anchor all three alternatives at the beginning of the line by adding the "match anything" pattern (.*) to the front of the first subpattern:

^(?:.*_[AB]_[0-9]+$|>uce.+|>ENSOFAS.+)
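As a quick sanity check, the following sketch compares this anchored form, used with match(), against the unanchored alternation, used with search(). The sample strings are hypothetical, and agreement on a handful of samples is only evidence, not proof, that the two behave the same on your data.

```python
import re

unanchored = re.compile(r'(?:_[AB]_[0-9]+$|^>uce.+|^>ENSOFAS.+)')
anchored = re.compile(r'^(?:.*_[AB]_[0-9]+$|>uce.+|>ENSOFAS.+)')

# Hypothetical sample lines, invented for this comparison.
samples = [
    ">uce-101_sample",
    ">ENSOFAS000123",
    "contig_A_42",
    ">keep_me",
    "ACGTACGT",
    "x_B_7_tail",  # _B_7 is not at the end of the line, so no match
]

# search() is needed for the unanchored form; match() suffices for the
# anchored one because every alternative starts at the beginning of the line.
results = [(s, bool(unanchored.search(s)), bool(anchored.match(s))) for s in samples]
for text, by_search, by_match in results:
    print(f"{text!r}: search={by_search} match={by_match}")

agree = all(by_search == by_match for _, by_search, by_match in results)
print("both forms agree on these samples:", agree)
```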
