Removing Symbols Between Letters Python
Solution 1:
Regular expressions are definitely scary at first, but it's worth trying to learn them, as they end up being very useful. What you want in this case is:
import re
string = re.sub(r'([a-zA-Z])[@31!]+(?=[a-zA-Z])', r'\1', string)
re.sub is similar to str.replace, but it uses regular expressions.
[a-zA-Z] matches any ASCII letter, upper or lower case.
[@31!]+ matches one or more of the listed symbols.
+ causes the resulting RE to match 1 or more repetitions of the preceding RE.
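To see what the + adds, a quick check with re.findall on one of the sample lines shows the character class alone matching each symbol separately, while the quantified version treats each run of adjacent symbols as a single match:

```python
import re

sample = "He11o Wor1d!"

# Without +, each listed symbol is matched on its own
singles = re.findall(r"[@31!]", sample)

# With +, a run of adjacent symbols is matched as one piece
runs = re.findall(r"[@31!]+", sample)

print(singles)  # ['1', '1', '1', '!']
print(runs)     # ['11', '1', '!']
```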
(?=[a-zA-Z]) is a lookahead assertion for a letter. This means that the match must be followed by a letter, but the letter is not part of the match.
(?=...) matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
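The lookahead matters when symbol runs are only one letter apart. In this sketch, the second pattern is a hypothetical variant used only for comparison: because it consumes the following letter, that letter cannot also serve as the leading letter of the next match, so the second symbol survives:

```python
import re

text = "a1b1c"

# Lookahead: the following letter is checked but not consumed,
# so 'b' can start the next match
with_lookahead = re.sub(r"([a-zA-Z])[@31!]+(?=[a-zA-Z])", r"\1", text)

# Hypothetical variant that consumes the following letter instead;
# 'b' is used up by the first match, so the second '1' is never matched
consuming = re.sub(r"([a-zA-Z])[@31!]+([a-zA-Z])", r"\1\2", text)

print(with_lookahead)  # abc
print(consuming)       # ab1c
```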
So ([a-zA-Z])[@31!]+(?=[a-zA-Z]) matches a letter followed by one or more symbols from the list, provided a letter comes next; that following letter is not included in the match.
\1 is a back-reference to the first parenthesized group in the regular expression, in this case ([a-zA-Z]). That captured letter is what we want to replace the whole match with.
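To inspect what \1 refers to, you can look at a single match object: group(0) is the whole match, group(1) is the captured letter, and Match.expand performs the same backslash substitution that re.sub applies to the replacement string. A small sketch using the pattern above:

```python
import re

pattern = r"([a-zA-Z])[@31!]+(?=[a-zA-Z])"
m = re.search(pattern, "He11o")

whole = m.group(0)             # the letter plus the symbols; the lookahead 'o' is excluded
letter = m.group(1)            # the text captured by the parentheses
replacement = m.expand(r"\1")  # what re.sub puts in place of the whole match

print(whole, letter, replacement)  # e11 e e
```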
(The r prefixes before the strings make them raw strings, so a backslash such as the one in \1 reaches the regex engine unchanged. This often helps when using regular expressions.)
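The raw-string prefix makes a real difference for the replacement template. In an ordinary string literal, \1 is an octal escape for the control character chr(1), so the regex engine never sees a backslash and inserts that control character instead of the captured letter:

```python
import re

# In a normal literal, '\1' is an octal escape: the single character chr(1)
assert len('\1') == 1 and '\1' == chr(1)

# In a raw literal, r'\1' keeps both characters: a backslash and '1'
assert len(r'\1') == 2 and r'\1' == '\\1'

pattern = r"([a-zA-Z])[@31!]+(?=[a-zA-Z])"
fixed = re.sub(pattern, r"\1", "He11o")   # back-reference works
broken = re.sub(pattern, "\1", "He11o")   # inserts chr(1) instead of the letter

print(fixed)         # Heo
print(repr(broken))  # 'H\x01o'
```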
Edit:
As @ctwheels pointed out, you can also use a lookbehind assertion rather than a backreference:
string = re.sub(r'(?<=[a-zA-Z])[@31!]+(?=[a-zA-Z])', r'', string)
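A quick side-by-side check on the sample lines from the question shows the back-reference and lookbehind versions producing the same output. Python's re module only accepts fixed-width lookbehinds, which a single-character class like [a-zA-Z] satisfies; the function names below are made up for illustration:

```python
import re

samples = ["@@He11o Wor1d!", "!!T3ach !m3", "@13!", "lala@@@@"]

def with_backref(s):
    # Capture the preceding letter and put it back with \1
    return re.sub(r'([a-zA-Z])[@31!]+(?=[a-zA-Z])', r'\1', s)

def with_lookbehind(s):
    # Assert the preceding letter without consuming it, so nothing needs putting back
    return re.sub(r'(?<=[a-zA-Z])[@31!]+(?=[a-zA-Z])', '', s)

for s in samples:
    print(with_backref(s), with_lookbehind(s))
```

The lookbehind version is slightly simpler because the replacement is just an empty string.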
Solution 2:
This is tricky to do correctly. Although I generally prefer to avoid regexes unless they're necessary, this is definitely a case where they make the job a lot easier. But anyway, here's a non-regex solution.
We use the standard groupby function from itertools to break the input string into three kinds of groups: 'A' groups contain letters, 'S' groups contain the special symbols, and 'O' groups contain anything else. Then we scan over the groups, copying each one to the result list unless it is an 'S' group with an 'A' group immediately before it and immediately after it. Finally, we join the copied groups back into a single string.
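To make the grouping step concrete, here is what groupby yields for one sample line, using the same keyfunc as the answer (redefined so the snippet runs on its own):

```python
from itertools import groupby

symbols = '@31!'

def keyfunc(c):
    # 'S' for a listed symbol, 'A' for a letter, 'O' for anything else
    if c in symbols:
        return 'S'
    elif c.isalpha():
        return 'A'
    else:
        return 'O'

groups = [(k, ''.join(g)) for k, g in groupby('He11o Wor1d!', keyfunc)]
print(groups)
# [('A', 'He'), ('S', '11'), ('A', 'o'), ('O', ' '), ('A', 'Wor'), ('S', '1'), ('A', 'd'), ('S', '!')]
```

Only the 'S' groups '11' and '1' have an 'A' group on both sides, so those are the ones that get dropped; the trailing '!' is kept.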
To make it easier to check the following group, we add a "fake" group of ('O', '') to the end of the list of groups. That way every real group has a following group.
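The same sentinel idea can be applied at both ends, which removes the need for the separate prev variable. This is a variant sketch rather than the answer's code: padding the list with ('O', '') on each side lets zip walk over (previous, current, next) triples directly. The name remove_symbols_padded is hypothetical:

```python
from itertools import groupby

symbols = '@31!'

def keyfunc(c):
    if c in symbols:
        return 'S'
    elif c.isalpha():
        return 'A'
    return 'O'

def remove_symbols_padded(s):
    # Hypothetical variant: fake ('O', '') groups on both ends
    pad = [('O', '')]
    groups = pad + [(k, ''.join(g)) for k, g in groupby(s, keyfunc)] + pad
    kept = []
    for (prev_k, _), (k, g), (next_k, _) in zip(groups, groups[1:], groups[2:]):
        # Drop a symbol group only when letters sit on both sides of it
        if not (k == 'S' and prev_k == 'A' and next_k == 'A'):
            kept.append(g)
    return ''.join(kept)

print(remove_symbols_padded('@@He11o Wor1d!'))  # @@Heo Word!
```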
from itertools import groupby

symbols = '@31!'

def keyfunc(c):
    if c in symbols:
        return 'S'
    elif c.isalpha():
        return 'A'
    else:
        return 'O'

def remove_symbols(s):
    groups = [(k, ''.join(g)) for k, g in groupby(s, keyfunc)] + [('O', '')]
    result = []
    prev = 'O'
    for i, (k, g) in enumerate(groups[:-1]):
        # If a group of symbols has an alpha group on both sides, don't copy it
        if not (k == 'S' and prev == 'A' and groups[i+1][0] == 'A'):
            result.append(g)
        prev = k
    return ''.join(result)
# Test
data = '''\
@@He11o Wor1d!
!!T3ach !m3
@13!
lala@@@@
'''
expected = '''\
@@Heo Word!
!!Tach !m3
@13!
lala@@@@
'''

print('Data')
print(data)
print('Expected')
print(expected)
print('Output')
for s in data.splitlines():
    print(remove_symbols(s))
output
Data
@@He11o Wor1d!
!!T3ach !m3
@13!
lala@@@@
Expected
@@Heo Word!
!!Tach !m3
@13!
lala@@@@
Output
@@Heo Word!
!!Tach !m3
@13!
lala@@@@
Solution 3:
Code
(?<=[a-z])[@13!]+(?=[a-z])
Results
Input
@@He11o Wor1d!
!!T3ach !m3
@13!
Output
@@Heo Word!
!!Tach !m3
@13!
Explanation
(?<=[a-z]) Positive lookbehind ensuring what precedes is a letter between a and z
[@13!]+ Match one or more characters present in the set @13!
(?=[a-z]) Positive lookahead ensuring what follows is a letter between a and z
Using the i flag makes the pattern case-insensitive, so a-z also matches A-Z.
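A quick way to see the effect of the flag is an uppercase letter right before the symbol, as in T3ach. In this sketch, without re.IGNORECASE the lookbehind (?<=[a-z]) rejects the uppercase T, so the 3 stays:

```python
import re

regex = r"(?<=[a-z])[@13!]+(?=[a-z])"

case_sensitive = re.sub(regex, "", "!!T3ach")
case_insensitive = re.sub(regex, "", "!!T3ach", flags=re.IGNORECASE)

print(case_sensitive)    # !!T3ach
print(case_insensitive)  # !!Tach
```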
Usage
import re
regex = r"(?<=[a-z])[@13!]+(?=[a-z])"
result = re.sub(regex, "", string, flags=re.IGNORECASE)
# re.IGNORECASE can be replaced with the shorter alias re.I
Or, with the flag inline in the regex instead of passed to the function:
import re
regex = r"(?i)(?<=[a-z])[@13!]+(?=[a-z])"
result = re.sub(regex, "", string)
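If the substitution runs over many lines, compiling the pattern once with re.compile keeps the pattern and its flag together in one reusable object. A small sketch; the names BETWEEN_LETTERS and strip_between_letters are made up for illustration:

```python
import re

# Compile once; the flag is baked into the pattern object
BETWEEN_LETTERS = re.compile(r"(?<=[a-z])[@13!]+(?=[a-z])", re.IGNORECASE)

def strip_between_letters(text):
    # Hypothetical helper: remove symbol runs that sit between two letters
    return BETWEEN_LETTERS.sub("", text)

lines = ["@@He11o Wor1d!", "!!T3ach !m3", "@13!"]
print([strip_between_letters(line) for line in lines])
# ['@@Heo Word!', '!!Tach !m3', '@13!']
```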