Removing Symbols Between Letters Python

March 07, 2024

I would like to remove certain symbols from a string. I only want to remove symbols that are between letters. If my question wasn't clear enough then here are some examples: symbol

Solution 1:

Regular expressions are definitely scary at first, but it's worth trying to learn them, as they end up being very useful. What you want in this case is:

import re
string = re.sub(r'([a-zA-Z])[@31!]+(?=[a-zA-Z])', r'\1', string)

Let's look at what this does.

re.sub is similar to str.replace, but it uses regular expressions.

[a-zA-Z] matches any letter.

[@31!]+ matches one or more of the listed symbols.

+ causes the resulting RE to match 1 or more repetitions of the preceding RE.

(?=[a-zA-Z]) is a lookahead assertion for a letter. This means that the match is followed by a letter, but the letter is not part of the match.

(?=...) matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

So ([a-zA-Z])[@31!]+(?=[a-zA-Z]) matches a letter followed by one or more symbols from the list. This match is followed by a letter, but the match does not include the letter.

\1 is a back-reference to the parenthesized group in the regular expression, in this case [a-zA-Z]. That's what we want to replace what we found with.

(The r prefix before each string makes it a raw string, so backslashes are passed to the regex engine unchanged. This often helps when using regular expressions.)
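To see why the lookahead matters, here is a small sketch comparing it with a variant that consumes the following letter as a second group. The input 'a1b1c' is a made-up example, not one from the question: the letter 'b' has to serve as the right-hand neighbour of the first symbol run and the left-hand neighbour of the second.

```python
import re

# Hypothetical sample input: two symbol runs that share the letter 'b'.
sample = 'a1b1c'

# Variant that consumes the following letter as a second group.
# After matching 'a1b', the search resumes at the second '1',
# so 'b' is no longer available as a preceding letter.
consuming = re.sub(r'([a-zA-Z])[@31!]+([a-zA-Z])', r'\1\2', sample)

# Variant from the answer: the following letter is only asserted, not consumed,
# so it can start the next match.
lookahead = re.sub(r'([a-zA-Z])[@31!]+(?=[a-zA-Z])', r'\1', sample)

print(consuming)  # ab1c
print(lookahead)  # abc
```

The consuming variant leaves the second '1' in place, which is why the lookahead form is the safer choice when symbol runs can be separated by a single letter.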

Edit:

As @ctwheels pointed out, you can also use a lookbehind assertion rather than a backreference:

string = re.sub(r'(?<=[a-zA-Z])[@31!]+(?=[a-zA-Z])', r'', string)
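As a quick check that the two forms behave the same, this sketch runs both patterns over the sample lines used elsewhere on this page. The variable names are only for illustration.

```python
import re

# Backreference form and lookbehind form of the same idea.
backref = r'([a-zA-Z])[@31!]+(?=[a-zA-Z])'
lookbehind = r'(?<=[a-zA-Z])[@31!]+(?=[a-zA-Z])'

lines = ['@@He11o Wor1d!', '!!T3ach !m3', '@13!']

# The backreference form puts the captured letter back;
# the lookbehind form never consumes it, so the replacement is empty.
with_backref = [re.sub(backref, r'\1', s) for s in lines]
with_lookbehind = [re.sub(lookbehind, '', s) for s in lines]

print(with_backref)                     # ['@@Heo Word!', '!!Tach !m3', '@13!']
print(with_backref == with_lookbehind)  # True
```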

Solution 2:

This is tricky to do correctly. Although I generally prefer to avoid regexes unless they're necessary, this is definitely a case where they make the job a lot easier. But anyway, here's a non-regex solution.

We use the standard groupby function to break the input string up into three kinds of groups: 'A' groups contain letters, 'S' groups contain the special symbols, 'O' groups contain anything else. Then we scan over the groups, copying them to the result list unless the group is an 'S' group and it has an 'A' group immediately before it and immediately following it. Finally, we join the copied groups back into a single string.

In order to make it easier to check the following group, we add a "fake" group of ('O', '') to the end of the list of groups. That way every real group has a following group.
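Before the full function, it may help to look at the grouping step on its own. This is a minimal sketch that classifies each character with the same three labels described above and prints the groups for one of the sample lines.

```python
from itertools import groupby

symbols = '@31!'

def keyfunc(c):
    # 'S' for the special symbols, 'A' for letters, 'O' for anything else.
    if c in symbols:
        return 'S'
    if c.isalpha():
        return 'A'
    return 'O'

# groupby collects consecutive characters that share the same label.
groups = [(k, ''.join(g)) for k, g in groupby('@@He11o Wor1d!', keyfunc)]

for label, text in groups:
    print(label, repr(text))
```

Only the 'S' groups '11' and '1' have an 'A' group on both sides, so those are the ones that get dropped. The leading '@@' and the trailing '!' are 'S' groups too, but each is missing a letter group on one side.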

from itertools import groupby

symbols = '@31!'

def keyfunc(c):
    if c in symbols:
        return 'S'
    elif c.isalpha():
        return 'A'
    else:
        return 'O'

def remove_symbols(s):
    groups = [(k, ''.join(g)) for k, g in groupby(s, keyfunc)] + [('O', '')]
    result = []
    prev = 'O'
    for i, (k, g) in enumerate(groups[:-1]):
        # If a group of symbols has an alpha group on both sides, don't copy it
        if not (k == 'S' and prev == 'A' and groups[i+1][0] == 'A'):
            result.append(g)
        prev = k
    return ''.join(result)

# Test

data = '''\
@@He11o Wor1d!
!!T3ach !m3
@13!
lala@@@@ 
'''

expected = '''\
@@Heo Word!
!!Tach !m3
@13!
lala@@@@
'''

print('Data')
print(data)

print('Expected')
print(expected)

print('Output')
for s in data.splitlines():
    print(remove_symbols(s))

Output

Data
@@He11o Wor1d!
!!T3ach !m3
@13!
lala@@@@ 

Expected
@@Heo Word!
!!Tach !m3
@13!
lala@@@@

Output
@@Heo Word!
!!Tach !m3
@13!
lala@@@@

Solution 3:

Code

(?<=[a-z])[@13!]+(?=[a-z])

Results

Input

@@He11o Wor1d!
!!T3ach !m3
@13!

Output

@@Heo Word!
!!Tach !m3
@13!

Explanation

(?<=[a-z]) Positive lookbehind ensuring what precedes is a letter between a and z
[@13!]+ Match one or more characters present in the set @13!
(?=[a-z]) Positive lookahead ensuring what follows is a letter between a and z

Using the i flag makes the pattern case-insensitive, so a-z also matches A-Z.
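The effect of the flag is easy to miss, so here is a short sketch on the one sample line where it matters. In '!!T3ach !m3' the symbol '3' follows an uppercase 'T', which [a-z] alone does not match.

```python
import re

regex = r'(?<=[a-z])[@13!]+(?=[a-z])'
line = '!!T3ach !m3'

# Without the flag, the lookbehind rejects the uppercase 'T'.
case_sensitive = re.sub(regex, '', line)

# With the flag, 'T' counts as a letter and the '3' is removed.
case_insensitive = re.sub(regex, '', line, flags=re.IGNORECASE)

print(case_sensitive)    # !!T3ach !m3
print(case_insensitive)  # !!Tach !m3
```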

Usage

import re
regex = r"(?<=[a-z])[@13!]+(?=[a-z])"
result = re.sub(regex, "", string, flags=re.IGNORECASE)
# re.IGNORECASE can be replaced with the shortened re.I

or (flag in the regex as opposed to passed to the function)

import re
regex = r"(?i)(?<=[a-z])[@13!]+(?=[a-z])"
result = re.sub(regex, "", string)
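If the set of symbols varies, one possible approach is to build the character class from a string. The helper below is hypothetical (its name and parameters are not from any of the answers) and assumes that symbols is non-empty. It uses re.escape so that characters such as '-', '^' or ']' are treated literally inside the brackets.

```python
import re

def remove_between_letters(text, symbols):
    """Hypothetical helper: drop runs of `symbols` that sit between two ASCII letters."""
    # Escape the symbols so none of them is read as class syntax.
    char_class = '[' + re.escape(symbols) + ']'
    pattern = r'(?<=[a-z])' + char_class + r'+(?=[a-z])'
    return re.sub(pattern, '', text, flags=re.IGNORECASE)

print(remove_between_letters('@@He11o Wor1d!', '@13!'))  # @@Heo Word!
print(remove_between_letters('a-b^c]d', '-^]'))          # abcd
```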
