Remove all special characters, punctuation and spaces from string

Question:

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.

Answer #1:

This can be done without regex:

>>> string = "Special $#! characters   spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

You can use str.isalnum:

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.
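In Python 3, str.isalnum is Unicode-aware, so this approach keeps accented and other non-ASCII letters and digits, while an ASCII character class such as [^A-Za-z0-9] drops them. Here is a minimal sketch contrasting the two. The helper names keep_alnum and keep_ascii_alnum are hypothetical.

```python
import re

def keep_alnum(text: str) -> str:
    """Keep only characters for which str.isalnum() is True (Unicode-aware)."""
    return ''.join(ch for ch in text if ch.isalnum())

def keep_ascii_alnum(text: str) -> str:
    """Keep only ASCII letters and digits."""
    return re.sub(r'[^A-Za-z0-9]+', '', text)

sample = "naïve café, 42!"
print(keep_alnum(sample))        # naïvecafé42
print(keep_ascii_alnum(sample))  # navecaf42
```

Pick whichever matches your definition of "letters and numbers".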

If you insist on using a regex, the other solutions will do fine. However, note that if a task can be done without a regular expression, that is usually the best way to go about it.

Answered By: user225312

Answer #2:

Here is a regex that matches a run of one or more characters that are not letters or digits:

[^A-Za-z0-9]+

Here is the Python command to do a regex substitution:

import re
re.sub('[^A-Za-z0-9]+', '', mystring)
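If you clean many strings, you can compile the pattern once and reuse it. This is mostly for readability, since the re module also caches recently used patterns. Here is a minimal sketch; the names NON_ALNUM and clean are hypothetical.

```python
import re

# Compiled once; matches one or more characters that are not ASCII letters or digits.
NON_ALNUM = re.compile(r'[^A-Za-z0-9]+')

def clean(text: str) -> str:
    """Delete every run of characters that is not an ASCII letter or digit."""
    return NON_ALNUM.sub('', text)

print(clean("Special $#! characters   spaces 888323"))  # Specialcharactersspaces888323
```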

Answered By: Andy White

Answer #3:

Shorter way:

import re
# \W matches any non-word character; note that \w includes "_", so underscores are kept
cleanString = re.sub(r'\W+', '', string)

If you want to keep a space between words and numbers, use ' ' instead of '' as the replacement string.
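Because \w covers letters, digits and the underscore, \W+ leaves underscores in place. Here is a short sketch of that behaviour, of one way to remove underscores too (adding _ to the class), and of the space-separated variant. The sample text is made up for illustration.

```python
import re

text = "snake_case and CamelCase: 42!"

# \W is the complement of \w, and \w includes "_", so underscores survive.
no_sep = re.sub(r'\W+', '', text)
# Adding "_" to the character class removes underscores as well.
no_underscore = re.sub(r'[\W_]+', '', text)
# Replacing with a space keeps words apart; strip() drops the trailing space left by "!".
spaced = re.sub(r'\W+', ' ', text).strip()

print(no_sep)         # snake_caseandCamelCase42
print(no_underscore)  # snakecaseandCamelCase42
print(spaced)         # snake_case and CamelCase 42
```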

Answered By: tuxErrante

Answer #4:

I was curious which of the proposed answers runs fastest, so I timed some of them with timeit against two of the example strings:

  • string1 = 'Special $#! characters spaces 888323'
  • string2 = "how much for the maple syrup? $20.99? That's ricidulous!!!"

Example 1

''.join(e for e in string if e.isalnum())

  • string1 – Result: 10.7061979771
  • string2 – Result: 7.78372597694

Example 2

import re
re.sub('[^A-Za-z0-9]+', '', string)

  • string1 – Result: 7.10785102844
  • string2 – Result: 4.12814903259

Example 3

import re
re.sub(r'\W+', '', string)

  • string1 – Result: 3.11899876595
  • string2 – Result: 2.78014397621

Each result above is the lowest of the three totals, in seconds, returned by timeit's repeat(3, 2000000), i.e. three runs of 2,000,000 executions each.

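In these runs, Example 3 was roughly 3x faster than Example 1 (about 3.4x on string1 and 2.8x on string2). Note that \W+ keeps underscores, so Example 3 is not strictly equivalent to the other two.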
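If you want to re-run the comparison on your own machine, here is a minimal sketch. The function names example1 to example3 and best_time are hypothetical. It uses a much smaller number than the 2,000,000 above so it finishes quickly, and absolute times depend on the machine and Python version, so no figures are quoted here.

```python
import re
import timeit

string1 = 'Special $#! characters spaces 888323'
string2 = "how much for the maple syrup? $20.99? That's ricidulous!!!"

def example1(s: str) -> str:
    return ''.join(e for e in s if e.isalnum())

def example2(s: str) -> str:
    return re.sub('[^A-Za-z0-9]+', '', s)

def example3(s: str) -> str:
    return re.sub(r'\W+', '', s)

def best_time(fn, s: str, repeat: int = 3, number: int = 10_000) -> float:
    """Lowest total time in seconds over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(lambda: fn(s), repeat=repeat, number=number))

if __name__ == '__main__':
    for fn in (example1, example2, example3):
        for label, s in (('string1', string1), ('string2', string2)):
            print(f'{fn.__name__} {label}: {best_time(fn, s):.4f} s')
```

Raise number for steadier figures.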

Answered By: mbeacom

Answer #5:

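#!/usr/bin/env python3
import re

strs = "how much for the maple syrup? $20.99? That's ricidulous!!!"
print(strs)
# Remove ?, $, . and ! (inside [...] a | is a literal character, not "or")
nstr = re.sub(r'[?$.!]', '', strs)
print(nstr)
# Then remove anything that is not a letter, digit or space
nestr = re.sub(r'[^a-zA-Z0-9 ]', '', nstr)
print(nestr)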

You can add more special characters to the first pattern; every match is replaced by '' (an empty string), i.e. removed. The second pattern keeps spaces, so they remain in the output.
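When adding more characters to a character class, some of them (such as ], ^ and -) have special meaning inside [...]. One way to stay safe is to build the class with re.escape. Here is a minimal sketch; the contents of special are a hypothetical example set.

```python
import re

# Hypothetical set of characters to strip; extend as needed.
special = "?$.!|-^]"
# re.escape makes characters such as ], ^ and - literal inside [...].
pattern = re.compile('[' + re.escape(special) + ']')

strs = "how much for the maple syrup? $20.99? That's ricidulous!!!"
print(pattern.sub('', strs))  # how much for the maple syrup 2099 That's ricidulous
```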

Answered By: pkm

Answer #6:

Python 2.*

In Python 2, just filter(str.isalnum, string) works, because filter() applied to a str returns a str:

In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'

Python 3.*

In Python 3, filter() returns an iterator instead of a string (unlike the Python 2 case above), so you have to join the result back into a string:

''.join(filter(str.isalnum, string)) 

or, to pass a list to join (not verified, but it may be slightly faster):

''.join([*filter(str.isalnum, string)])

Note: unpacking inside a list display ([*args]) requires Python 3.5 or later.
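One detail worth knowing: the Python 3 filter object is a one-shot iterator, so it can be joined only once. Here is a small sketch; strip_non_alnum is a hypothetical helper name.

```python
def strip_non_alnum(text: str) -> str:
    """Python 3: filter() returns a lazy iterator, so join it into a str."""
    return ''.join(filter(str.isalnum, text))

chars = filter(str.isalnum, 'a-b-c')
first = ''.join(chars)   # 'abc'
second = ''.join(chars)  # '' because the iterator is already exhausted

print(strip_non_alnum('string with special chars like !,#$% etcs.'))  # stringwithspecialcharslikeetcs
```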

Answered By: Grijesh Chauhan

Answer #7:

Rather than enumerating the characters I don't want, I would use a regex that excludes every character that is not what I want.

For example, if I want only characters from ‘a to z’ (upper and lower case) and numbers, I would exclude everything else:

import re
s = re.sub(r"[^a-zA-Z0-9]","",s)

This means “substitute every character that is not a number, or a character in the range ‘a to z’ or ‘A to Z’ with an empty string”.

This works because a ^ placed as the first character inside a character class ([...]) negates the class.
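The meaning of ^ depends on where it appears. It negates a class only as the first character inside [...]; at the start of a pattern, outside brackets, it is an anchor; elsewhere inside [...] it is a literal caret. Here is a short sketch with made-up sample strings.

```python
import re

s = "^Hello, World 42"
# ^ first inside [...]: negated class, so everything not listed is removed.
negated = re.sub(r"[^a-zA-Z0-9]", "", s)
# ^ at the start of the pattern, outside [...]: anchor, so only a match at the start.
anchored = re.sub(r"^[a-zA-Z0-9]", "", "Hello")
# ^ not first inside [...]: a literal caret.
literal = re.sub(r"[a^]", "", "a^b")

print(negated)   # HelloWorld42
print(anchored)  # ello
print(literal)   # b
```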

Extra tip: if you also need the result in lowercase, lowercase the string first; the regex then becomes simpler, because no uppercase letters remain:

import re
s = re.sub(r"[^a-z0-9]","",s.lower())

Answered By: Andrea

Answer #8:

string.punctuation contains the following characters:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

You can use str.translate with a table built by str.maketrans to delete every punctuation character (the third argument of maketrans lists characters to map to None, i.e. to remove):

import string

'This, is. A test!'.translate(str.maketrans('', '', string.punctuation))

Output:

'This is A test'
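Because string.punctuation contains no whitespace, the output above still has spaces. Here is a minimal sketch that also deletes whitespace by adding string.whitespace to the deletion set. Both constants are ASCII-only, so non-ASCII symbols such as '€' or '¿' would pass through unchanged.

```python
import string

# Delete ASCII punctuation and ASCII whitespace in a single translate() pass.
table = str.maketrans('', '', string.punctuation + string.whitespace)

print('This, is. A test!'.translate(table))                       # ThisisAtest
print('Special $#! characters   spaces 888323'.translate(table))  # Specialcharactersspaces888323
```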

Answered By: Vlad Bezden
