Strip non-ASCII characters from a string using RegEx (regular expressions)

If you have string data that contains non-ASCII characters and want to strip them all out, the following regular expression will help.



  • [^\u0000-\u007F]+ matches a single character not present in the list below
    Quantifier: + between one and unlimited times, as many times as possible (greedy)
  • \u0000-\u007F a single character in the range between the following two characters
    • \u0000 the character with code point U+0000 (NUL)
    • \u007F the character with code point U+007F (DEL)

^ at the start of a character class is the negation operator. It tells the regex to match any character that is not in the class, instead of the characters that are.

The \u####-\u#### part says which characters are in the range. \u0000-\u007F covers the first 128 code points of Unicode (0 to 127), which are exactly the ASCII characters; in UTF-8 each of them is encoded as a single byte. Because of the negation, the expression matches every non-ASCII character.
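To make the boundary of the range concrete, here is a minimal sketch, assuming Java's java.util.regex. The class name AsciiRangeDemo and the helper strip are hypothetical names chosen for illustration. It compares three spellings of the same negated class: the Unicode-escape form used in this post, the hex form [^\x00-\x7F], and Java's POSIX-style class \p{ASCII}.

```java
import java.util.regex.Pattern;

class AsciiRangeDemo {
    // Three spellings of "one or more characters outside U+0000..U+007F".
    // The backslashes are doubled because these are Java string literals.
    static final Pattern UNICODE_ESCAPE = Pattern.compile("[^\\u0000-\\u007F]+");
    static final Pattern HEX_ESCAPE = Pattern.compile("[^\\x00-\\x7F]+");
    static final Pattern POSIX_CLASS = Pattern.compile("[^\\p{ASCII}]+");

    // Hypothetical helper: removes every match of the given pattern.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+007F (DEL) is the last ASCII code point and is kept;
        // U+0080 and U+00E9 lie outside the range and are removed.
        String input = "A\u007F\u0080\u00E9Z";
        for (Pattern p : new Pattern[] {UNICODE_ESCAPE, HEX_ESCAPE, POSIX_CLASS}) {
            String out = strip(p, input);
            System.out.println(p.pattern() + " -> length " + out.length());
        }
    }
}
```

All three patterns leave the same three characters (A, DEL, Z), so the choice between them is mostly a matter of readability.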


I had a string like the one below, which contains many non-standard characters:

name 1= Chanel 51������������������������������������������������������������

Applying the replaceAll method in Java as below

    String s = "name 1= Chanel 51������������������������������������������������������������";
    s = s.replaceAll("[^\\u0000-\\u007F]+", "");
    System.out.println(s);

prints the following to the console

name 1= Chanel 51
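If the same cleanup runs many times, one possible variation is to compile the pattern once and reuse it. The sketch below is an illustration under stated assumptions: the class name AsciiCleaner and the method stripNonAscii are hypothetical, and the sample string assumes the unreadable characters are U+FFFD replacement characters.

```java
import java.util.regex.Pattern;

class AsciiCleaner {
    // Compiled once; String.replaceAll would recompile the pattern on every call.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\u0000-\\u007F]+");

    // Hypothetical helper name, not taken from the article.
    static String stripNonAscii(String input) {
        if (input == null) {
            return null; // pass null through rather than throwing
        }
        return NON_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample: the trailing characters are U+FFFD replacement characters.
        String s = "name 1= Chanel 51\uFFFD\uFFFD\uFFFD";
        System.out.println(stripNonAscii(s));           // prints "name 1= Chanel 51"
        System.out.println(stripNonAscii("caf\u00E9")); // prints "caf"
    }
}
```

Note that accented letters are dropped rather than converted to a plain ASCII letter, so this approach suits removing debris more than cleaning up natural-language text.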



Dost Muhammad Shah

Dost Muhammad specializes in embedded design, firmware development, PCB design, testing, and prototyping. He enjoys sharing his experience with others.

