Recently I was given a task: filter all illegal, unreadable characters out of a file, that is, every character other than upper- and lowercase letters, digits, and punctuation. To my surprise, I couldn't find a ready-made regex for this online, so I wrote my own:
"[^a-zA-Z0-9 -/:-@\[-`{-~]"
Every character that matches this regex is replaced with an empty string. It could be more compact, but I deliberately kept the punctuation part separate from the letter and digit parts so that it is easier to extend, and so you can lift just the pieces you need. The punctuation part comes in several chunks because punctuation characters are not contiguous in the ASCII table (the first chunk, " -/", also lets the space character through). Feel free to use it, and leave a comment below.
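Here is a minimal sketch of how the regex might be applied, written in Python as an assumption since the post does not name a language. The names `NON_PRINTABLE`, `NON_PRINTABLE_SHORT`, and `strip_non_printable` are hypothetical. The shorter pattern is one possible compact form: the four punctuation chunks together with the letters and digits cover the single ASCII range from space (0x20) to `~` (0x7E).

```python
import re

# The pattern from the post: matches anything that is NOT a letter,
# a digit, a space, or ASCII punctuation.
NON_PRINTABLE = re.compile(r"[^a-zA-Z0-9 -/:-@\[-`{-~]")

# Hypothetical compact form: printable ASCII is one contiguous range,
# from space (0x20) to tilde (0x7E).
NON_PRINTABLE_SHORT = re.compile(r"[^ -~]")


def strip_non_printable(text: str) -> str:
    """Remove every character that the pattern matches."""
    return NON_PRINTABLE.sub("", text)


# Sample input with a tab, a NUL byte, an accented letter, and CJK text.
sample = "Hello,\tWorld!\x00 caf\u00e9 \u4f60\u597d"
cleaned = strip_non_printable(sample)
print(repr(cleaned))
```

Note that tabs and newlines fall outside the allowed ranges, so they are stripped as well. If line breaks need to survive, adding `\n` inside the brackets would be one way to do it.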