yaobin.wen

Yaobin's Blog

1 June 2022

The regular expression that matches strings of a limited length that contain at least one non-whitespace character

by yaobin.wen

The system I’m working on has an input field whose value is validated with exactly one regular expression. The validity tests are:

For example, the table below shows which values are acceptable (note that here \w stands for a single whitespace character, either a blank space or a tab, and not the usual regex word-character class):

| Value | Acceptable? | Notes |
| --- | --- | --- |
| "0123" | Yes | Fewer than 10 characters. |
| "0123456789" | Yes | Exactly 10 characters. |
| "0123456789abc" | No | Longer than 10 characters. |
| "\w\w\w" | No | Only whitespace. |
| "A\w\w\w\w\w\w" | Yes | A lot of whitespace, but has at least one non-whitespace character A. |
| "\w\w\w\wA" | Yes | A lot of whitespace, but has at least one non-whitespace character A. |
| "\w\w\wA\w\w\w" | Yes | A lot of whitespace, but has at least one non-whitespace character A. |

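The first regular expression that came to mind was ^.*\S.*$, which means: zero or more characters of any kind, then one non-whitespace character, then zero or more characters of any kind.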

This regular expression satisfies validity tests 1) and 3). Unfortunately, it doesn’t limit the string length to at most 10 characters. As a result, a 20-character string such as 01234567890123456789 also passes the validity test.
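To make the gap concrete, here is a minimal Python sketch of that first attempt. The helper name passes_first_attempt is made up for illustration, and Python's re module is only an assumption; the original system may use a different regex engine.

```python
import re

# First attempt: anything, then one non-whitespace character, then anything.
FIRST_ATTEMPT = re.compile(r"^.*\S.*$")


def passes_first_attempt(value: str) -> bool:
    """Return True if the value matches the first-attempt pattern (hypothetical helper)."""
    return FIRST_ATTEMPT.match(value) is not None


print(passes_first_attempt("   "))                   # whitespace only
print(passes_first_attempt("  A  "))                 # one non-whitespace character
print(passes_first_attempt("01234567890123456789"))  # 20 characters, too long
```

The whitespace-only value is rejected, but the 20-character value is accepted, because nothing in the pattern bounds the length.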

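An answer that uses negative lookahead inspired me and helped me figure out the solution to my problem: ^(?!\s*$).{1,10}$. The lookahead (?!\s*$) rejects any string that consists only of whitespace, and .{1,10} limits the length to between 1 and 10 characters. You can try this regular expression on any online regular expression testing website.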
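As a quick check, the sketch below runs the lookahead-based pattern against the example values from the table, with the whitespace placeholders written out as real spaces and a tab. It assumes Python's re module, and is_valid is a hypothetical helper name, not part of the original system.

```python
import re

# (?!\s*$) rejects whitespace-only input; .{1,10} caps the length at 10.
PATTERN = re.compile(r"^(?!\s*$).{1,10}$")


def is_valid(value: str) -> bool:
    """Return True if the value passes the validity tests (hypothetical helper)."""
    return PATTERN.match(value) is not None


# Example values and the expected verdicts from the table.
cases = {
    "0123": True,
    "0123456789": True,
    "0123456789abc": False,
    " \t ": False,
    "A      ": True,
    "    A": True,
    "   A   ": True,
}

for value, expected in cases.items():
    assert is_valid(value) == expected, repr(value)
```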

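Using negative lookbehind, as in ^(?<!\s*$).{1,10}$, may not work, because the * quantifier inside the lookbehind makes it variable-width and, as mentioned in Lookahead and Lookbehind Zero-Length Assertions, many regular expression implementations “including those used by Perl, Python, and Boost only allow fixed-length strings.” Even in an engine that accepts variable-width lookbehind, a lookbehind placed right after ^ inspects the text before the start of the string rather than the string itself, so it would not reject whitespace-only input.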
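The fixed-width restriction is easy to observe in Python. The following sketch assumes the standard library re module (third-party engines may behave differently), and compiles is a hypothetical helper that only reports whether a pattern is accepted.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised, for example, when a lookbehind is not fixed-width.
        return False


print(compiles(r"^(?<!\s*$).{1,10}$"))  # lookbehind version
print(compiles(r"^(?!\s*$).{1,10}$"))   # lookahead version
```

With the standard re module, the lookbehind version is rejected at compile time, while the lookahead version compiles normally.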

Tags: Tech - Regex