Tags: javascript, regex, file-upload, filenames, chinese-locale

JavaScript: Throw an error when Chinese characters are in the file name


Hi, I would like to throw an error when a file name contains Chinese characters, using JavaScript. My code throws an "Expected hexadecimal digit" error. So far I have the following code:

if(document.f1.Attachment.value.match(""/[\u4e00-\u9fff]|[\u3400-\u4dbf]|[\u{20000}-\u{2a6df}]|[\u{2a700}-\u{2b73f}]|[\u{2b740}-\u{2b81f}]|[\u{2b820}-\u{2ceaf}]|[\uf900-\ufaff]|[\u3300-\u33ff]|[\ufe30-\ufe4f]|[\uf900-\ufaff]|[\u{2f800}-\u{2fa1f}]""))
{
alert('Attachment cannot contain Chinese characters');
}

From what I have read, the error is caused by the \u escapes, but I do not understand how to fix it.


Solution

  • First of all, remove the "" from both ends of the regex literal. In JS, a regex literal is written as plain /.../, with no " or ' wrapping it; otherwise it is not parsed as a regex.

    Next, your pattern uses the \u{XXXXX} notation, which was introduced in ECMAScript 6 and only works when the regex has the u flag, in a JS environment that supports it. So, in ES6, this is a valid solution:

    .match(/[\u4e00-\u9fff]|[\u3400-\u4dbf]|[\u{20000}-\u{2a6df}]|[\u{2a700}-\u{2b73f}]|[\u{2b740}-\u{2b81f}]|[\u{2b820}-\u{2ceaf}]|[\uf900-\ufaff]|[\u3300-\u33ff]|[\ufe30-\ufe4f]|[\uf900-\ufaff]|[\u{2f800}-\u{2fa1f}]/u)
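
    As a rough illustration of why the u flag matters (a sketch, not part of the original code; the helper name compiles is made up here), the same pattern source can be compiled with and without the flag. Without u, \u{20000} is not read as a code point escape, so the character range is rejected:

    ```javascript
    // Hypothetical helper: reports whether a pattern source compiles with the given flags.
    function compiles(source, flags) {
      try {
        new RegExp(source, flags);
        return true;
      } catch (e) {
        return false; // a SyntaxError means the pattern is invalid in this mode
      }
    }

    var source = "[\\u{20000}-\\u{2a6df}]";
    var withoutU = compiles(source, "");  // \u{...} is not a code point escape here
    var withU = compiles(source, "u");    // \u{...} is parsed as a single code point

    // U+20000 lies outside the Basic Multilingual Plane.
    var matchesAstral = new RegExp(source, "u").test("\u{20000}");

    console.log(withoutU, withU, matchesAstral); // false true true
    ```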
    

    To make it work in ES5 (older browsers), you need to transpile the regex so that each code point above U+FFFF is written as a UTF-16 surrogate pair:

    .match(/[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff\u3300-\u33ff\ufe30-\ufe4f\uf900-\ufaff]|(?:[\uD840-\uD868\uD86A-\uD872][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF\uDF00-\uDFFF]|\uD873[\uDC00-\uDEAF]|\uD87E[\uDC00-\uDE1F])/)
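
    The surrogate ranges in the ES5 pattern follow from the standard UTF-16 encoding of code points above U+FFFF. A minimal sketch of that arithmetic (the function name toSurrogatePair is made up for illustration) shows where boundaries such as \uD840\uDC00 and \uD869\uDEDF come from:

    ```javascript
    // Hypothetical helper: splits a code point above U+FFFF into its UTF-16 surrogate pair,
    // returned as two uppercase hex strings [high, low].
    function toSurrogatePair(codePoint) {
      var offset = codePoint - 0x10000;
      var high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
      var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
      return [high.toString(16).toUpperCase(), low.toString(16).toUpperCase()];
    }

    console.log(toSurrogatePair(0x20000).join(" ")); // D840 DC00
    console.log(toSurrogatePair(0x2A6DF).join(" ")); // D869 DEDF
    console.log(toSurrogatePair(0x2FA1F).join(" ")); // D87E DE1F
    ```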
    

    JS ES5 demo:

    if("中文".match(/[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff\u3300-\u33ff\ufe30-\ufe4f\uf900-\ufaff]|(?:[\uD840-\uD868\uD86A-\uD872][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF\uDF00-\uDFFF]|\uD873[\uDC00-\uDEAF]|\uD87E[\uDC00-\uDE1F])/)) {
      console.log("ES5: Chinese detected!");
    }

    JS ES6 demo:

    if("中文".match(/[\u4e00-\u9fff]|[\u3400-\u4dbf]|[\u{20000}-\u{2a6df}]|[\u{2a700}-\u{2b73f}]|[\u{2b740}-\u{2b81f}]|[\u{2b820}-\u{2ceaf}]|[\uf900-\ufaff]|[\u3300-\u33ff]|[\ufe30-\ufe4f]|[\uf900-\ufaff]|[\u{2f800}-\u{2fa1f}]/u)) {
      console.log("ES6: Chinese detected!");
    }

    This last one throws an "Invalid range in character set" error in IE, since IE does not support ES6.
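
    For the original form check, one possible sketch is shown here. The function name containsChinese and the sample file names are illustrative assumptions. The pattern is the ES5 one from this answer, with the duplicated \uf900-\ufaff range written once. It uses test() rather than match(), since only a yes/no answer is needed:

    ```javascript
    // ES5-compatible pattern: BMP ranges plus surrogate pairs for the astral ranges.
    var CHINESE_RE = /[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff\u3300-\u33ff\ufe30-\ufe4f]|(?:[\uD840-\uD868\uD86A-\uD872][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF\uDF00-\uDFFF]|\uD873[\uDC00-\uDEAF]|\uD87E[\uDC00-\uDE1F])/;

    // Hypothetical helper: true if the file name contains a character in the ranges above.
    function containsChinese(fileName) {
      return CHINESE_RE.test(fileName);
    }

    console.log(containsChinese("report.pdf"));       // false
    console.log(containsChinese("报告.pdf"));          // true
    console.log(containsChinese("\uD840\uDC00.txt")); // true (U+20000)
    ```

    In the question's form, this could be called as containsChinese(document.f1.Attachment.value) before showing the alert.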