Search code examples
sqlpostgresqlpattern-matchingextractcve

Extract CVE from a text column


I need to extract a CVE from a text column. The format is CVE-yyyy-xxxxx.

The year(yyyy) is variable, and the x’s will change per CVE - can be 1 to 6 digits. Sometimes the CVE is enclosed in (), sometimes followed by :

The column is like:

the vulnerability name (CVE-2019-0215) 
the vulnerability name (CVE-2019-0290) extra words 
the vulnerability name CVE-2018-23314: blah blah

Expected Output in a new column:

CVE-2019-0215
CVE-2019-0290
CVE 2018-23314

Solution

  • It is useful to layout the problem through sample data, and the expected result (ps: for this please learn some stackoverflow format capability).

    sample data

    CREATE TABLE mytable(
       sourcecol VARCHAR(100) NOT NULL
    );
    INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0215)');
    INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name (CVE-2019-0290) extra words');
    INSERT INTO mytable(sourcecol) VALUES ('the vulnerability name CVE-2018-23314: blah blah');
    

    query: using regex pattern in substring function

    select substring(sourcecol from '(CVE-[0-9]{1,6}-[0-9]{1,6})+')
    from mytable
    

    this regex "pattern" looks for the string starting with "CVE-" followed by 1 to 6 digits followed by "-" followed by 1 to 6 digits

    result

    +----------------+
    |   substring    |
    +----------------+
    | CVE-2019-0215  |
    | CVE-2019-0290  |
    | CVE-2018-23314 |
    +----------------+
    

    see this dbfiddle