elasticsearch

Searching for email addresses in Elasticsearch with a regexp query


How can I use a regexp query in Elasticsearch to find email addresses?

I tried this:

{
  "query": {
    "regexp": {
      "data": {
        "value": "[\\w\\'\\.\\_\\+\\-]+@[\\w]+[\\w\\-\\.]*\\.[\\w]+",
        "flags": "ALL"
      }
    }
  }
}

But I get no results.

The documentation at https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-regexp-query.html says that the symbol @ matches any string, so it has to be escaped to match a literal @.

So I tried this:

{
  "query": {
    "regexp": {
      "data": {
        "value": "\\@",
        "flags": "ALL"
      }
    }
  }
}

and this:

{
  "query": {
    "regexp": {
      "data": {
        "value": "\\@",
        "flags": "ALL"
      }
    }
  }
}

but again I get no results.

Any thoughts?
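One thing worth keeping straight here is the two layers of escaping. In Lucene's regexp syntax a literal @ is written \@, and inside a JSON string that backslash must itself be escaped, which is why the value appears as "\\@". A minimal Python sketch using only the standard json module (regexp_query is a hypothetical helper, not part of any Elasticsearch client):

```python
import json

def regexp_query(field: str, pattern: str, flags: str = "ALL") -> str:
    """Build a regexp query body as JSON text.

    `pattern` is the raw Lucene regexp (one backslash per escape);
    json.dumps adds the second backslash that a JSON string needs.
    """
    body = {"query": {"regexp": {field: {"value": pattern, "flags": flags}}}}
    return json.dumps(body)

# The raw Lucene pattern for a literal "@" is two characters: backslash and "@".
raw = "\\@"
print(len(raw))                   # 2
print(regexp_query("data", raw))  # {"query": {"regexp": {"data": {"value": "\\@", "flags": "ALL"}}}}
```

Writing the pattern once in its raw form and letting a JSON encoder handle the escaping avoids guessing how many backslashes a hand-written body needs.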

UPDATE: I am using Elasticsearch 5.2 on Ubuntu 16.04.2 LTS.

Data samples:

curl -XPOST 'localhost:9200/my_index/index_type/_bulk?pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"data":"some text some text some text some text admin@company.comsome text some text some text "}
{"index":{"_id":"2"}}
{"data":"some text some text hr@company.comsome text some text "}
{"index":{"_id":"3"}}
{"data":"some text some text webmaster@company.comsome text some text "}
'
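Side note on the bulk request: the _bulk endpoint reads newline-delimited JSON (an action line followed by a source line for each document), and the body must end with a newline. If typing that by hand gets tedious, the body can be generated. A minimal sketch assuming only Python's standard json module; bulk_body is a hypothetical helper, and sending the result (curl, Kibana Console or a client library) is left out:

```python
import json

def bulk_body(values, field="data", start_id=1):
    """Build an NDJSON _bulk body that indexes each value into `field`."""
    compact = (",", ":")  # same compact style as the hand-written body
    lines = []
    for offset, value in enumerate(values):
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}, separators=compact))
        lines.append(json.dumps({field: value}, separators=compact))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body([
    "some text some text hr@company.comsome text some text ",
    "some text some text webmaster@company.comsome text some text ",
])
print(body, end="")
```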

and the search query:

curl -XGET 'localhost:9200/my_index/index_type/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "data": {
        "value": "\\@",
        "flags": "ALL"
      }
    }
  }
}
'
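Two properties of the regexp query may explain the empty results. First, the pattern is matched against the individual terms stored in the index, not against the original field text; with the default text mapping the standard analyzer will most likely split admin@company.com at the @ (and lowercase it), in which case no term contains an @ at all. Second, Lucene regexps are anchored: the pattern must match an entire term, so \@ on its own can only match a term that is exactly "@". The sketch below emulates that anchored, per-term behaviour with Python's re.fullmatch; the term lists are hypothetical, not the output of any real Elasticsearch analyzer, and the Python pattern .*@.* stands in for the Lucene pattern .*\@.* (written ".*\\@.*" in JSON).

```python
import re
from typing import List

def regexp_hits(pattern: str, terms: List[str]) -> List[str]:
    """Return the terms that `pattern` matches in full (anchored at both ends).

    This emulates only Lucene's anchoring; Python's regex dialect differs
    from Lucene's in other ways (e.g. Python has no "@ = any string" operator).
    """
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]

# Hypothetical term lists for a field value like "... admin@company.comsome text ...".
# In Python "@" is a literal, so it plays the role of Lucene's escaped \@ here.
kept_whole = ["some", "text", "admin@company.comsome"]      # address kept in one term
split_at_at = ["some", "text", "admin", "company.comsome"]  # analyzer split on "@"

print(regexp_hits("@", kept_whole))       # [] (no term is exactly "@")
print(regexp_hits(".*@.*", kept_whole))   # ['admin@company.comsome']
print(regexp_hits(".*@.*", split_at_at))  # [] (no term contains "@")
```

Running the field text through the _analyze API is one way to see which terms are actually indexed before writing the pattern.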

Solution

  • Use the regexp "[a-zA-Z]+@[a-zA-Z]+.[a-zA-Z]+" to find email addresses. Then run the following query to find all email addresses:

    GET /company/employee/_search
    {
       "query": {
           "regexp": {
              "data": {
                "value": "[a-zA-Z]+@[a-zA-Z]+.[a-zA-Z]+",
                "flags": "ALL"
              }
           }
       }
    }
    

    Here company is the index, employee is the type and data is the field name. Then insert the following data to check that the query works:

    POST /company/employee/_bulk
    {"index":{"_id":"1"}}
    {"data":"admin@company.com"}
    {"index":{"_id":"2"}}
    {"data":"hr@company.com"}
    {"index":{"_id":"3"}}
    {"data":"webmaster@company.com"}
    {"index":{"_id":"4"}}
    {"data":"abc@company.com"}
    {"index":{"_id":"5"}}
    {"data":"ahmed@company.com"}
    {"index":{"_id":"6"}}
    {"data":"md@company.com"}
    {"index":{"_id":"7"}}
    {"data":"boss@company.com"}
    {"index":{"_id":"8"}}
    {"data":"amd@company.com"}
    {"index":{"_id":"9"}}
    {"data":"ad@company.com"}
    {"index":{"_id":"10"}}
    {"data":"ed@company.com"}
    {"index":{"_id":"11"}}
    {"data":"etc@company.com"}
    {"index":{"_id":"12"}} 
    {"data":"f23f23f23f23f23 d32d23d32d d32d2 3d 23"} 
    {"index":{"_id":"13"}} 
    {"data":"d23d32 d32d23d32 etc@company.com d3d23d23"}
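A note on the answer's pattern: per the regexp syntax page cited in the question, with "flags": "ALL" (which enables the ANYSTRING operator) an unescaped @ matches any string, and an unescaped . matches any single character, so [a-zA-Z]+@[a-zA-Z]+.[a-zA-Z]+ is looser than it looks. To see how much looser, the rough sketch below translates the few Lucene constructs that pattern uses into a Python regex and matches whole terms, since Lucene regexps are anchored. lucene_subset_to_python is a hypothetical helper covering only this small subset; it is not a faithful Lucene RegExp implementation.

```python
import re

def lucene_subset_to_python(pattern: str, anystring: bool = True) -> str:
    """Translate a small subset of Lucene regexp syntax into a Python regex.

    Handles: backslash escapes (backslash + x means a literal x), '@' as
    "any string" outside character classes when `anystring` is True, and
    passes everything else ([...] classes, '+', '.') through unchanged.
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is a literal
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "@" and anystring and not in_class:
            out.append("(?s:.*)")  # ANYSTRING operator: any string at all
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)

def full_match(lucene_pattern: str, term: str) -> bool:
    """Anchored match, since Lucene regexps must match the whole term."""
    return re.fullmatch(lucene_subset_to_python(lucene_pattern), term) is not None

loose = "[a-zA-Z]+@[a-zA-Z]+.[a-zA-Z]+"       # pattern from the answer
strict = "[a-zA-Z]+\\@[a-zA-Z]+\\.[a-zA-Z]+"  # @ and . escaped

for term in ["admin@company.com", "company.com", "abcd"]:
    print(term, full_match(loose, term), full_match(strict, term))
```

Under this reading the loose pattern also accepts terms such as company.com or even abcd, which is worth keeping in mind when checking the hits. If a literal @ and dot are intended, the escaped form (written "[a-zA-Z]+\\@[a-zA-Z]+\\.[a-zA-Z]+" inside the JSON string) is the stricter option; whether it returns anything then depends on the field's terms still containing the @, for example in a keyword field.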