elasticsearch, elasticsearch-painless

How do I add a custom scoring script in Painless?


this language is not painless at all... zero examples and the docs are lacking...

I am trying to build a custom distance function between embeddings. I have done it in Python:

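import numpy as np

def my_norm(x, y):
    # x: (N, m) array of embeddings, y: (m,) target vector
    norm_embeddings = np.sqrt((x * x).sum(axis=1))  # L2 norm of each row of x
    norm_target = np.sqrt((y * y).sum())            # L2 norm of y
    z = y - x                                       # y is broadcast across the rows of x
    norm_top = np.sqrt((z * z).sum(axis=1))         # L2 norm of each difference row
    return norm_top / (norm_embeddings + norm_target)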

where x is an N×m array and y is a vector of length m.

Here is what I have in Painless. I don't even know if it will work...

def normalized_euclidean_dist(def x, def y){
  def norm_embeddings= new ArrayList();
  def norm_target = Math.pow((y*y).sum(), 0.5);
  def z = y-x;
  def norm_top=new ArrayList();
  for (a in x){
    norm_embeddings.add(Math.pow((a*a).sum(), 0.5))
  }
  for (a in z){
    norm_top.add(Math.pow((a*a).sum(), 0.5))
  }
  return norm_top/(norm_embeddings+norm_target)
}

How do I call this function in the script?


Solution

  • It's hard to debug this without the actual docs/index mapping, but this is the general way to call a function from a sort script:

    GET some_index/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": {
        "_script": {
          "type": "number",
          "script": {
            "lang": "painless",
            "source": """
            float normalized_euclidean_dist(def x, def y){
              def norm_embeddings= new ArrayList();
              def norm_target = Math.pow((y*y).sum(), 0.5);
              def z = y-x;
              def norm_top=new ArrayList();
              for (a in x){
                norm_embeddings.add(Math.pow((a*a).sum(), 0.5));
              }
              for (a in z){
                norm_top.add(Math.pow((a*a).sum(), 0.5));
              }
              return norm_top/(norm_embeddings+norm_target);
            }
            
            return normalized_euclidean_dist(doc['x_field'], doc['y_field']);
            """
          },
          "order": "asc"
        }
      }
    }
    

    Replace 'x_field' and 'y_field' with the actual field names. As far as I understand, both of them are arrays; if a field holds a single value instead, you need to use doc['x_field'].value.

    Sort scripts need to return a single number as the result. I may be missing something, but it looks to me like your function is trying to divide an array by the sum of another array and a number, which won't work, so there is work to do there as well.

    script sorting docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html#script-based-sorting
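
The point about returning a single number can be sketched in plain Python before porting it to Painless. This is only a minimal sketch: it assumes each document holds one embedding x of the same length as the target vector y, and the zero-denominator guard is my own assumption, not something from the question. It uses an explicit loop rather than array arithmetic, so each step should map onto an ordinary Painless for loop.

```python
import math

def normalized_euclidean_dist(x, y):
    """Return ||y - x|| / (||x|| + ||y||) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sq_x = 0.0     # running sum of squares of x
    sq_y = 0.0     # running sum of squares of y
    sq_diff = 0.0  # running sum of squares of (y - x)
    for a, b in zip(x, y):
        sq_x += a * a
        sq_y += b * b
        sq_diff += (b - a) * (b - a)
    denom = math.sqrt(sq_x) + math.sqrt(sq_y)
    # Assumption: if both vectors are all zeros, treat the distance as 0
    # instead of dividing by zero.
    if denom == 0.0:
        return 0.0
    return math.sqrt(sq_diff) / denom
```

Called once per document, this yields one float, which is the shape of result a "type": "number" sort script has to return.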