
How do I get a token or code embedding using the Codex API?


For a given code snippet, how can I get an embedding using the Codex API? This is how I currently call Codex for completions:

import openai
import config  # local module that holds OPENAI_API_KEY


openai.api_key = config.OPENAI_API_KEY

def runSomeCode():
    # Legacy (openai<1.0) Completions call against the Codex model code-davinci-001
    response = openai.Completion.create(
      engine="code-davinci-001",
      prompt="\"\"\"\n1. Get a reputable free news api\n2. Make a request to the api for the latest news stories\n\"\"\"",
      temperature=0,
      max_tokens=1500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0)

    # Return the generated text of the first choice, or '' if there is none
    choices = response.get('choices', [])
    return choices[0]['text'] if choices else ''


answer = runSomeCode()
print(answer)

Given a Python code block like the following, can I get its embedding from Codex?

Input:

import random

for i in range(10):
    a = random.randint(1, 12)
    b = random.randint(1, 12)
    question = "What is " + str(a) + " x " + str(b) + "? "
    answer = input(question)
    if int(answer) == a * b:
        print("Well done!")
    else:
        print("No.")

Output:

  • Embedding of the input code

Solution

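  • Codex's completion models return generated text, not embeddings; embeddings come from OpenAI's separate Embeddings endpoint. The get_embedding function below returns an embedding vector for an input text, so the code snippet can be passed in as that text.

    Canonical code from OpenAI: https://github.com/openai/openai-python/blob/main/examples/embeddings/Get_embeddings.ipynb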

    import openai
    from typing import List
    from tenacity import retry, wait_random_exponential, stop_after_attempt
    
    # openai reads the API key from the OPENAI_API_KEY environment variable by default.
    
    # Retry on any exception with randomized exponential backoff (1 to 20 s), at most 6 attempts.
    @retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
    def get_embedding(text: str, engine="text-similarity-davinci-001") -> List[float]:
    
        # replace newlines, which can negatively affect performance.
        text = text.replace("\n", " ")
    
        return openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"]
    
    embedding = get_embedding("Sample query text goes here", engine="text-search-ada-query-001")
    print(len(embedding))
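The code above targets the pre-1.0 openai package (openai.Embedding.create with engine=) and first-generation embedding models such as text-search-ada-query-001, which OpenAI has since deprecated. What follows is a minimal sketch of the same idea for openai>=1.0, not OpenAI's canonical code. It assumes a client created with OpenAI() and the text-embedding-3-small model. Unlike get_embedding above, it keeps newlines, on the assumption that indentation carries meaning in Python code. The endpoint returns one vector per input string, not per-token embeddings. cosine_similarity is a hypothetical helper showing how two snippet embeddings can be compared.

```python
import math
from typing import List, Sequence


def get_code_embedding(client, code: str, model: str = "text-embedding-3-small") -> List[float]:
    """Return a single embedding vector for a whole code snippet.

    Assumes `client` is an openai>=1.0 OpenAI() client. The snippet is sent
    unchanged (newlines included), since indentation matters in Python.
    """
    response = client.embeddings.create(model=model, input=[code])
    # One input string -> one entry in response.data
    return list(response.data[0].embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

Usage, assuming OPENAI_API_KEY is set in the environment and quiz.py is a hypothetical file holding the snippet: from openai import OpenAI; client = OpenAI(); vec = get_code_embedding(client, open("quiz.py").read()). Scoring vec against the embeddings of other snippets with cosine_similarity (values closer to 1 mean more similar) is the usual basis for embedding-based code search.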