How to prompt GPT so it does not make mistakes with time windows

I'm trying to extract the property condition from real-estate descriptions. In particular, any property renovated in 2020 or later should be tagged as "JUST_RENOVATED", whereas if the renovation took place before 2020, it should simply be tagged as "GOOD".

Here is an example.

Given the following description:
Entièrement rénovée en 2017, cette jolie maison 2 chambres vous séduira par ses pièces épurées et lumineuses. PEB exceptionnel (PEB A) grâce à la qualité d'isolation utilisée. Faible consommation de gaz pour le chauffage central. Châssis triple vitrage. Cuisine ouverte entièrement équipée. Installation électrique aux normes RGIE. Compteur bi-horaire. Pour plus de renseignements et pour participer aux prochaines visites, merci de contacter l'agence immobilière ASTON & PARTNERS au 081/30.44.44.

The property condition should be "GOOD".

However, GPT seems to have difficulty understanding the time window. It generally tags this description as "JUST_RENOVATED", justifying that the renovation falls within the time window (despite 2017 being before 2020).

Here's the prompt I used. How can I improve it?

Extract the property condition based on descriptions.

Follow this order of decision :
1. Tag any property that has been renovated recently (i.e. 2020 and above) by "JUST_RENOVATED". If renovation have been made before 2020, tag by "GOOD".
2. Tag any property that has been recently build or is a project by "AS_NEW".
3. Tag any property with need of restorations by "TO_RENOVATE".
4. Tag any property in good condition (i.e. good energetic performance) by "GOOD".
5. If none of the above tag suit the description, tag by "NOT_FOUND".

Answer only with the tag.

Finally, here is the Python code, in case it helps:

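from openai import OpenAI

# Assumption: the key is read from the OPENAI_API_KEY environment variable.
client = OpenAI()

def debug_prompt(description):
    # Plain string: the instructions contain no placeholders, so no f-string is needed.
    intro_message = """ 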
        Extract the property condition based on descriptions.

        Follow this order of decision :
        1. Tag any property that has been renovated recently (i.e. 2020 and above) by "JUST_RENOVATED". If renovation have been made before 2020, tag by "GOOD".
        2. Tag any property that has been recently build or is a project by "AS_NEW".
        3. Tag any property with need of restorations by "TO_RENOVATE".
        4. Tag any property in good condition (i.e. good energetic performance) by "GOOD".
        5. If none of the above tag suit the description, tag by "NOT_FOUND".

        Answer only with the tag.
    """

    system_message = [{"role": "system", "content": intro_message}]


    # Named user_message so it does not shadow the debug_prompt function.
    user_message = [{
        "role": "user", 
        "content": f"""
            Extract the estate condition from the following description: '''{description}'''.
        """
    }]
    
    messages = system_message + user_message

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
    )
    
    # Use a separate loop variable so the response object is not overwritten.
    for choice in response.choices:
        print(choice.message.content.strip())

Solution

I used your system and user messages with the API, and gpt-3.5-turbo responded with GOOD, which is the desired result, so I am not sure why you are getting something different. gpt-4-turbo gives the same answer, and since it often gives better results than gpt-3.5-turbo, I suggest you try it.

Also, the amount of attention the model pays to system messages is sometimes questionable, so I suggest you also try moving the instructions from the system message into the main (user) prompt. I find that this gives good results, and with your data, both gpt-3.5-turbo and gpt-4-turbo again returned the desired result of GOOD.
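As a rough sketch of what moving the instructions into the main prompt could look like: the helper names build_single_prompt_messages and classify are my own (they are not in your code), and the exact layout of the combined prompt is only one possible choice. The client argument is assumed to be an OpenAI client object like the one in your snippet.

```python
# Instructions copied from the question's system message.
INSTRUCTIONS = """Extract the property condition based on descriptions.

Follow this order of decision :
1. Tag any property that has been renovated recently (i.e. 2020 and above) by "JUST_RENOVATED". If renovation have been made before 2020, tag by "GOOD".
2. Tag any property that has been recently build or is a project by "AS_NEW".
3. Tag any property with need of restorations by "TO_RENOVATE".
4. Tag any property in good condition (i.e. good energetic performance) by "GOOD".
5. If none of the above tag suit the description, tag by "NOT_FOUND".

Answer only with the tag."""


def build_single_prompt_messages(description):
    """Put the instructions and the description in a single user message."""
    content = f"{INSTRUCTIONS}\n\nDescription: '''{description}'''"
    return [{"role": "user", "content": content}]


def classify(client, description, model="gpt-3.5-turbo"):
    """Send the single-message prompt and return the tag (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=build_single_prompt_messages(description),
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```

With your existing client, this would be called as classify(client, description), or with model="gpt-4-turbo" to compare the two models.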

As a check, I also changed the renovation date in the data to 2022 in the last tests, and both LLMs correctly returned JUST_RENOVATED.

Note that (as with your code) all tests were carried out at a temperature of zero for reproducibility.
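If you want to repeat this kind of check yourself, something along these lines may help. It is only a sketch: with_renovation_year and run_checks are hypothetical helpers, and classify_fn stands for whatever function of yours takes (description, model) and returns the tag from the API at temperature 0. The year replacement assumes the first four-digit year in the text is the renovation year, which holds for your example but not necessarily for every listing.

```python
import re


def with_renovation_year(description, year):
    """Replace the first four-digit year (19xx or 20xx) in the description."""
    return re.sub(r"\b(?:19|20)\d{2}\b", str(year), description, count=1)


def run_checks(classify_fn, description,
               models=("gpt-3.5-turbo", "gpt-4-turbo"),
               years=(2017, 2022)):
    """Call classify_fn for every model/year pair and collect the returned tags."""
    results = {}
    for model in models:
        for year in years:
            variant = with_renovation_year(description, year)
            results[(model, year)] = classify_fn(variant, model)
    return results
```

The returned dictionary is keyed by (model, year), so you can see at a glance which combination, if any, gives a tag other than the one you expect.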