LLM performance can also decline over time

According to a recent study by researchers at Stanford University and UC Berkeley, ChatGPT (GPT-3.5) and GPT-4 don't always improve over time; in some cases, their performance actually declines. For example, in March 2023, GPT-4 identified prime numbers with an impressive 97.6% accuracy, but in June 2023 it struggled with the same questions, reaching only 2.4% accuracy. The study compares both models across a diverse range of tasks, such as solving math problems, answering sensitive questions, generating code, and visual reasoning.

The results highlight the importance of continuously monitoring AI quality, since the behavior of the same model can change substantially in a short amount of time!
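One simple way to act on this is to re-run a fixed question set against each model snapshot and compare the scores. The Python sketch below assumes a yes/no prime-checking task similar to the one in the study. `accuracy`, `drift_alert`, the 5-point threshold, and the two snapshot functions are hypothetical placeholders, not part of the study or of any model API. In practice, each snapshot function would wrap a call to the model being monitored.

```python
from typing import Callable, Sequence


def is_prime(n: int) -> bool:
    """Ground-truth label computed by trial division up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model: Callable[[int], bool], numbers: Sequence[int]) -> float:
    """Fraction of numbers where the model's yes/no answer matches the ground truth."""
    correct = sum(model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def drift_alert(baseline: float, current: float, max_drop: float = 0.05) -> bool:
    """Hypothetical rule: alert if accuracy fell by more than max_drop (absolute)."""
    return baseline - current > max_drop


if __name__ == "__main__":
    # Fixed evaluation set, reused unchanged on every monitoring run.
    numbers = list(range(2, 102))
    # Hypothetical stand-ins for two snapshots of the same model:
    # the first always answers correctly, the second always says "not prime".
    march_snapshot = is_prime
    june_snapshot = lambda n: False
    base = accuracy(march_snapshot, numbers)
    now = accuracy(june_snapshot, numbers)
    print(f"baseline={base:.3f} current={now:.3f} alert={drift_alert(base, now)}")
```

Keeping the question set fixed is the key design choice: any change in the score can then be attributed to the model rather than to the test.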

You can read the paper here:

