Utrecht University developed performance review: “Structural evaluation of AI is needed”

A robot reading or studying. Photo: Andrea de Santis, via Unsplash

A project team led by Utrecht University examined how companies can monitor their AI applications. “Artificial intelligence is getting an increasingly important role in organisations, yet there is no structural monitoring of how AI performs its tasks,” project leader Iris Muis explains. As a result, risks such as profiling and discrimination can grow unchecked. “Our solution is a periodic ‘performance review’, just as is the case with human employees.”

Evaluating functioning of AI

“A ‘job interview’ for AI is already common,” Muis says. “There are many tests available to determine whether a particular AI would fit within a company.” But once an AI system is implemented, it is not monitored or evaluated, her research team found. This turned out to be a gap in the academic literature as well as in practice. “Yet the performance of AI systems should be evaluated periodically, to check whether they are doing, and continue to do, what is intended.”

Muis and her team subsequently developed a performance review for AI. “With this set-up, we provide tools for market players and supervisors to evaluate the functioning of AI,” Muis explains.

Questionnaire for artificial intelligence

The review follows a similar structure to the assessment of human employees. It consists of four sections with questions attached, such as:

  1. Tasks. What kind of tasks does the AI have? Have these tasks changed over time? Has the AI itself changed, e.g. due to changes in the code?
  2. Performance. How does the AI perform? Has the AI made any mistakes? Is the performance in line with expectations?
  3. Organisation. Who is responsible for the functioning of the AI? Are those responsibilities clear? Has there been a performance review with the AI before? If so, how were any issues of concern followed up?
  4. Development. What opportunities are there to improve the AI, both in performance and usability? Have other AI technologies or methods become available that might work better than the current one?
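The four sections above could be captured as a simple checklist that an organisation fills in per AI system, per review cycle. The sketch below is a hypothetical illustration of that idea only: the section names and questions come from the article, but everything else (including the name `blank_review`) is an assumption, not the Data School's actual tool.

```python
# Hypothetical sketch of the AI performance review as a checklist.
# Section names and questions are taken from the article; the data
# structure and helper function are illustrative assumptions.

REVIEW_SECTIONS = {
    "Tasks": [
        "What kind of tasks does the AI have?",
        "Have these tasks changed over time?",
        "Has the AI itself changed, e.g. due to changes in the code?",
    ],
    "Performance": [
        "How does the AI perform?",
        "Has the AI made any mistakes?",
        "Is the performance in line with expectations?",
    ],
    "Organisation": [
        "Who is responsible for the functioning of the AI?",
        "Are those responsibilities clear?",
        "Has there been a performance review with the AI before?",
        "If so, how were any issues of concern followed up?",
    ],
    "Development": [
        "What opportunities are there to improve the AI?",
        "Have other AI technologies or methods become available?",
    ],
}

def blank_review(system_name: str) -> dict:
    """Return an empty review form for one AI system."""
    return {
        "system": system_name,
        "answers": {section: {q: None for q in questions}
                    for section, questions in REVIEW_SECTIONS.items()},
    }

form = blank_review("fraud-detection-model")
print(len(form["answers"]))  # → 4 (one entry per review section)
```

In this shape, a supervisor could fill in the `None` slots each review cycle and compare forms across cycles to track whether earlier issues were followed up.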

Team of researchers and supervisors

“Although the research project is now concluded, we remain committed to setting up supervision of AI to minimise risks from its use,” Muis concludes.

The partnership was led by Utrecht University’s Data School, involving Iris Muis, Elise Renkema, Mirko Schaefer, Julia Straatman, Arthur Vankan and Daan van der Weijden. They collaborated with supervisors from the Dutch Authority for Digital Infrastructure and the Netherlands Food and Consumer Product Safety Authority. The project was funded by these institutions and Utrecht University’s AI Labs.