July 16, 2025
The BenchmarkCards framework, a structured collection of datasets, benchmarks, and mitigations that guides developers in building safe and transparent AI systems, was recently incorporated into IBM's Risk Atlas Nexus, the company's open-source AI toolkit for governing foundation models. With support from the Notre Dame-IBM Technology Ethics Lab, researchers at the University of Notre Dame's Lucy Family Institute for Data & Society and IBM Research jointly developed the framework. It targets the broader community of researchers and developers, offering practical guidance for evaluating and mitigating potential risks when developing AI models.
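For illustration only, a benchmark card of this kind could be represented as structured metadata. The sketch below is a hypothetical Python rendering; the class name, fields, and example values are assumptions made for this article and do not reflect the actual BenchmarkCards schema or the Risk Atlas Nexus API.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkCard:
    """Hypothetical structured record documenting an LLM benchmark.

    Field names are illustrative assumptions, not the actual
    BenchmarkCards schema or the Risk Atlas Nexus API.
    """
    name: str                   # benchmark identifier, e.g. "MMLU"
    dataset: str                # underlying dataset or source
    metrics: list[str]          # evaluation metrics reported
    intended_use: str           # what the benchmark is meant to measure
    known_limitations: list[str] = field(default_factory=list)
    risk_mitigations: list[str] = field(default_factory=list)

# Example: documenting a popular benchmark together with its caveats.
mmlu_card = BenchmarkCard(
    name="MMLU",
    dataset="Massive Multitask Language Understanding (57 subjects)",
    metrics=["accuracy"],
    intended_use="Breadth of multiple-choice academic knowledge",
    known_limitations=["High scores do not imply reasoning depth"],
    risk_mitigations=["Pair with task-specific and safety evaluations"],
)
print(mmlu_card.name, mmlu_card.known_limitations)
```

Recording limitations and mitigations alongside the benchmark itself is what lets such a card serve as a guide rather than just a leaderboard entry.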
The development of large language models (LLMs) and the assessment of their capabilities are guided by performance on benchmarks: combinations of datasets, evaluation metrics, and associated processing steps. Although benchmarks serve this critical role, when misused they can create a false sense of safety or performance, with serious ethical and practical implications.
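To make that definition concrete, here is a minimal sketch of a benchmark harness showing all three pieces (dataset, processing steps, and metric); the toy data, the normalization step, and the exact-match metric are illustrative assumptions rather than any specific published benchmark.

```python
# Minimal sketch of a benchmark: dataset + processing + metric.
# The toy data and the exact-match metric are illustrative assumptions.

def normalize(text: str) -> str:
    """Processing step: strip whitespace and lowercase the answer."""
    return text.strip().lower()

def exact_match(prediction: str, reference: str) -> float:
    """Evaluation metric: 1.0 if normalized answers agree, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0

def run_benchmark(model, dataset) -> float:
    """Score a model over (question, reference_answer) pairs."""
    scores = [exact_match(model(q), ref) for q, ref in dataset]
    return sum(scores) / len(scores)

# Toy usage with a trivial "model" (a dictionary lookup).
toy_dataset = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
toy_model = {"2 + 2 = ?": "4", "Capital of France?": "paris"}.get
print(f"accuracy: {run_benchmark(toy_model, toy_dataset):.2f}")  # 1.00
```

A perfect score here says nothing beyond these two questions and this one metric, which is exactly the gap between measured performance and real-world safety that misused benchmarks can obscure.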
In education, for example, heavy reliance on popular benchmarks such as Massive Multitask Language Understanding (MMLU) and Grade School Math 8K (GSM8K) to evaluate large language models like ChatGPT has shaped the design of AI tutors and test proctoring systems. While innovative, these tools may do little to promote deep conceptual understanding, sometimes yield inconsistent results, and raise important questions about the use of personal biometric data and informed consent.