Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants
Format: Journal Article
Language: English
Published: 07-08-2024
Online Access: Get full text
Summary: AI assistants are increasingly being used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability: the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, we find that these systems already pass non-project assessments in large numbers of core courses across various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
DOI: 10.48550/arxiv.2408.11841