A corpus-driven comparative analysis of AI in academic discourse: Investigating ChatGPT-generated academic texts in social sciences
Published in: | Lingua Vol. 312; p. 103838 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier B.V., 01-12-2024 |
Subjects: | |
Summary: |
• The study compares linguistic patterns between AI-generated and human-authored academic articles in social sciences.
• ChatGPT’s limitations include the overuse of infrequent academic vocabulary and excessively flowery language.
• Human-authored texts exhibit greater syntactic complexity reflected in the extensive use of subordination.
• ChatGPT’s use of formulaic sequences only partly mirrors human-authored academic articles.
• ChatGPT generates texts through synonym substitution within syntactically equivalent structures.
Since its release in 2022, ChatGPT has found widespread application across various disciplines. While previous studies on generative AI’s capabilities have predominantly concentrated on content quality assessments, little attention has been directed toward investigating how the model’s linguistic patterns compare with human-generated language. To address this gap, we built two specialized corpora composed of academic texts in social sciences generated by ChatGPT-4o mini and selected the Elsevier OA CC-BY Corpus as a reference for comparison. The aim was to identify commonalities and differences between AI-generated and human academic language and to determine whether academic language instructions improve the model’s output in terms of formal rigor. The findings revealed limitations in ChatGPT’s handling of academic discourse in the following respects: overuse of infrequent “academic” vocabulary, limited use of subordination, and syntactic and semantic homogeneity. In addition, the effect of specific language-oriented prompts is primarily reflected in minor lexical adjustments. This study expands the scope of corpus linguistics research by incorporating AI-generated texts into the analytical framework and lays the groundwork for future improvements in the language model’s genre discrimination. |
---|---|
ISSN: | 0024-3841 |
DOI: | 10.1016/j.lingua.2024.103838 |