Exploring the Effects of Aggregation Choices on Untrained Visualization Users' Generalizations From Data

Visualization system designers must decide whether and how to aggregate data by default. Aggregating distributional information in a single summary mark like a mean or sum simplifies interpretation, but may lead untrained users to overlook distributional features. We ask, How are the conclusions dra...

Full description

Saved in:
Bibliographic Details
Published in:Computer graphics forum Vol. 39; no. 6; pp. 33 - 48
Main Authors: Nguyen, F., Qiao, X., Heer, J., Hullman, J.
Format: Journal Article
Language:English
Published: Oxford Blackwell Publishing Ltd 01-09-2020
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Visualization system designers must decide whether and how to aggregate data by default. Aggregating distributional information in a single summary mark like a mean or sum simplifies interpretation, but may lead untrained users to overlook distributional features. We ask, How are the conclusions drawn by untrained visualization users affected by aggregation strategy? We present two controlled experiments comparing generalizations of a population that untrained users made from visualizations that summarized either a 1000 record or 50 record sample with either single mean summary mark, a disaggregated view with one mark per observation or a view overlaying a mean summary mark atop a disaggregated view. While we observe no reliable effect of aggregation strategy on generalization accuracy at either sample size, users of purely disaggregated views were slightly less confident in their generalizations on average than users whose views show a single mean summary mark, and less likely to engage in dichotomous thinking about effects as either present or absent. Comparing results from 1000 record to 50 record data set, we see a considerably larger decrease in the number of generalizations produced and reported confidence in generalizations among viewers who saw disaggregated data relative to those who saw only mean summary marks. Visualization system designers must decide whether and how to aggregate data by default. Aggregating distributional information in a single summary mark like a mean or sum simplifies interpretation, but may lead untrained users to overlook distributional features. We ask, How are the conclusions drawn by untrained visualization users affected by aggregation strategy? We present two controlled experiments comparing generalizations of a population that untrained users made from visualizations that summarized either a 1000 record or 50 record sample with either single mean summary mark, a disaggregated view with one mark per observation or a view overlaying a mean summary mark atop a disaggregated view. While we observe no reliable effect of aggregation strategy on generalization accuracy at either sample size, users of purely disaggregated views were slightly less confident in their generalizations on average than users whose views show a single mean summary mark, and less likely to engage in dichotomous thinking about effects as either present or absent. Comparing results from 1000 record to 50 record data set, we see a considerably larger decrease in the number of generalizations produced and reported confidence in generalizations among viewers who saw disaggregated data relative to those who saw only mean summary marks.
ISSN:0167-7055
1467-8659
DOI:10.1111/cgf.13902