Main research questions
RQ1: What is the effect of simulating and visualizing data on people’s task duration estimates?
We hypothesize the dotplot will help participants make a more informed decision as the duration of a task can be represented with a mixture distribution, which is hard to mentally process. However, our study design does not allow us to establish a ground truth for the accuracy of their estimation and thus, we will only look at the influence (induced variation in estimates) of the dotplot.
Moreover, past research reports that underestimations are more often reported than overestimation (for medium to long tasks), so we are expecting too low values (underestimates) which are then corrected for greater values (less optimistic estimates). Still, the other way around might occur as it remains unclear whether the direction of this prediction bias (over- or underestimation) is fixed for this specific task, and the fact that more studies report underestimations might be due to the field they come from (see Halkjelsvik and Jørgensen (2012)). In other words, even though underestimates are more likely, according to the literature, overestimates would not necessarily mean the participants’ estimates are not erroneous.
- Do their point estimates change overall (from first to third estimation)? If so, it would mean the debiasing methods and/or the dotplot induced a change of beliefs.
# The descriptive analysis above shows that we have a few very extreme outliers which may affect our analysis disproportionately. We therefore additionally run the same analysis with data where we filter out all participants who are more than 3 SD from the mean
dataf <- data %>% ungroup() %>%
mutate(abs_z_perc_31 = abs(scale(perc_diff_31)), abs_z_perc_32 = abs(scale(perc_diff_32)), abs_z_perc_31_gap = abs(scale(perc_diff_31_gap)), abs_z_perc_32_gap = abs(scale(perc_diff_32_gap))) %>%
filter(abs_z_perc_31 <= 3, abs_z_perc_32 <= 3 , abs_z_perc_31_gap <= 3, abs_z_perc_32_gap <= 3)
cat("The additional analysis excluding participants whose responses are more than 3 SD from the mean in any of the percent differences for the central tendency or the interval size ignores the data from ", nrow(data) - nrow(dataf), " participants.\n")
## The additional analysis excluding participants whose responses are more than 3 SD from the mean in any of the percent differences for the central tendency or the interval size ignores the data from 7 participants.
# the dataframe to hold the results answering RQ1
df_rq1 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())
if (length(perc_diff_31[perc_diff_31 != 0]) == 0) {
cat(" All expected values provided are similar across all participants (no difference between the expected values of the first and third estimates).\n")
} else {
obs11 = meanCI.bootstrap(perc_diff_31, conf.level = lvl_conf, seed = seed2)
obs11c = meanCI.bootstrap(dataf$perc_diff_31, conf.level = lvl_conf, seed = seed2)
df_rq1 <- df_rq1 %>% compile.df(obs11, "Increase from initial to final estimate", "all", 1)
df_rq1 <- df_rq1 %>% compile.df(obs11c, "initial to final (without outliers)", "all", 1.5)
cat("The average normalized expected values' variation of the third estimation compared to the first one is", formatCI(obs11),"\n")
if (obs11$ci_lower <= 0 & obs11$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in expected value between the two estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in expected value between the two estimates.\n")
if (obs11$estimate < 0) {
cat("More precisely, more optimistic estimations are observed, suggesting their first estimate was an overestimation.\n")
} else {
cat("More precisely, less optimistic estimations are observed, suggesting their first estimate was an underestimation.\n")
}
}
}
## The average normalized expected values' variation of the third estimation compared to the first one is 0.7, 95% CI [0.48, 1]
## The data analysis suggests there is a noticeable difference in expected value between the two estimates.
## More precisely, less optimistic estimations are observed, suggesting their first estimate was an underestimation.
if (length(data2$perc_diff_31[data2$perc_diff_31 != 0]) != 0) {
obs11_verif = diffMeanCI.bootstrap(perc_diff_31,data2$perc_diff_31,conf.level = lvl_conf, seed = seed2)
if (!(obs11_verif$ci_lower <= 0 & obs11_verif$ci_upper >= 0)) {
cat("NOTICE: Whether we interpret participants' estimates as the expected value or the mean influences the data analysis' results.\n")
cat("The difference between the two measures when using the mean:", formatCI(obs11_verif),"\n")
}
}
Which part of the potential increase, or change, can be attributed to the visualization (from second to third estimation)? To answer that question, we look at the difference between the second and third estimates.
if (length(perc_diff_32[perc_diff_32 != 0]) == 0) {
cat(" All expected values provided are similar across all participants (no difference between the expected values of the first and third estimates).\n")
} else {
obs11bis = meanCI.bootstrap(perc_diff_32, conf.level = lvl_conf, seed = seed2)
obs11bisc = meanCI.bootstrap(dataf$perc_diff_32, conf.level = lvl_conf, seed = seed2)
df_rq1 <- df_rq1 %>% compile.df(obs11bis, "Increase due to dotplot", "all", 2)
df_rq1 <- df_rq1 %>% compile.df(obs11bisc, "due to dotplot (without outliers)", "all", 2.5)
cat("The average normalized expected values' variation of the third estimation compared to the second one is", formatCI(obs11bis),"\n")
if (obs11bis$ci_lower <= 0 & obs11bis$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in expected values between the two estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in expected values between the two estimates.\n")
if (obs11bis$estimate < 0) {
cat("More precisely, more optimistic estimations are observed for the third estimation.\n")
} else {
cat("More precisely, less optimistic estimations are observed for the third estimation.\n")
}
}
}
## The average normalized expected values' variation of the third estimation compared to the second one is 0.83, 95% CI [0.54, 1.5]
## The data analysis suggests there is a noticeable difference in expected values between the two estimates.
## More precisely, less optimistic estimations are observed for the third estimation.
if (length(data2$perc_diff_32[data2$perc_diff_32 != 0]) != 0) {
obs11bis_verif = diffMeanCI.bootstrap(perc_diff_32,data2$perc_diff_32,conf.level = lvl_conf, seed = seed2)
if (!(obs11bis_verif$ci_lower <= 0 & obs11bis_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs11bis_verif),"\n")
}
}
- The mean or median of their estimate is not the only reported variable. We can also analyse the gap, or interval’s width they provided for their estimates. Do their uncertainty ranges increase overall (from first to third estimation)? Indeed, larger interval implies more uncertainty, whereas reduced interval means they are more confident in their estimates (less uncertainty).
if (length(perc_diff_31_gap[perc_diff_31_gap != 0]) == 0) {
cat(" All interval values provided are similar across all participants (no difference between the interval's length of the first and third estimates).\n")
} else {
obs12 = meanCI.bootstrap(perc_diff_31_gap, conf.level = lvl_conf, seed = seed2)
obs12c = meanCI.bootstrap(dataf$perc_diff_31_gap, conf.level = lvl_conf, seed = seed2)
df_rq1 <- df_rq1 %>% compile.df(obs12, "Interval width increase from first to final estimate", "all", 4)
df_rq1 <- df_rq1 %>% compile.df(obs12c, "interval from initial to final (without outliers)", "all", 4.5)
cat("The average normalized standard deviation variation of the third estimation compared to the first one is", formatCI(obs12),"\n")
if (obs12$ci_lower <= 0 & obs12$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in standard deviation between the two estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.\n")
if (obs12$estimate < 0) {
cat("More precisely, less uncertainty is observed in the third estimate.\n")
} else {
cat("More precisely, more uncertainty is observed in the third estimate.\n")
}
}
}
## The average normalized standard deviation variation of the third estimation compared to the first one is 0.62, 95% CI [0.41, 1.1]
## The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.
## More precisely, more uncertainty is observed in the third estimate.
if (length(data2$perc_diff_31_gap[data2$perc_diff_31_gap != 0]) != 0) {
obs12_verif = diffMeanCI.bootstrap(perc_diff_31_gap,data2$perc_diff_31_gap,conf.level = lvl_conf, seed = seed2)
if (!(obs12_verif$ci_lower <= 0 & obs12_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs12_verif),"\n")
}
}
Which part can be be attributed to the visualization (from second to third estimation)?
if (length(perc_diff_32_gap[perc_diff_32_gap != 0]) == 0) {
cat(" All interval values provided are similar across all participants (no difference between the interval's length of the first and third estimates).\n")
} else {
obs12bis = meanCI.bootstrap(perc_diff_32_gap, conf.level = lvl_conf, seed = seed2)
obs12bisc = meanCI.bootstrap(dataf$perc_diff_32_gap, conf.level = lvl_conf, seed = seed2)
df_rq1 <- df_rq1 %>% compile.df(obs12bis, "Interval width increase due to dotplot", "all", 5)
df_rq1 <- df_rq1 %>% compile.df(obs12bisc, "interval due to dotplot (without outliers)", "all", 5.5)
cat("The average normalized standard deviation variation of the third estimation compared to the second one is", formatCI(obs12bis),"\n")
if (obs12bis$ci_lower <= 0 & obs12bis$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in standard deviation between the two estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.\n")
if (obs12bis$estimate < 0) {
cat("More precisely, less uncertainty is observed in the third estimate.\n")
} else {
cat("More precisely, more uncertainty is observed in the third estimate.\n")
}
}
}
## The average normalized standard deviation variation of the third estimation compared to the second one is 0.4, 95% CI [0.16, 0.9]
## The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.
## More precisely, more uncertainty is observed in the third estimate.
if (length(data2$perc_diff_32_gap[data2$perc_diff_32_gap != 0]) != 0) {
obs12bis_verif = diffMeanCI.bootstrap(perc_diff_32_gap,data2$perc_diff_32_gap,conf.level = lvl_conf, seed = seed2)
if (!(obs12bis_verif$ci_lower <= 0 & obs12bis_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs12bis_verif),"\n")
}
}
An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?
# data frame holding the raw data for the histograms
df_rq1_raw <- dataf %>% select(c(participant_id, perc_diff_31, perc_diff_32, perc_diff_31_gap, perc_diff_32_gap)) %>% tidyr::pivot_longer(cols = c(perc_diff_31, perc_diff_32, perc_diff_31_gap, perc_diff_32_gap), names_prefix = "perc_diff", names_to = "difference")
df_rq1 %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
geom_pointrange() + scale_x_continuous(labels = scales::percent) +
geom_vline(xintercept = 0) +
labs(title = "Effect of simulating and visualizing data on people's\ntask duration estimates", subtitle = "Percent change differences from initial estimates to after seeing a visualization", x = "Percent difference between estimates", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9))
ggsave("RQ1CI.pdf", height = 3)
## Saving 7 x 3 in image
df_rq1_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 0.1) + scale_x_continuous(labels = scales::percent, breaks = c(0, 1, 2, 3, 4)) + facet_wrap(~difference) + theme_void() + theme(axis.text.x = element_text(), axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm"))
ggsave("RQ1Hist.pdf", height = 3)
## Saving 7 x 3 in image
RQ2: Do quantile dotplots help indicating 95% prediction intervals?
There were two conditions for the dotplot which made the number of dots (either 20 or 50) vary.
- A 95% prediction interval was required throughout the whole study. As the dot plot allows participants to count the dots and thus mentally compute a more precise estimation based on that instruction, their ability to understand and process the information presented to them can be investigated. Indeed, what is the size of the prediction interval people indicate after seeing a quantile dotplot? We expect to find a value around 95% but it might depend on whether this percentage is calculated based on the whole distribution, or what they see on the dotplot.
If this percentage coverage is based on the whole distribution:
# the dataframe to hold the results answering RQ2
df_rq2 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())
obs21 = meanCI.bootstrap(coverage3, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21, "Final estimate", "all", 1)
cat("The mean actual size of prediction interval of the third estimate is", formatCI(obs21,unit="%"),"\n")
## The mean actual size of prediction interval of the third estimate is 71%, 95% CI [ 66%, 75%]
if (obs21$ci_lower <= 95 & obs21$ci_upper >= 95) {
cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}
## The data analysis suggests participants did not provide an estimation following a 95% CI.
obs21_verif = diffMeanCI.bootstrap(coverage3,data2$coverage3,conf.level = lvl_conf, seed = seed2)
if (!(obs21_verif$ci_lower <= 0 & obs21_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs21_verif,unit="%"),"\n")
}
If it is based on the dot-plots’ information:
obs21bis = meanCI.bootstrap(coverage_dotplot, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21bis, "Final estimate (inferred from q-dotplot)", "all", 2)
cat("The mean actual size of prediction interval of the third estimate is", formatCI(obs21bis,unit="%"),"\n")
## The mean actual size of prediction interval of the third estimate is 72%, 95% CI [ 66%, 77%]
if (obs21bis$ci_lower <= 95 & obs21bis$ci_upper >= 95) {
cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}
## The data analysis suggests participants did not provide an estimation following a 95% CI.
obs21bis_verif = diffMeanCI.bootstrap(coverage_dotplot,data2$coverage_dotplot,conf.level = lvl_conf, seed = seed2)
if (!(obs21bis_verif$ci_lower <= 0 & obs21bis_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs21bis_verif,unit="%"),"\n")
}
But in order to be able to account the dot-plot for an effect, we must ask: is it really different than the previous estimate? Yet, as it would not make sense to compare it based on the dot-plot’s information, only the samples from the computed probability distribution are used.
The second estimation’s coverage as a percentage, based on the whole distribution:
obs21ter = meanCI.bootstrap(coverage2, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21ter, "Second estimate (after reflection)", "all", 3)
cat("The mean actual size of prediction interval of the second estimate is", formatCI(obs21ter,unit="%"),"\n")
## The mean actual size of prediction interval of the second estimate is 49%, 95% CI [ 43%, 55%]
obs21ter = meanCI.bootstrap(coverage1, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21ter, "First estimate", "all", 3)
cat("The mean actual size of prediction interval of the first estimate is", formatCI(obs21ter,unit="%"),"\n")
## The mean actual size of prediction interval of the first estimate is 42%, 95% CI [ 36%, 49%]
if (obs21ter$ci_lower <= 95 & obs21ter$ci_upper >= 95) {
cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}
## The data analysis suggests participants did not provide an estimation following a 95% CI.
obs21ter_verif = diffMeanCI.bootstrap(coverage2,data2$coverage2,conf.level = lvl_conf, seed = seed2)
if (!(obs21ter_verif$ci_lower <= 0 & obs21ter_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs21ter_verif,unit="%"),"\n")
}
eff = FALSE
obs21quater = meanCI.bootstrap(coverage3 - coverage2,conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21quater, "Net difference attributable to q-dotplot", "difference", 4)
cat("The average difference in actual sizes of prediction intervals between the second and third estimates is", formatCI(obs21quater,unit="%"),"\n")
## The average difference in actual sizes of prediction intervals between the second and third estimates is 22%, 95% CI [ 16%, 28%]
if (obs21quater$ci_lower <= 0 & obs21quater$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two estimates.\n")
} else {
eff = TRUE
cat("The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.\n")
if(obs21quater$estimate < 0) {
cat("More precisely, the second estimation was closer to a 95% coverage size.\n")
} else {
cat("More precisely, the third estimation was closer to a 95% coverage size.\n")
}
}
## The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.
## More precisely, the third estimation was closer to a 95% coverage size.
obs21quater_verif = diffMeanCI.bootstrap(data2$coverage2,data2$coverage3,conf.level = lvl_conf, seed = seed2)
if (((obs21quater_verif$ci_lower <= 0 & obs21quater_verif$ci_upper >= 0) & eff) | (!(obs21quater_verif$ci_lower <= 0 & obs21quater_verif$ci_upper >= 0) & !eff)){
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("The alternative measure gave an average difference of :", formatCI(obs21quater_verif,unit="%"),"\n")
}
- What is the effect of the number of dots on the actual size of the prediction interval? The 50-dot plot carries more information as outliers have a greater chance to appear. Thereby, we want to know whether the number of dots influences the size of the prediction interval, or in other words, if more dots lead to prediction interval covering more instances.
obs22_20 = meanCI.bootstrap(coverage_dotplot[num_dots == 20],conf.level = lvl_conf, seed = seed2)
obs22_50 = meanCI.bootstrap(coverage_dotplot[num_dots == 50],conf.level = lvl_conf, seed = seed2)
cat("The average coverage percentage in the 20-dot condition is", formatCI(obs22_20,unit="%"),"\n")
## The average coverage percentage in the 20-dot condition is 73%, 95% CI [ 66%, 79%]
cat("The average coverage percentage in the 50-dot condition is", formatCI(obs22_50,unit="%"),"\n")
## The average coverage percentage in the 50-dot condition is 70%, 95% CI [ 62%, 77%]
df_rq2 <- df_rq2 %>% compile.df(obs22_20, "When seeing a 20-dot q-dotplot", "20", 5)
df_rq2 <- df_rq2 %>% compile.df(obs22_50, "When seeing a 50-dot q-dotplot", "50", 6)
eff = FALSE
obs22 = diffMeanCI.bootstrap(coverage_dotplot[num_dots == 50],coverage_dotplot[num_dots == 20], conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs22, "Difference between 20 and 50 dots q-dotplot", "difference", 7)
cat("The difference in actual size of the prediction interval between the 20- and 50-dot conditions is", formatCI(obs22),"\n")
## The difference in actual size of the prediction interval between the 20- and 50-dot conditions is -2.3, 95% CI [-12, 7.6]
if (obs22$ci_lower <= 0 & obs21$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two condition.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.\n")
if (obs22$estimate < 0){
cat("More precisely, the 50-dot group provided larger prediction intervals (covering more instances).\n")
} else {
cat("More precisely, the 20-dot group provided larger prediction intervals (covering more instances).\n")
}
}
## The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two condition.
obs22_verif = diffMeanCI.bootstrap(data2$coverage_dotplot[data2$num_dots == 20],data2$coverage_dotplot[data2$num_dots == 50],conf.level = lvl_conf, seed = seed2)
if (((obs22_verif$ci_lower <= 0 & obs22_verif$ci_upper >= 0) & eff) | (!(obs22_verif$ci_lower <= 0 & obs22_verif$ci_upper >= 0) & !eff)){
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("The alternative measure gave an average difference of :", formatCI(obs22_verif,unit="%"),"\n")
}
An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?
df_rq2 %>% ggplot(aes(x = estimate/100, xmin = ci_lower/100, xmax = ci_upper/100, y = reorder(name, -order))) +
geom_pointrange() + scale_x_continuous(labels = scales::percent) +
geom_vline(xintercept = 0) + geom_vline(xintercept = 0.95, color = "red") + annotate("text", label = "targeted size", x = 0.85:0.95, 7.5, color = "red", size = 3) +
labs(title = "Effect of quantile dotplots on the size of elicited prediction intervals", subtitle = "How much of the simulated data is covered by the prediction intervals?", x = "Prediction interval size", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))
ggsave("RQ2CI.pdf", height = 5, width = 5)
df_rq2_raw <- data %>% select(c(participant_id, coverage1, coverage_dotplot, coverage2, coverage3)) %>% tidyr::pivot_longer(cols = c(coverage1, coverage_dotplot, coverage2, coverage3), names_to = "coverage")
d20 <- data %>% filter(num_dots == 20)
d50 <- data %>% filter(num_dots == 50)
df_rq2_raw <- rbind(data.frame(participant_id = d20$participant_id, coverage = rep("coverage_20", nrow(d20)), value = d20$coverage3 ), df_rq2_raw)
df_rq2_raw <- rbind(data.frame(participant_id = d50$participant_id, coverage = rep("coverage_50", nrow(d50)), value = d50$coverage3 ), df_rq2_raw)
rq2hist1 = df_rq2_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 2.5) + scale_x_continuous(labels = scales::percent, breaks = seq(0, 100, by=12.5)) + facet_wrap(~coverage) + theme_void() + theme(axis.text.x = element_text(), axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm"))
ggsave("RQ2Histfull.pdf", height = 3)
## Saving 7 x 3 in image
RQ3: What is the effect of type of representation (slider vs visualization) on people’s decision?
- What is the difference between the first and the final decision on the likelihood that people would miss their train? The first decision is the participants’ best guess, but they should rectify their decision for their preferred likelihood of missing a train. As we do not wish to make a difference between participants that are too optimistic and those who are too pessimistic, we use the unsigned difference of likelihood.
# Data frame to hold all results to answer RQ3
df_rq3 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())
if (nrow(data) == 0){
cat("All participants were irrational.\n")
} else if (nrow(data) == 1) {
cat("Only 1 participant was not irrational, which is not enough for an analysis.\n")
} else {
obs31 = meanCI.bootstrap(abs_change_trains_missed, conf.level = lvl_conf, seed = seed2)
obs31_signed = meanCI.bootstrap(trains_missed2 - trains_missed , conf.level = lvl_conf, seed = seed2)
obs31_vis = meanCI.bootstrap(abs_change_trains_missed[feedback_type == "visualization"], conf.level = lvl_conf, seed = seed2)
obs31_slider = meanCI.bootstrap(abs_change_trains_missed[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
obs31_diff = diffMeanCI.bootstrap(abs_change_trains_missed[feedback_type == "visualization"], abs_change_trains_missed[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
df_rq3 <- df_rq3 %>% compile.df(obs31_signed, "Signed difference after seeing any feedback", "all", 1)
df_rq3 <- df_rq3 %>% compile.df(obs31, "Absolute difference after seeing any feedback", "all", 2)
df_rq3 <- df_rq3 %>% compile.df(obs31_vis, " --- visualization feedback", "visualization", 3)
df_rq3 <- df_rq3 %>% compile.df(obs31_slider, " --- slider feedback", "slider", 4)
df_rq3 <- df_rq3 %>% compile.df(obs31_diff, "Difference (visualization - slider)", "difference", 5)
cat("The difference in likelihood of missing a train between first and final decision is", formatCI(obs31),"\n")
if (obs31$ci_lower <= 0 & obs31$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in likelihood between the two decisions\n")
} else {
cat("The data analysis suggests there is a noticeable difference in likelihood between the two decisions\n")
}
}
## The difference in likelihood of missing a train between first and final decision is 15, 95% CI [ 12, 20]
## The data analysis suggests there is a noticeable difference in likelihood between the two decisions
- What is the difference between the first and the final decision on waiting time? As previously, we use the absolute difference between the first and final waiting time which was selected.
if (nrow(data) == 0){
cat("All participants were irrational.\n")
} else if (nrow(data) == 1) {
cat("Only 1 participant was not irrational, which is not enough for an analysis.\n")
} else {
obs32 = meanCI.bootstrap(abs_change_waiting_time, conf.level = lvl_conf, seed = seed2)
obs32_signed = meanCI.bootstrap(waiting_time2 - waiting_time, conf.level = lvl_conf, seed = seed2)
obs32_vis = meanCI.bootstrap(abs_change_waiting_time[feedback_type == "visualization"], conf.level = lvl_conf, seed = seed2)
obs32_slider = meanCI.bootstrap(abs_change_waiting_time[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
obs32_diff = diffMeanCI.bootstrap(abs_change_waiting_time[feedback_type == "visualization"], abs_change_waiting_time[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
df_rq3 <- df_rq3 %>% compile.df(obs32, "Absolute difference after seeing any feedback", "all", 6)
df_rq3 <- df_rq3 %>% compile.df(obs32_signed, "Signed difference after seeing any feedback", "all", 7)
df_rq3 <- df_rq3 %>% compile.df(obs32_vis, " --- visualization feedback", "visualization", 8)
df_rq3 <- df_rq3 %>% compile.df(obs32_slider, " --- slider feedback", "slider", 9)
df_rq3 <- df_rq3 %>% compile.df(obs32_diff, "Difference (visualization - slider)", "difference", 10)
cat("The difference in waiting time between first and final decision is", formatCI(obs32),"\n")
if (obs32$ci_lower <= 0 & obs32$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in waiting time between the two decisions\n")
} else {
cat("The data analysis suggests there is a noticeable difference in waiting time between the two decisions\n")
}
}
## The difference in waiting time between first and final decision is 17, 95% CI [ 13, 22]
## The data analysis suggests there is a noticeable difference in waiting time between the two decisions
# add a column to our df to be able to tell if we're looking at waiting time or trains missed
df_rq3$type = ifelse(df_rq3$order < 6, "trains missed", "waiting time")
An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?
# Plotting data using a facet_wrap only makes sense if the data ranges are similar for the final results data. If they have considerably different ranges (which may happen since it's not the same unit <number of trains missed versus waiting time>), then we will plot these as two separate charts which we arrange side by side
df_rq3 %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order), group = type)) +
geom_pointrange() + scale_x_continuous() + facet_wrap(~ type) +
geom_vline(xintercept = 0) +
labs(title = "Effect on the decision when to leave for a train after receiving\nfeedback based on simulation data", subtitle = "How much do estimates change and how much does feedback type matter?", x = "", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))
ggsave("RQ3CI.pdf", height = 3)
## Saving 7 x 3 in image
df_rq3_raw <- data %>% group_by(participant_id) %>% transmute(abs_change_waiting_time = abs_change_waiting_time, abs_change_trains_missed = abs_change_trains_missed, change_trains_missed = trains_missed2 - trains_missed, change_waiting_time = waiting_time2 - waiting_time) %>%
pivot_longer(cols=c(abs_change_waiting_time, abs_change_trains_missed, change_trains_missed, change_waiting_time), names_to = "measure")
dvis <- data %>% filter(feedback_type == "visualization")
dslider <- data %>% filter(feedback_type == "slider")
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dvis$participant_id, measure = rep("abs_waiting_vis", nrow(dvis)), value = dvis$abs_change_waiting_time))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dslider$participant_id, measure = rep("abs_waiting_slider", nrow(dslider)), value = dslider$abs_change_waiting_time))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dvis$participant_id, measure = rep("abs_trains_vis", nrow(dvis)), value = dvis$abs_change_trains_missed))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dslider$participant_id, measure = rep("abs_trains_slider", nrow(dslider)), value = dslider$abs_change_trains_missed))
df_rq3_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 1) + scale_x_continuous(breaks = seq(-100, 100, by=50)) + facet_wrap(~measure) + theme_void() + theme(axis.text.x = element_text(), axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm"))
ggsave("RQ3Hist.pdf", height = 3)
## Saving 7 x 3 in image
- Do people using the visualization take longer to come to a decision than people using the slider? We hypothesized a better visualization would ease the decision-making process and thus that the linechart would allow participants to select their preferred departure time quicker than those who used the slider.
df_rq3bis = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())
p13.vis = page_data %>% filter(page == 13, feedback_type == 'visualization')
p13.slider = page_data %>% filter(page == 13, feedback_type == 'slider')
obs33 = diffGeomMeanCI.bootstrap(p13.vis$duration,p13.slider$duration, conf.level = lvl_conf, seed = seed2)
obs33_vis = geomMeanCI.bootstrap(p13.vis$duration, conf.level = lvl_conf, seed = seed2)
obs33_slider = geomMeanCI.bootstrap(p13.slider$duration, conf.level = lvl_conf, seed = seed2)
obs33bis = diffGeomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'visualization'], time_spent_interacting[feedback_type == 'slider'], conf.level = lvl_conf, seed = seed2)
obs33bis_vis = geomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'visualization'], conf.level = lvl_conf, seed = seed2)
obs33bis_slider = geomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'slider'], conf.level = lvl_conf, seed = seed2)
df_rq3bis <- df_rq3bis %>% compile.df(obs33_vis, "Visualization", "visualization", 1)
df_rq3bis <- df_rq3bis %>% compile.df(obs33_slider, "Slider", "slider", 2)
df_rq3bis <- df_rq3bis %>% compile.df(obs33, "Difference", "difference", 3)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis_vis, "Visualization", "visualization", 4)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis_slider, "Slider", "slider", 5)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis, "Difference", "difference", 6)
cat("The difference in time taken to make a decision between the feedback-type conditions is", formatCI(obs33,unit="s"),"\n")
## The difference in time taken to make a decision between the feedback-type conditions is 43s, 95% CI [ 26s, 60s]
if (obs33$ci_lower <= 0 & obs33$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in time take to make a decision between the two conditions.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in time taken to make a decision between the two conditions.\n")
if (obs33$estimate < 0){
cat("More precisely, the slider group took longer.\n")
} else {
cat("More precisely, the linechart group took longer.\n")
}
}
## The data analysis suggests there is a noticeable difference in time taken to make a decision between the two conditions.
## More precisely, the linechart group took longer.
df_rq3bis$type <- ifelse(df_rq3bis$order < 4, "time on page", "time interacting")
An overview visualization reporting our results for research question 3bis: What is the effect of simulating and visualizing data on people’s task duration estimates?
# Plotting data using a facet_wrap only makes sense if the data ranges are similar for the final results data. If they have considerably different ranges (which may happen since it's not the same unit <number of trains missed versus waiting time>), then we will plot these as two separate charts which we arrange side by side
df_rq3bis %>% filter(type == "time on page") %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
geom_pointrange() + scale_x_continuous() +
geom_vline(xintercept = 0) +
labs(title = "Time spent on the feedback", subtitle = "", x = "Time (s)", y = NULL, caption = "All point estimates are geometric means and all error bars shows 95% bootstrapped confidence intervals") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))
df_rq3bis %>% filter(type == "time interacting") %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
geom_pointrange() + scale_x_continuous() +
geom_vline(xintercept = 0) +
labs(title = "Time spent interacting with the presented feedback option", subtitle = "", x = "Time (s)", y = NULL, caption = "All point estimates are geometric means and all error bars shows 95% bootstrapped confidence intervals") +
theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))
Additional questions
Debiasing methods
How do our results compare to previous work in psychology? How does breaking a task down into subtasks and considering events slowing one down or making one faster affect people’s estimation?
Estimating the duration of smaller components of a whole task produces greater values, which are not necessarily more accurate. This result has been observed by Connolly and Dean (1997) with a software development task, and by Forsyth and Burt (2008). Hayes-Roth (1980) also investigated this debiasing method but this time with a task involving similar errands. More precisely, the given sub-steps were enter store, find desired object, wait in line, pay for object and leave store. Based on these studies’ findings, we expect the participants to provide greater aggregated estimates when focusing on each sub-components than when predicting for the whole task. Nonetheless, the effect it might induce on the second estimation is uncertain for it depends on the participants’ ability and willingness to take the previous information into account.
The selection of the second debiasing method used for this experiment was mainly motivated by the information on the participants’ beliefs it would provide. Byram (1997) conducted an experiment in which surprises were manipulated but the results were inconclusive.
- How similar remain people’s responses? If the two first estimations of the whole task’s duration are quite similar in distribution, then it means the two debiasing methods had no effect. However, if there is a difference, we cannot precisely tell which factor caused it.
if (length(perc_diff_21[perc_diff_21 != 0]) == 0) {
cat(" All expected values provided are similar across all participants (no difference between the first and second estimate).\n")
} else {
obs41 = meanCI.bootstrap(abs(perc_diff_21), conf.level = lvl_conf, seed = seed2)
obs41sd = meanCI.bootstrap(abs(perc_diff_21), conf.level = lvl_conf, seed = seed2)
cat("The absolute average normalized difference between the second estimation and the first one is", formatCI(obs41),"\n")
if (obs41$ci_lower <= 0 & obs41$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference between the two first estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in expected values between the two first estimates.\n")
}
}
## The absolute average normalized difference between the second estimation and the first one is 0.28, 95% CI [0.2, 0.47]
## The data analysis suggests there is a noticeable difference in expected values between the two first estimates.
if (length(data2$perc_diff_21[data2$perc_diff_21 != 0]) != 0) {
obs41_verif = diffMeanCI.bootstrap(abs(perc_diff_21),abs(data2$perc_diff_21),conf.level = lvl_conf, seed = seed2)
if (!(obs41_verif$ci_lower <= 0 & obs41_verif$ci_upper >= 0)) {
cat("NOTICE: Whether we treat participants'responses as central tendency or compute the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs41_verif),"\n")
}
}
- Before looking at the direction of correction between the first and second estimates, we can examine the sub-tasks’ estimates that were made, and more precisely: Are the sub-tasks’ estimates consistent with the literature? If so, then the expected value of the aggregated estimation of subtasks’ duration will be greater than the expected value of the first estimation.
if (length(perc_diff_st1[perc_diff_st1 != 0]) == 0) {
cat(" All expected values provided are similar across all participants (no difference between expected values of the first and aggregated sub-tasks' estimates).\n")
} else {
obs42 = meanCI.bootstrap(perc_diff_st1, conf.level = lvl_conf, seed = seed2)
cat("The average normalized expected values' variation of the aggregated sub-tasks' estimation compared to the first one is", formatCI(obs42),"\n")
if (obs42$ci_lower <= 0 & obs42$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in expected values between the two estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in expected values between the two estimates.\n")
if (obs42$estimate < 0) {
cat("More precisely, more optimistic estimations are observed for the aggregated sub-tasks' estimate, which is not in line with the literature.\n")
} else {
cat("More precisely, less optimistic estimations are observed for the aggregated sub-tasks' estimate, in line with the literature.\n")
}
}
}
## The average normalized expected values' variation of the aggregated sub-tasks' estimation compared to the first one is -0.41, 95% CI [-0.5, -0.29]
## The data analysis suggests there is a noticeable difference in expected values between the two estimates.
## More precisely, more optimistic estimations are observed for the aggregated sub-tasks' estimate, which is not in line with the literature.
if (length(data2$perc_diff_st1[data2$perc_diff_st1 != 0]) != 0) {
obs42_verif = diffMeanCI.bootstrap(perc_diff_st1,data2$perc_diff_st1,conf.level = lvl_conf, seed = seed2)
if (!(obs42_verif$ci_lower <= 0 & obs42_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs42_verif),"\n")
}
}
Then, regarding the second estimation: Is there a trend towards becoming more or less optimistic? If the second estimation is less optimistic, meaning that it might tend towards overestimation, then it might be related to the sub-tasks’ decomposition, depending on whether they have proven to be consistent with the literature.
if (length(perc_diff_21[perc_diff_21 != 0]) == 0) {
cat("All expected values provided are similar across all participants (no difference between the expected values of the first and second estimates).\n")
} else {
obs42bis = meanCI.bootstrap(perc_diff_21, conf.level = lvl_conf, seed = seed2)
cat("The average normalized expected values' variation of the second estimation compared to the first one is", formatCI(obs42bis),"\n")
if (obs42bis$ci_lower <= 0 & obs42bis$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in expected values between the two first estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in expected values between the two first estimates.\n")
if (obs42bis$estimate < 0) {
cat("More precisely, more optimistic estimations are observed.")
} else {
cat("More precisely, less optimistic estimations are observed.")
}
}
}
## The average normalized expected values' variation of the second estimation compared to the first one is 0.2, 95% CI [0.11, 0.38]
## The data analysis suggests there is a noticeable difference in expected values between the two first estimates.
## More precisely, less optimistic estimations are observed.
if (length(data2$perc_diff_21[data2$perc_diff_21 != 0]) != 0) {
obs42bis_verif = diffMeanCI.bootstrap(perc_diff_21,data2$perc_diff_21,conf.level = lvl_conf, seed = seed2)
if (!(obs42bis_verif$ci_lower <= 0 & obs42bis_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs42bis_verif),"\n")
}
}
- Still the estimate might also differ in width assessing the amount of uncertainty one puts in their answer, hence the question: Do the intervals become wider? Thereby, larger intervals might suggest participants acknowledged the uncertainty underlying such estimation.
if (length(perc_diff_21_gap[perc_diff_21_gap != 0]) == 0) {
cat(" All the intervals provided are similar across all participants (no difference between the intervals of the first and second estimates).\n")
} else {
obs43 = meanCI.bootstrap(perc_diff_21_gap, conf.level = lvl_conf, seed = seed2)
cat("The average normalized interval variation of the second estimation compared to the first one is", formatCI(obs43),"\n")
if (obs43$ci_lower <= 0 & obs43$ci_upper >= 0) {
cat("The data analysis does not allow us to say there is a clear difference in interval's width between the two first estimates.\n")
} else {
cat("The data analysis suggests there is a noticeable difference in interval's width between the two first estimates.\n")
if (obs43$estimate < 0) {
cat("More precisely, a decrease in interval's size is observed.")
} else {
cat("More precisely, an increase in interval's size is observed.")
}
}
}
## The average normalized interval variation of the second estimation compared to the first one is 0.38, 95% CI [0.26, 0.55]
## The data analysis suggests there is a noticeable difference in interval's width between the two first estimates.
## More precisely, an increase in interval's size is observed.
if (length(data2$perc_diff_21_gap[data2$perc_diff_21_gap != 0]) != 0) {
obs43_verif = diffMeanCI.bootstrap(perc_diff_21_gap,data2$perc_diff_21_gap,conf.level = lvl_conf, seed = seed2)
if (!(obs43_verif$ci_lower <= 0 & obs43_verif$ci_upper >= 0)) {
cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
cat("Difference between the two measures:", formatCI(obs43_verif),"\n")
}
}
Personal preferences
At the end of the study, the participants were asked whether they believed they were rather early or late on a 7-point Likert scale. We wish to use this data to observe whether it could help predict their preferences. We test this using a Kendall Tau correlation coefficient and test it for both the initial waiting time and the final chosen one.
Do people who tend to be early prefer longer waiting time? (based on the initially selected departure time)
if (length(waiting_time2[lateness_rating > 0]) == 0) {
cat("No participants reported they were quite, sometimes or even often late.\n")
} else if (length(waiting_time2[lateness_rating < 0]) == 0) {
cat("No participants reported they were quite, sometimes or even often early.\n")
} else if(length(waiting_time2[lateness_rating > 0 & waiting_time != -Inf]) == 0) {
cat("All participants who were self-reported as rather late were irrational.\n")
} else if(length(waiting_time2[lateness_rating < 0 & waiting_time != -Inf]) == 0) {
cat("All participants who were self-reported as rather early were irrational.\n")
} else {
obs51 = kendall.corCI.bootstrap(lateness_rating, waiting_time,conf.level = lvl_conf, seed = seed2)
cat("The Kendall's tau correlation coefficient for the waiting time and lateness ratings is", formatCI(obs51),"\n")
if (obs51$ci_lower <= 0 & obs51$ci_upper >= 0) {
cat("The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.\n")
} else if (obs51$ci_lower <= 1 & obs51$ci_upper >= 1) {
cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (same).\n")
} else if (obs51$ci_lower <= -1 & obs51$ci_upper >= -1) {
cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (reverse).\n")
} else {
cat("The data analysis does not allow to conclude anything.\n")
}
}
## The Kendall's tau correlation coefficient for the waiting time and lateness ratings is -0.084, 95% CI [-0.22, 0.047]
## The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.
Do people who tend to be early prefer longer waiting time? (based on the final selected departure time)
if (length(waiting_time2[lateness_rating > 0]) == 0) {
cat("No participants reported they were quite, sometimes or even often late.\n")
} else if (length(waiting_time2[lateness_rating < 0]) == 0) {
cat("No participants reported they were quite, sometimes or even often early.\n")
} else if(length(waiting_time2[lateness_rating > 0 & waiting_time2 != -Inf]) == 0) {
cat("All participants who were self-reported as rather late were irrational.\n")
} else if(length(waiting_time2[lateness_rating < 0 & waiting_time2 != -Inf]) == 0) {
cat("All participants who were self-reported as rather early were irrational.\n")
} else {
obs51 = kendall.corCI.bootstrap(lateness_rating, waiting_time2,conf.level = lvl_conf, seed = seed2)
cat("The Kendall's tau correlation coefficient for the waiting time and lateness ratings is", formatCI(obs51),"\n")
if (obs51$ci_lower <= 0 & obs51$ci_upper >= 0) {
cat("The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.\n")
} else if (obs51$ci_lower <= 1 & obs51$ci_upper >= 1) {
cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (same).\n")
} else if (obs51$ci_lower <= -1 & obs51$ci_upper >= -1) {
cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (reverse).\n")
} else {
cat("The data analysis does not allow to conclude anything.\n")
}
}
## The Kendall's tau correlation coefficient for the waiting time and lateness ratings is -0.11, 95% CI [-0.23, 0.014]
## The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.