library(stringr)
library(ggpubr)
library(expss)
library(dplyr)
library(tidyr)
source('aux/datagen-help.R')
source('data/generate-data.R')
source('aux/stats-help.R')
source('aux/CI-helpers.R')
source('aux/plots-help.R')

The online study linked to this analysis creates a context in which duration estimates can be retrieved. The main purpose of this experiment is to explore how visualization can improve, assist or otherwise influence time estimations.

1 Research questions

1.1 Main research questions

What is the effect of simulating and visualizing data on people’s task duration estimates?
- Do their point estimates change overall (from first to third estimation)? Which part of the potential increase can be attributed to the visualization (from second to third estimation)?
- Do their uncertainty ranges increase or change overall (from first to third estimation)? Which part can be be attributed to the visualization (from second to third estimation)?
Do quantile dotplots help indicating 95% prediction intervals?
- What is the size of the prediction interval participants indicate after seeing a quantile dotplot?
- What is the effect of the number of dots on the actual size of the prediction interval?
What is the effect of any feedback and of specific forms (slider vs visualization) on people’s decision making?
- What is the overall difference between the first and the final decision on waiting time?
- What is the difference between the first and the final decision on waiting time depending on feedback groups?
- What is the overall difference between the first and the final decision on the likelihood that people would miss their train?
- What is the difference between the first and final decision on the likelihood depending on feedback groups?
- Do people using the visualization take longer to come to a decision than people using the slider?

1.2 Additional questions and analysis

How do our results compare to previous work in psychology: How does breaking a task down into subtasks and considering events slowing one down or making one faster affect people’s estimation?
1. Do central tendencies become more or less optimistic?
2. Do the intervals become wider?
Do people who tend to be early prefer longer waiting time?

1.3 Dependent variables

The dependent variables we measure are:

Task 1 estimates
- E1a and I1a: first estimate expressed as a symmetric interval: duration in minutes + size of interval (1, ∞).
- E1b and I1b: second estimate (after breaking down into subtasks and slow and fast events)
- E1c and I1c: third estimate (after seeing the chart)
- E1d_lower and E1d_upper: each substep’s estimates expressed as a lower and upper bounds
Task 2 estimates
- E2a: first estimate of a departure time, before seeing, exploring data (in minutes before the train leaves) [0, 960]
- E2b: second estimate of a departure time, after seeing the chart or exploring the slider (in minutes before the train leaves) [0, 960]
Interaction log
- Time they interact with the chart or the slider (1, ∞)
- Range of values explored with the slider or chart [0,highest value]
Logs of simulated data used during the experiment
- The log of the first 10,000 (out of 100,000) samples of generated task 1 completion times drawn to compute the quantile dotplot during the experiment.
- The log of the first 10,000 (out of 100,000) samples of generated task 2 completion times drawn to compute the feedback for task 2 during the experiment.
- The likelihood to miss one’s train for each minute shown on the respective feedback mode.

Covariates: lateness measured as a 7-point Likert scale item [-3, 3].

1.4 Outcomes

The effects we are computing and looking at to find answers to our research questions.

Considering the question what is the effect of simulating and visualizing data on people’s task duration estimates, we analyze the following outcomes:
1. We compute percent change differences on the estimates made by participants.
  - From the first to the third estimate: (E1c - E1a) / E1a if around 0, then they stay the same, values > 0 indicate higher estimates (more pessimistic), values < 0 lower estimates (more optimistic). [within outcome]
  - From the second to the third estimate: (E1c - E1b) / E1b if around 0, estimates remain the same, > 0 indicate higher estimates for the third estimation (increased pessimism attributable to the visualization), < 0 lower estimates than before (increased optimism attributable to the visualization).
2. We compute percent change differences on the interval size given by participants.
  - From the first to the third estimate: (I1c - I1a) / I1a if around 0, then they stay the same, values > 0 indicate larger intervals (increased sense of uncertainty after breaking into subtasks and seeing the result visualized), values < 0 smaller intervals (reduced sense of uncertainty). [within outcome]
  - From the second to the third estimate: (I1c - I1b) / I1b if around 0, interval size remain the same, > 0 indicate larger intervals for the third estimation (increased sense of uncertainty attributable to the visualization), < 0 smaller ranges (reduced sense of uncertainty attributable to the visualization).
Considering the question if quantile dotplots help indicating 95% prediction intervals:
- PI: the overall actual size of the prediction interval after seeing any quantile dotplot
- PI_20 and PI_50: same as above but by number of dots participants saw [between outcome]
- The difference between PI_20 and PI_50.
Considering the question what is the effect of type of representation (slider vs visualization) on people’s decision we analyze
- E2b - E2a: the increase in waiting time after seeing some form of feedback
- Same as 2.1 but with the number of dots as a factor
- Lh_feedback - Lh_initial: the increase in the number of trains acceptable to miss after seeing some form of feedback.
- Same as 3.3 but with the number of dots as a factor

Outcomes for the additional analyses

To check in how far our results are consistent with previous literature on the planning fallacy we compute
- (E1b - E1a) / E1a: the percent change increase in estimates after participants were asked to make estimates for sub-tasks and to reflect on the likelihood of events increasing or decreasing the required time.
- (I1b - I1a) / I1a: the percent change increase in the size of the interval after participants were asked to make estimates for sub-tasks and reflect on the likelihood of events increasing or decreasing the required time.

2 Data

2.1 Loading data with outcomes required for our analysis

lvl_conf <- 0.95
seed = 0
seed2 = 0

data = read.csv("data/data.csv")

attach(data)

We plan one type of robustness check: how to interpret the central tendency given by participants. Participants indicate a central tendency and a symmetric interval around it. However, these are durations which are strictly positive and more likely log-normal distributed. As a robustness check we therefore transform the central tendency indicated by participants into the expected value of a log-normal distribution described by the interval given by participants.

For the inferential analysis further down in this document, we compare the results for both options in each analysis step. An output is only produced if the 95% bootstrapped confidence interval of the difference between the two options does not include 0.

data2 = read.csv("data/data2.csv")

2.2 Additional data: time spent on each page

page_data = read.csv("data/page_data.csv")

page_data %>% ggplot(aes(x=page, y = duration, group = participant_id))  + scale_x_continuous(breaks = min(page_data$page):max(page_data$page)) +
  geom_vline(xintercept = 3) +
  geom_vline(xintercept = 10) +
  geom_vline(xintercept = 13) + geom_line(aes(color = feedback_type), alpha = 0.2) + theme_minimal() + theme(panel.grid.minor.x = element_line(), panel.grid.major.x = element_line())

3 Inferential Analysis

3.1 Main research questions

3.1.1 RQ1: What is the effect of simulating and visualizing data on people’s task duration estimates?

We hypothesize the dotplot will help participants make a more informed decision as the duration of a task can be represented with a mixture distribution, which is hard to mentally process. However, our study design does not allow us to establish a ground truth for the accuracy of their estimation and thus, we will only look at the influence (induced variation in estimates) of the dotplot.

Moreover, past research reports that underestimations are more often reported than overestimation (for medium to long tasks), so we are expecting too low values (underestimates) which are then corrected for greater values (less optimistic estimates). Still, the other way around might occur as it remains unclear whether the direction of this prediction bias (over- or underestimation) is fixed for this specific task, and the fact that more studies report underestimations might be due to the field they come from (see Halkjelsvik and Jørgensen (2012)). In other words, even though underestimates are more likely, according to the literature, overestimates would not necessarily mean the participants’ estimates are not erroneous.

Do their point estimates change overall (from first to third estimation)? If so, it would mean the debiasing methods and/or the dotplot induced a change of beliefs.

# The descriptive analysis above shows that we have a few very extreme outliers which may affect our analysis disproportionately. We therefore additionally run the same analysis with data where we filter out all participants who are more than 3 SD from the mean
dataf <- data %>% ungroup() %>%
  mutate(abs_z_perc_31 = abs(scale(perc_diff_31)), abs_z_perc_32 = abs(scale(perc_diff_32)), abs_z_perc_31_gap = abs(scale(perc_diff_31_gap)), abs_z_perc_32_gap = abs(scale(perc_diff_32_gap))) %>%  
  filter(abs_z_perc_31 <= 3, abs_z_perc_32 <= 3 , abs_z_perc_31_gap <= 3, abs_z_perc_32_gap <= 3)

cat("The additional analysis excluding participants whose responses are more than 3 SD from the mean in any of the percent differences for the central tendency or the interval size ignores the data from ", nrow(data) - nrow(dataf), " participants.\n")

## The additional analysis excluding participants whose responses are more than 3 SD from the mean in any of the percent differences for the central tendency or the interval size ignores the data from  7  participants.

# the dataframe to hold the results answering RQ1
df_rq1 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())
if (length(perc_diff_31[perc_diff_31 != 0]) == 0) {
  cat(" All expected values provided are similar across all participants (no difference between the expected values of the first and third estimates).\n")
} else {
  obs11 = meanCI.bootstrap(perc_diff_31, conf.level = lvl_conf, seed = seed2)
  obs11c = meanCI.bootstrap(dataf$perc_diff_31, conf.level = lvl_conf, seed = seed2)
  df_rq1 <- df_rq1 %>% compile.df(obs11, "Increase from initial to final estimate", "all", 1)
  df_rq1 <- df_rq1 %>% compile.df(obs11c, "initial to final (without outliers)", "all", 1.5)
  cat("The average normalized expected values' variation of the third estimation compared to the first one is", formatCI(obs11),"\n")
  if (obs11$ci_lower <= 0 & obs11$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in expected value between the two estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in expected value between the two estimates.\n")
    if (obs11$estimate < 0) {
      cat("More precisely, more optimistic estimations are observed, suggesting their first estimate was an overestimation.\n")
    } else {
      cat("More precisely, less optimistic estimations are observed, suggesting their first estimate was an underestimation.\n")
    }
  }
}

## The average normalized expected values' variation of the third estimation compared to the first one is 0.7, 95% CI [0.48,   1] 
## The data analysis suggests there is a noticeable difference in expected value between the two estimates.
## More precisely, less optimistic estimations are observed, suggesting their first estimate was an underestimation.

if (length(data2$perc_diff_31[data2$perc_diff_31 != 0]) != 0) {
  obs11_verif = diffMeanCI.bootstrap(perc_diff_31,data2$perc_diff_31,conf.level = lvl_conf, seed = seed2)
  if (!(obs11_verif$ci_lower <= 0 & obs11_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether we interpret participants' estimates as the expected value or the mean influences the data analysis' results.\n")
    cat("The difference between the two measures when using the mean:", formatCI(obs11_verif),"\n")
  }
}

Which part of the potential increase, or change, can be attributed to the visualization (from second to third estimation)? To answer that question, we look at the difference between the second and third estimates.

if (length(perc_diff_32[perc_diff_32 != 0]) == 0) {
  cat(" All expected values provided are similar across all participants (no difference between the expected values of the first and third estimates).\n")
} else {
  obs11bis = meanCI.bootstrap(perc_diff_32, conf.level = lvl_conf, seed = seed2)
  obs11bisc = meanCI.bootstrap(dataf$perc_diff_32, conf.level = lvl_conf, seed = seed2)
  df_rq1 <- df_rq1 %>% compile.df(obs11bis, "Increase due to dotplot", "all", 2)
  df_rq1 <- df_rq1 %>% compile.df(obs11bisc, "due to dotplot (without outliers)", "all", 2.5)
  cat("The average normalized expected values' variation of the third estimation compared to the second one is", formatCI(obs11bis),"\n")
  if (obs11bis$ci_lower <= 0 & obs11bis$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in expected values between the two estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in expected values between the two estimates.\n")
    if (obs11bis$estimate < 0) {
      cat("More precisely, more optimistic estimations are observed for the third estimation.\n")
    } else {
      cat("More precisely, less optimistic estimations are observed for the third estimation.\n")
    }
  }
}

## The average normalized expected values' variation of the third estimation compared to the second one is 0.83, 95% CI [0.54, 1.5] 
## The data analysis suggests there is a noticeable difference in expected values between the two estimates.
## More precisely, less optimistic estimations are observed for the third estimation.

if (length(data2$perc_diff_32[data2$perc_diff_32 != 0]) != 0) {
  obs11bis_verif = diffMeanCI.bootstrap(perc_diff_32,data2$perc_diff_32,conf.level = lvl_conf, seed = seed2)
  if (!(obs11bis_verif$ci_lower <= 0 & obs11bis_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs11bis_verif),"\n")
  }
}

The mean or median of their estimate is not the only reported variable. We can also analyse the gap, or interval’s width they provided for their estimates. Do their uncertainty ranges increase overall (from first to third estimation)? Indeed, larger interval implies more uncertainty, whereas reduced interval means they are more confident in their estimates (less uncertainty).

if (length(perc_diff_31_gap[perc_diff_31_gap != 0]) == 0) {
  cat(" All interval values provided are similar across all participants (no difference between the interval's length of the first and third estimates).\n")
} else {
  obs12 = meanCI.bootstrap(perc_diff_31_gap, conf.level = lvl_conf, seed = seed2)
  obs12c = meanCI.bootstrap(dataf$perc_diff_31_gap, conf.level = lvl_conf, seed = seed2)
  df_rq1 <- df_rq1 %>% compile.df(obs12, "Interval width increase from first to final estimate", "all", 4)
  df_rq1 <- df_rq1 %>% compile.df(obs12c, "interval from initial to final (without outliers)", "all", 4.5)
  
  cat("The average normalized standard deviation variation of the third estimation compared to the first one is", formatCI(obs12),"\n")
  if (obs12$ci_lower <= 0 & obs12$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in standard deviation between the two estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.\n")
    if (obs12$estimate < 0) {
      cat("More precisely, less uncertainty is observed in the third estimate.\n")
    } else {
      cat("More precisely, more uncertainty is observed in the third estimate.\n")
    }
  }
}

## The average normalized standard deviation variation of the third estimation compared to the first one is 0.62, 95% CI [0.41, 1.1] 
## The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.
## More precisely, more uncertainty is observed in the third estimate.

if (length(data2$perc_diff_31_gap[data2$perc_diff_31_gap != 0]) != 0) {
  obs12_verif = diffMeanCI.bootstrap(perc_diff_31_gap,data2$perc_diff_31_gap,conf.level = lvl_conf, seed = seed2)
  if (!(obs12_verif$ci_lower <= 0 & obs12_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs12_verif),"\n")
  }
}

Which part can be be attributed to the visualization (from second to third estimation)?

if (length(perc_diff_32_gap[perc_diff_32_gap != 0]) == 0) {
  cat(" All interval values provided are similar across all participants (no difference between the interval's length of the first and third estimates).\n")
} else {
  obs12bis = meanCI.bootstrap(perc_diff_32_gap, conf.level = lvl_conf, seed = seed2)
  obs12bisc = meanCI.bootstrap(dataf$perc_diff_32_gap, conf.level = lvl_conf, seed = seed2)
  df_rq1 <- df_rq1 %>% compile.df(obs12bis, "Interval width increase due to dotplot", "all", 5)
  df_rq1 <- df_rq1 %>% compile.df(obs12bisc, "interval due to dotplot (without outliers)", "all", 5.5)
  
  cat("The average normalized standard deviation variation of the third estimation compared to the second one is", formatCI(obs12bis),"\n")
  if (obs12bis$ci_lower <= 0 & obs12bis$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in standard deviation between the two estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.\n")
    if (obs12bis$estimate < 0) {
      cat("More precisely, less uncertainty is observed in the third estimate.\n")
    } else {
      cat("More precisely, more uncertainty is observed in the third estimate.\n")
    }
  }
}

## The average normalized standard deviation variation of the third estimation compared to the second one is 0.4, 95% CI [0.16, 0.9] 
## The data analysis suggests there is a noticeable difference in standard deviation between the two estimates.
## More precisely, more uncertainty is observed in the third estimate.

if (length(data2$perc_diff_32_gap[data2$perc_diff_32_gap != 0]) != 0) {
  obs12bis_verif = diffMeanCI.bootstrap(perc_diff_32_gap,data2$perc_diff_32_gap,conf.level = lvl_conf, seed = seed2)
  if (!(obs12bis_verif$ci_lower <= 0 & obs12bis_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs12bis_verif),"\n")
  }
}

An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?

# data frame holding the raw data for the histograms
df_rq1_raw <- dataf %>% select(c(participant_id, perc_diff_31, perc_diff_32, perc_diff_31_gap, perc_diff_32_gap)) %>% tidyr::pivot_longer(cols = c(perc_diff_31, perc_diff_32, perc_diff_31_gap, perc_diff_32_gap), names_prefix = "perc_diff", names_to = "difference")


df_rq1 %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
  geom_pointrange() + scale_x_continuous(labels = scales::percent) + 
  geom_vline(xintercept = 0) + 
  labs(title = "Effect of simulating and visualizing data on people's\ntask duration estimates", subtitle = "Percent change differences from initial estimates to after seeing a visualization", x = "Percent difference between estimates", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") + 
  theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9))

ggsave("RQ1CI.pdf", height = 3)

## Saving 7 x 3 in image

df_rq1_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 0.1) + scale_x_continuous(labels = scales::percent, breaks = c(0, 1, 2,  3, 4)) + facet_wrap(~difference) + theme_void() + theme(axis.text.x = element_text(),  axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm"))

ggsave("RQ1Hist.pdf", height = 3)

## Saving 7 x 3 in image

3.1.2 RQ2: Do quantile dotplots help indicating 95% prediction intervals?

There were two conditions for the dotplot which made the number of dots (either 20 or 50) vary.

A 95% prediction interval was required throughout the whole study. As the dot plot allows participants to count the dots and thus mentally compute a more precise estimation based on that instruction, their ability to understand and process the information presented to them can be investigated. Indeed, what is the size of the prediction interval people indicate after seeing a quantile dotplot? We expect to find a value around 95% but it might depend on whether this percentage is calculated based on the whole distribution, or what they see on the dotplot.

If this percentage coverage is based on the whole distribution:

# the dataframe to hold the results answering RQ2
df_rq2 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())

obs21 = meanCI.bootstrap(coverage3, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21, "Final estimate", "all", 1)

cat("The mean actual size of prediction interval of the third estimate is", formatCI(obs21,unit="%"),"\n")

## The mean actual size of prediction interval of the third estimate is  71%, 95% CI [ 66%,  75%]

if (obs21$ci_lower <= 95 & obs21$ci_upper >= 95) {
  cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
  cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}

## The data analysis suggests participants did not provide an estimation following a 95% CI.

obs21_verif = diffMeanCI.bootstrap(coverage3,data2$coverage3,conf.level = lvl_conf, seed = seed2)
if (!(obs21_verif$ci_lower <= 0 & obs21_verif$ci_upper >= 0)) {
  cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
  cat("Difference between the two measures:", formatCI(obs21_verif,unit="%"),"\n")
}

If it is based on the dot-plots’ information:

obs21bis = meanCI.bootstrap(coverage_dotplot, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21bis, "Final estimate (inferred from q-dotplot)", "all", 2)
cat("The mean actual size of prediction interval of the third estimate is", formatCI(obs21bis,unit="%"),"\n")

## The mean actual size of prediction interval of the third estimate is  72%, 95% CI [ 66%,  77%]

if (obs21bis$ci_lower <= 95 & obs21bis$ci_upper >= 95) {
  cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
  cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}

## The data analysis suggests participants did not provide an estimation following a 95% CI.

obs21bis_verif = diffMeanCI.bootstrap(coverage_dotplot,data2$coverage_dotplot,conf.level = lvl_conf, seed = seed2)
if (!(obs21bis_verif$ci_lower <= 0 & obs21bis_verif$ci_upper >= 0)) {
  cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
  cat("Difference between the two measures:", formatCI(obs21bis_verif,unit="%"),"\n")
}

But in order to be able to account the dot-plot for an effect, we must ask: is it really different than the previous estimate? Yet, as it would not make sense to compare it based on the dot-plot’s information, only the samples from the computed probability distribution are used.

The second estimation’s coverage as a percentage, based on the whole distribution:

obs21ter = meanCI.bootstrap(coverage2, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21ter, "Second estimate (after reflection)", "all", 3)
cat("The mean actual size of prediction interval of the second estimate is", formatCI(obs21ter,unit="%"),"\n")

## The mean actual size of prediction interval of the second estimate is  49%, 95% CI [ 43%,  55%]

obs21ter = meanCI.bootstrap(coverage1, conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21ter, "First estimate", "all", 3)
cat("The mean actual size of prediction interval of the first estimate is", formatCI(obs21ter,unit="%"),"\n")

## The mean actual size of prediction interval of the first estimate is  42%, 95% CI [ 36%,  49%]

if (obs21ter$ci_lower <= 95 & obs21ter$ci_upper >= 95) {
  cat("The data analysis suggests this is close enough to a 95% CI, as was explicitly requested.\n")
} else {
  cat("The data analysis suggests participants did not provide an estimation following a 95% CI.\n")
}

## The data analysis suggests participants did not provide an estimation following a 95% CI.

obs21ter_verif = diffMeanCI.bootstrap(coverage2,data2$coverage2,conf.level = lvl_conf, seed = seed2)
if (!(obs21ter_verif$ci_lower <= 0 & obs21ter_verif$ci_upper >= 0)) {
  cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
  cat("Difference between the two measures:", formatCI(obs21ter_verif,unit="%"),"\n")
}

eff = FALSE
obs21quater = meanCI.bootstrap(coverage3 - coverage2,conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs21quater, "Net difference attributable to q-dotplot", "difference", 4)
cat("The average difference in actual sizes of prediction intervals between the second and third estimates is", formatCI(obs21quater,unit="%"),"\n")

## The average difference in actual sizes of prediction intervals between the second and third estimates is  22%, 95% CI [ 16%,  28%]

if (obs21quater$ci_lower <= 0 & obs21quater$ci_upper >= 0) {
  cat("The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two estimates.\n")
} else {
  eff = TRUE
  cat("The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.\n")
  if(obs21quater$estimate < 0) {
    cat("More precisely, the second estimation was closer to a 95% coverage size.\n")
  } else {
    cat("More precisely, the third estimation was closer to a 95% coverage size.\n")
  }
}

## The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.
## More precisely, the third estimation was closer to a 95% coverage size.

obs21quater_verif = diffMeanCI.bootstrap(data2$coverage2,data2$coverage3,conf.level = lvl_conf, seed = seed2)
if (((obs21quater_verif$ci_lower <= 0 & obs21quater_verif$ci_upper >= 0) & eff) | (!(obs21quater_verif$ci_lower <= 0 & obs21quater_verif$ci_upper >= 0) & !eff)){
  cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
  cat("The alternative measure gave an average difference of :", formatCI(obs21quater_verif,unit="%"),"\n")
}

What is the effect of the number of dots on the actual size of the prediction interval? The 50-dot plot carries more information as outliers have a greater chance to appear. Thereby, we want to know whether the number of dots influences the size of the prediction interval, or in other words, if more dots lead to prediction interval covering more instances.

obs22_20 = meanCI.bootstrap(coverage_dotplot[num_dots == 20],conf.level = lvl_conf, seed = seed2)
obs22_50 = meanCI.bootstrap(coverage_dotplot[num_dots == 50],conf.level = lvl_conf, seed = seed2)
cat("The average coverage percentage in the 20-dot condition is", formatCI(obs22_20,unit="%"),"\n")

## The average coverage percentage in the 20-dot condition is  73%, 95% CI [ 66%,  79%]

cat("The average coverage percentage in the 50-dot condition is", formatCI(obs22_50,unit="%"),"\n")

## The average coverage percentage in the 50-dot condition is  70%, 95% CI [ 62%,  77%]

df_rq2 <- df_rq2 %>% compile.df(obs22_20, "When seeing a 20-dot q-dotplot", "20", 5)
df_rq2 <- df_rq2 %>% compile.df(obs22_50, "When seeing a 50-dot q-dotplot", "50", 6)
eff = FALSE
obs22 = diffMeanCI.bootstrap(coverage_dotplot[num_dots == 50],coverage_dotplot[num_dots == 20], conf.level = lvl_conf, seed = seed2)
df_rq2 <- df_rq2 %>% compile.df(obs22, "Difference between 20 and 50 dots q-dotplot", "difference", 7)
cat("The difference in actual size of the prediction interval between the 20- and 50-dot conditions is", formatCI(obs22),"\n")

## The difference in actual size of the prediction interval between the 20- and 50-dot conditions is -2.3, 95% CI [-12, 7.6]

if (obs22$ci_lower <= 0 & obs21$ci_upper >= 0) {
  cat("The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two condition.\n")
} else {
  cat("The data analysis suggests there is a noticeable difference in actual size of prediction interval between the two estimates.\n")
  if (obs22$estimate < 0){
    cat("More precisely, the 50-dot group provided larger prediction intervals (covering more instances).\n")
  } else {
    cat("More precisely, the 20-dot group provided larger prediction intervals (covering more instances).\n")
  }
}

## The data analysis does not allow us to say there is a clear difference in actual size of prediction interval between the two condition.

obs22_verif = diffMeanCI.bootstrap(data2$coverage_dotplot[data2$num_dots == 20],data2$coverage_dotplot[data2$num_dots == 50],conf.level = lvl_conf, seed = seed2)
if (((obs22_verif$ci_lower <= 0 & obs22_verif$ci_upper >= 0) & eff) | (!(obs22_verif$ci_lower <= 0 & obs22_verif$ci_upper >= 0) & !eff)){
  cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
  cat("The alternative measure gave an average difference of :", formatCI(obs22_verif,unit="%"),"\n")
}

An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?

df_rq2 %>% ggplot(aes(x = estimate/100, xmin = ci_lower/100, xmax = ci_upper/100, y = reorder(name, -order))) +
  geom_pointrange() + scale_x_continuous(labels = scales::percent) + 
  geom_vline(xintercept = 0) +   geom_vline(xintercept = 0.95, color = "red") + annotate("text", label = "targeted size", x = 0.85:0.95, 7.5, color = "red", size = 3) +
  labs(title = "Effect of quantile dotplots on the size of elicited prediction intervals", subtitle = "How much of the simulated data is covered by the prediction intervals?", x = "Prediction interval size", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") + 
  theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))

ggsave("RQ2CI.pdf", height = 5, width = 5)

df_rq2_raw <- data %>% select(c(participant_id, coverage1, coverage_dotplot, coverage2, coverage3)) %>% tidyr::pivot_longer(cols = c(coverage1, coverage_dotplot, coverage2, coverage3), names_to = "coverage")

d20 <- data %>% filter(num_dots == 20)
d50 <- data %>% filter(num_dots == 50)
df_rq2_raw <- rbind(data.frame(participant_id = d20$participant_id, coverage = rep("coverage_20", nrow(d20)), value = d20$coverage3 ), df_rq2_raw)
df_rq2_raw <- rbind(data.frame(participant_id = d50$participant_id, coverage = rep("coverage_50", nrow(d50)), value = d50$coverage3 ), df_rq2_raw)


rq2hist1 = df_rq2_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 2.5) + scale_x_continuous(labels = scales::percent, breaks = seq(0, 100, by=12.5)) + facet_wrap(~coverage) + theme_void() + theme(axis.text.x = element_text(),  axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm")) 

ggsave("RQ2Histfull.pdf",  height = 3)

## Saving 7 x 3 in image

3.1.3 RQ3: What is the effect of type of representation (slider vs visualization) on people’s decision?

What is the difference between the first and the final decision on the likelihood that people would miss their train? The first decision is the participants’ best guess, but they should rectify their decision for their preferred likelihood of missing a train. As we do not wish to make a difference between participants that are too optimistic and those who are too pessimistic, we use the unsigned difference of likelihood.

# Data frame to hold all results to answer RQ3
df_rq3 = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())


if (nrow(data) == 0){
  cat("All participants were irrational.\n")
} else if (nrow(data) == 1) {
  cat("Only 1 participant was not irrational, which is not enough for an analysis.\n")
} else {
  obs31 = meanCI.bootstrap(abs_change_trains_missed, conf.level = lvl_conf, seed = seed2)
  obs31_signed = meanCI.bootstrap(trains_missed2 - trains_missed , conf.level = lvl_conf, seed = seed2)
  obs31_vis = meanCI.bootstrap(abs_change_trains_missed[feedback_type == "visualization"], conf.level = lvl_conf, seed = seed2)
  obs31_slider = meanCI.bootstrap(abs_change_trains_missed[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
  obs31_diff = diffMeanCI.bootstrap(abs_change_trains_missed[feedback_type == "visualization"], abs_change_trains_missed[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
  df_rq3 <- df_rq3 %>% compile.df(obs31_signed, "Signed difference after seeing any feedback", "all", 1)
  df_rq3 <- df_rq3 %>% compile.df(obs31, "Absolute difference after seeing any feedback", "all", 2)
  df_rq3 <- df_rq3 %>% compile.df(obs31_vis, " --- visualization feedback", "visualization", 3)
  df_rq3 <- df_rq3 %>% compile.df(obs31_slider, " --- slider feedback", "slider", 4)
  df_rq3 <- df_rq3 %>% compile.df(obs31_diff, "Difference (visualization - slider)", "difference", 5)
  
  cat("The difference in likelihood of missing a train between first and final decision is", formatCI(obs31),"\n")
  if (obs31$ci_lower <= 0 & obs31$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in likelihood between the two decisions\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in likelihood between the two decisions\n")
  }
    

}

## The difference in likelihood of missing a train between first and final decision is  15, 95% CI [ 12,  20] 
## The data analysis suggests there is a noticeable difference in likelihood between the two decisions

What is the difference between the first and the final decision on waiting time? As previously, we use the absolute difference between the first and final waiting time which was selected.

if (nrow(data) == 0){
  cat("All participants were irrational.\n")
} else if (nrow(data) == 1) {
  cat("Only 1 participant was not irrational, which is not enough for an analysis.\n")
} else {
  obs32 = meanCI.bootstrap(abs_change_waiting_time, conf.level = lvl_conf, seed = seed2)
  obs32_signed = meanCI.bootstrap(waiting_time2 - waiting_time, conf.level = lvl_conf, seed = seed2)
  obs32_vis = meanCI.bootstrap(abs_change_waiting_time[feedback_type == "visualization"], conf.level = lvl_conf, seed = seed2)
  obs32_slider = meanCI.bootstrap(abs_change_waiting_time[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
  obs32_diff = diffMeanCI.bootstrap(abs_change_waiting_time[feedback_type == "visualization"], abs_change_waiting_time[feedback_type == "slider"], conf.level = lvl_conf, seed = seed2)
  df_rq3 <- df_rq3 %>% compile.df(obs32, "Absolute difference after seeing any feedback", "all", 6)
  df_rq3 <- df_rq3 %>% compile.df(obs32_signed, "Signed difference after seeing any feedback", "all", 7)
  df_rq3 <- df_rq3 %>% compile.df(obs32_vis, " --- visualization feedback", "visualization", 8)
  df_rq3 <- df_rq3 %>% compile.df(obs32_slider, " --- slider feedback", "slider", 9)
  df_rq3 <- df_rq3 %>% compile.df(obs32_diff, "Difference (visualization - slider)", "difference", 10)
  cat("The difference in waiting time between first and final decision is", formatCI(obs32),"\n")
  if (obs32$ci_lower <= 0 & obs32$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in waiting time between the two decisions\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in waiting time between the two decisions\n")
  }


}

## The difference in waiting time between first and final decision is  17, 95% CI [ 13,  22] 
## The data analysis suggests there is a noticeable difference in waiting time between the two decisions

# add a column to our df to be able to tell if we're looking at waiting time or trains missed
df_rq3$type = ifelse(df_rq3$order < 6, "trains missed", "waiting time")

An overview visualization reporting our results for research question 1: What is the effect of simulating and visualizing data on people’s task duration estimates?

# Plotting data using a facet_wrap only makes sense if the data ranges are similar for the final results data. If they have considerably different ranges (which may happen since it's not the same unit <number of trains missed versus waiting time>), then we will plot these as two separate charts which we arrange side by side
df_rq3 %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order), group = type)) +
  geom_pointrange() + scale_x_continuous() + facet_wrap(~ type) +
  geom_vline(xintercept = 0)  +
  labs(title = "Effect on the decision when to leave for a train after receiving\nfeedback based on simulation data", subtitle = "How much do estimates change and how much does feedback type matter?", x = "", y = NULL, caption = "All error bars shows 95% bootstrapped confidence intervals") + 
  theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))

ggsave("RQ3CI.pdf", height = 3)

## Saving 7 x 3 in image

df_rq3_raw <- data %>% group_by(participant_id) %>% transmute(abs_change_waiting_time = abs_change_waiting_time, abs_change_trains_missed = abs_change_trains_missed, change_trains_missed = trains_missed2 - trains_missed, change_waiting_time = waiting_time2 - waiting_time) %>%
  pivot_longer(cols=c(abs_change_waiting_time, abs_change_trains_missed, change_trains_missed, change_waiting_time), names_to = "measure")

dvis <- data %>% filter(feedback_type == "visualization")
dslider <- data %>% filter(feedback_type == "slider")

df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dvis$participant_id, measure = rep("abs_waiting_vis", nrow(dvis)), value = dvis$abs_change_waiting_time))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dslider$participant_id, measure = rep("abs_waiting_slider", nrow(dslider)), value = dslider$abs_change_waiting_time))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dvis$participant_id, measure = rep("abs_trains_vis", nrow(dvis)), value = dvis$abs_change_trains_missed))
df_rq3_raw <- rbind(df_rq3_raw, data.frame(participant_id = dslider$participant_id, measure = rep("abs_trains_slider", nrow(dslider)), value = dslider$abs_change_trains_missed))


df_rq3_raw %>% ggplot() + geom_histogram(aes(x = value), binwidth = 1) + scale_x_continuous(breaks = seq(-100, 100, by=50)) + facet_wrap(~measure) + theme_void() + theme(axis.text.x = element_text(),  axis.ticks.x = element_line(size = 1), axis.ticks.length.x = unit(1, "mm"))

ggsave("RQ3Hist.pdf", height = 3)

## Saving 7 x 3 in image

Do people using the visualization take longer to come to a decision than people using the slider? We hypothesized a better visualization would ease the decision-making process and thus that the linechart would allow participants to select their preferred departure time quicker than those who used the slider.

df_rq3bis = data.frame(name = character(), condition = character(), estimate = numeric(), ci_lower = numeric(), ci_upper = numeric(), order = integer())

p13.vis = page_data %>% filter(page == 13, feedback_type == 'visualization')
p13.slider = page_data %>% filter(page == 13, feedback_type == 'slider')

obs33 = diffGeomMeanCI.bootstrap(p13.vis$duration,p13.slider$duration, conf.level = lvl_conf, seed = seed2)
obs33_vis = geomMeanCI.bootstrap(p13.vis$duration, conf.level = lvl_conf, seed = seed2)
obs33_slider = geomMeanCI.bootstrap(p13.slider$duration, conf.level = lvl_conf, seed = seed2)

obs33bis = diffGeomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'visualization'], time_spent_interacting[feedback_type == 'slider'], conf.level = lvl_conf, seed = seed2)
obs33bis_vis = geomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'visualization'], conf.level = lvl_conf, seed = seed2)
obs33bis_slider = geomMeanCI.bootstrap(time_spent_interacting[feedback_type == 'slider'], conf.level = lvl_conf, seed = seed2)

df_rq3bis <- df_rq3bis %>% compile.df(obs33_vis, "Visualization", "visualization", 1)
df_rq3bis <- df_rq3bis %>% compile.df(obs33_slider, "Slider", "slider", 2)
df_rq3bis <- df_rq3bis %>% compile.df(obs33, "Difference", "difference", 3)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis_vis, "Visualization", "visualization", 4)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis_slider, "Slider", "slider", 5)
df_rq3bis <- df_rq3bis %>% compile.df(obs33bis, "Difference", "difference", 6)
cat("The difference in time taken to make a decision between the feedback-type conditions is", formatCI(obs33,unit="s"),"\n")

## The difference in time taken to make a decision between the feedback-type conditions is  43s, 95% CI [ 26s,  60s]

if (obs33$ci_lower <= 0 & obs33$ci_upper >= 0) {
  cat("The data analysis does not allow us to say there is a clear difference in time take to make a decision between the two conditions.\n")
} else {
  cat("The data analysis suggests there is a noticeable difference in time taken to make a decision between the two conditions.\n")
  if (obs33$estimate < 0){
    cat("More precisely, the slider group took longer.\n")
  } else {
    cat("More precisely, the linechart group took longer.\n")
  }
}

## The data analysis suggests there is a noticeable difference in time taken to make a decision between the two conditions.
## More precisely, the linechart group took longer.

df_rq3bis$type <- ifelse(df_rq3bis$order < 4, "time on page", "time interacting")

An overview visualization reporting our results for research question 3bis: What is the effect of simulating and visualizing data on people’s task duration estimates?

# Plotting data using a facet_wrap only makes sense if the data ranges are similar for the final results data. If they have considerably different ranges (which may happen since it's not the same unit <number of trains missed versus waiting time>), then we will plot these as two separate charts which we arrange side by side
df_rq3bis %>% filter(type == "time on page") %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
  geom_pointrange() + scale_x_continuous() +
  geom_vline(xintercept = 0)  +
  labs(title = "Time spent on the feedback", subtitle = "", x = "Time (s)", y = NULL, caption = "All point estimates are geometric means and all error bars shows 95% bootstrapped confidence intervals") + 
  theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))

df_rq3bis %>% filter(type == "time interacting") %>% ggplot(aes(x = estimate, xmin = ci_lower, xmax = ci_upper, y = reorder(name, -order))) +
  geom_pointrange() + scale_x_continuous() +
  geom_vline(xintercept = 0)  +
  labs(title = "Time spent interacting with the presented feedback option", subtitle = "", x = "Time (s)", y = NULL, caption = "All point estimates are geometric means and all error bars shows 95% bootstrapped confidence intervals") + 
  theme_minimal() + theme(plot.title = element_text(hjust = 0.5, size = 11), plot.subtitle = element_text(hjust = 0.5, size = 9), axis.title.x = element_text(size = 9))

3.2 Additional questions

3.2.1 Debiasing methods

How do our results compare to previous work in psychology? How does breaking a task down into subtasks and considering events slowing one down or making one faster affect people’s estimation?

Estimating the duration of smaller components of a whole task produces greater values, which are not necessarily more accurate. This result has been observed by Connolly and Dean (1997) with a software development task, and by Forsyth and Burt (2008). Hayes-Roth (1980) also investigated this debiasing method but this time with a task involving similar errands. More precisely, the given sub-steps were enter store, find desired object, wait in line, pay for object and leave store. Based on these studies’ findings, we expect the participants to provide greater aggregated estimates when focusing on each sub-components than when predicting for the whole task. Nonetheless, the effect it might induce on the second estimation is uncertain for it depends on the participants’ ability and willingness to take the previous information into account.

The selection of the second debiasing method used for this experiment was mainly motivated by the information on the participants’ beliefs it would provide. Byram (1997) conducted an experiment in which surprises were manipulated but the results were inconclusive.

How similar remain people’s responses? If the two first estimations of the whole task’s duration are quite similar in distribution, then it means the two debiasing methods had no effect. However, if there is a difference, we cannot precisely tell which factor caused it.

if (length(perc_diff_21[perc_diff_21 != 0]) == 0) {
  cat(" All expected values provided are similar across all participants (no difference between the first and second estimate).\n")
} else {
  obs41 = meanCI.bootstrap(abs(perc_diff_21), conf.level = lvl_conf, seed = seed2)
  obs41sd = meanCI.bootstrap(abs(perc_diff_21), conf.level = lvl_conf, seed = seed2)
  
  cat("The absolute average normalized difference between the second estimation and the first one is", formatCI(obs41),"\n")
  if (obs41$ci_lower <= 0 & obs41$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference between the two first estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in expected values between the two first estimates.\n")
  }
}

## The absolute average normalized difference between the second estimation and the first one is 0.28, 95% CI [0.2, 0.47] 
## The data analysis suggests there is a noticeable difference in expected values between the two first estimates.

if (length(data2$perc_diff_21[data2$perc_diff_21 != 0]) != 0) {
  obs41_verif = diffMeanCI.bootstrap(abs(perc_diff_21),abs(data2$perc_diff_21),conf.level = lvl_conf, seed = seed2)
  if (!(obs41_verif$ci_lower <= 0 & obs41_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether we treat participants'responses as central tendency or compute the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs41_verif),"\n")
  }
}

Before looking at the direction of correction between the first and second estimates, we can examine the sub-tasks’ estimates that were made, and more precisely: Are the sub-tasks’ estimates consistent with the literature? If so, then the expected value of the aggregated estimation of subtasks’ duration will be greater than the expected value of the first estimation.

if (length(perc_diff_st1[perc_diff_st1 != 0]) == 0) {
  cat(" All expected values provided are similar across all participants (no difference between expected values of the first and aggregated sub-tasks' estimates).\n")
} else {
  obs42 = meanCI.bootstrap(perc_diff_st1, conf.level = lvl_conf, seed = seed2)
  
  cat("The average normalized expected values' variation of the aggregated sub-tasks' estimation compared to the first one is", formatCI(obs42),"\n")
  if (obs42$ci_lower <= 0 & obs42$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in expected values between the two estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in expected values between the two estimates.\n")
    if (obs42$estimate < 0) {
      cat("More precisely, more optimistic estimations are observed for the aggregated sub-tasks' estimate, which is not in line with the literature.\n")
    } else {
      cat("More precisely, less optimistic estimations are observed for the aggregated sub-tasks' estimate, in line with the literature.\n")
    }
  }
}

## The average normalized expected values' variation of the aggregated sub-tasks' estimation compared to the first one is -0.41, 95% CI [-0.5, -0.29] 
## The data analysis suggests there is a noticeable difference in expected values between the two estimates.
## More precisely, more optimistic estimations are observed for the aggregated sub-tasks' estimate, which is not in line with the literature.

if (length(data2$perc_diff_st1[data2$perc_diff_st1 != 0]) != 0) {
  obs42_verif = diffMeanCI.bootstrap(perc_diff_st1,data2$perc_diff_st1,conf.level = lvl_conf, seed = seed2)
  if (!(obs42_verif$ci_lower <= 0 & obs42_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs42_verif),"\n")
  }
}

Then, regarding the second estimation: Is there a trend towards becoming more or less optimistic? If the second estimation is less optimistic, meaning that it might tend towards overestimation, then it might be related to the sub-tasks’ decomposition, depending on whether they have proven to be consistent with the literature.

if (length(perc_diff_21[perc_diff_21 != 0]) == 0) {
  cat("All expected values provided are similar across all participants (no difference between the expected values of the first and second estimates).\n")
} else {
  obs42bis = meanCI.bootstrap(perc_diff_21, conf.level = lvl_conf, seed = seed2)
  
  cat("The average normalized expected values' variation of the second estimation compared to the first one is", formatCI(obs42bis),"\n")
  if (obs42bis$ci_lower <= 0 & obs42bis$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in expected values between the two first estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in expected values between the two first estimates.\n")
    if (obs42bis$estimate < 0) {
      cat("More precisely, more optimistic estimations are observed.")
    } else {
      cat("More precisely, less optimistic estimations are observed.")
    }
  }
}

## The average normalized expected values' variation of the second estimation compared to the first one is 0.2, 95% CI [0.11, 0.38] 
## The data analysis suggests there is a noticeable difference in expected values between the two first estimates.
## More precisely, less optimistic estimations are observed.

if (length(data2$perc_diff_21[data2$perc_diff_21 != 0]) != 0) {
  obs42bis_verif = diffMeanCI.bootstrap(perc_diff_21,data2$perc_diff_21,conf.level = lvl_conf, seed = seed2)
  if (!(obs42bis_verif$ci_lower <= 0 & obs42bis_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs42bis_verif),"\n")
  }
}

Still the estimate might also differ in width assessing the amount of uncertainty one puts in their answer, hence the question: Do the intervals become wider? Thereby, larger intervals might suggest participants acknowledged the uncertainty underlying such estimation.

if (length(perc_diff_21_gap[perc_diff_21_gap != 0]) == 0) {
  cat(" All the intervals provided are similar across all participants (no difference between the intervals of the first and second estimates).\n")
} else {
  obs43 = meanCI.bootstrap(perc_diff_21_gap, conf.level = lvl_conf, seed = seed2)
  
  cat("The average normalized interval variation of the second estimation compared to the first one is", formatCI(obs43),"\n")
  if (obs43$ci_lower <= 0 & obs43$ci_upper >= 0) {
    cat("The data analysis does not allow us to say there is a clear difference in interval's width between the two first estimates.\n")
  } else {
    cat("The data analysis suggests there is a noticeable difference in interval's width between the two first estimates.\n")
    if (obs43$estimate < 0) {
      cat("More precisely, a decrease in interval's size is observed.")
    } else {
      cat("More precisely, an increase in interval's size is observed.")
    }
  }
}

## The average normalized interval variation of the second estimation compared to the first one is 0.38, 95% CI [0.26, 0.55] 
## The data analysis suggests there is a noticeable difference in interval's width between the two first estimates.
## More precisely, an increase in interval's size is observed.

if (length(data2$perc_diff_21_gap[data2$perc_diff_21_gap != 0]) != 0) {
  obs43_verif = diffMeanCI.bootstrap(perc_diff_21_gap,data2$perc_diff_21_gap,conf.level = lvl_conf, seed = seed2)
  if (!(obs43_verif$ci_lower <= 0 & obs43_verif$ci_upper >= 0)) {
    cat("NOTICE: Whether the participants provide the median or the mean influences the data analysis' results.\n")
    cat("Difference between the two measures:", formatCI(obs43_verif),"\n")
  }
}

3.2.2 Personal preferences

At the end of the study, the participants were asked whether they believed they were rather early or late on a 7-point Likert scale. We wish to use this data to observe whether it could help predict their preferences. We test this using a Kendall Tau correlation coefficient and test it for both the initial waiting time and the final chosen one.

Do people who tend to be early prefer longer waiting time? (based on the initially selected departure time)

if (length(waiting_time2[lateness_rating > 0]) == 0) {
  cat("No participants reported they were quite, sometimes or even often late.\n")
} else if (length(waiting_time2[lateness_rating < 0]) == 0) {
  cat("No participants reported they were quite, sometimes or even often early.\n")
} else if(length(waiting_time2[lateness_rating > 0 & waiting_time != -Inf]) == 0) {
  cat("All participants who were self-reported as rather late were irrational.\n")
} else if(length(waiting_time2[lateness_rating < 0 & waiting_time != -Inf]) == 0) {
  cat("All participants who were self-reported as rather early were irrational.\n")
} else {
  obs51 = kendall.corCI.bootstrap(lateness_rating, waiting_time,conf.level = lvl_conf, seed = seed2)
  cat("The Kendall's tau correlation coefficient for the waiting time and lateness ratings is", formatCI(obs51),"\n")
  if (obs51$ci_lower <= 0 & obs51$ci_upper >= 0) {
    cat("The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.\n")
  } else if (obs51$ci_lower <= 1 & obs51$ci_upper >= 1) {
    cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (same).\n")
  } else if (obs51$ci_lower <= -1 & obs51$ci_upper >= -1) {
    cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (reverse).\n")
  } else {
    cat("The data analysis does not allow to conclude anything.\n")
  }
}

## The Kendall's tau correlation coefficient for the waiting time and lateness ratings is -0.084, 95% CI [-0.22, 0.047] 
## The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.

Do people who tend to be early prefer longer waiting time? (based on the final selected departure time)

if (length(waiting_time2[lateness_rating > 0]) == 0) {
  cat("No participants reported they were quite, sometimes or even often late.\n")
} else if (length(waiting_time2[lateness_rating < 0]) == 0) {
  cat("No participants reported they were quite, sometimes or even often early.\n")
} else if(length(waiting_time2[lateness_rating > 0 & waiting_time2 != -Inf]) == 0) {
  cat("All participants who were self-reported as rather late were irrational.\n")
} else if(length(waiting_time2[lateness_rating < 0 & waiting_time2 != -Inf]) == 0) {
  cat("All participants who were self-reported as rather early were irrational.\n")
} else {
  obs51 = kendall.corCI.bootstrap(lateness_rating, waiting_time2,conf.level = lvl_conf, seed = seed2)
  cat("The Kendall's tau correlation coefficient for the waiting time and lateness ratings is", formatCI(obs51),"\n")
  if (obs51$ci_lower <= 0 & obs51$ci_upper >= 0) {
    cat("The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.\n")
  } else if (obs51$ci_lower <= 1 & obs51$ci_upper >= 1) {
    cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (same).\n")
  } else if (obs51$ci_lower <= -1 & obs51$ci_upper >= -1) {
    cat("The data analysis suggests the preferred waiting time and the lateness rating are correlated (reverse).\n")
  } else {
    cat("The data analysis does not allow to conclude anything.\n")
  }
}

## The Kendall's tau correlation coefficient for the waiting time and lateness ratings is -0.11, 95% CI [-0.23, 0.014] 
## The data analysis does not allow us to say the preferred waiting time and the lateness rating are correlated.

Analysis plan for ‘Study on estimating durations and probabilities in the context of everyday tasks’

1 Research questions

1.1 Main research questions

1.2 Additional questions and analysis

1.3 Dependent variables

1.4 Outcomes

2 Data

2.1 Loading data with outcomes required for our analysis

2.2 Additional data: time spent on each page

3 Inferential Analysis

3.1 Main research questions

3.1.1 RQ1: What is the effect of simulating and visualizing data on people’s task duration estimates?

3.1.2 RQ2: Do quantile dotplots help indicating 95% prediction intervals?

3.1.3 RQ3: What is the effect of type of representation (slider vs visualization) on people’s decision?

3.2 Additional questions

3.2.1 Debiasing methods

3.2.2 Personal preferences