I have this dataset:
study_ID title experiment question_ID participant_ID estimate_level estimate correct_answer question type category age gender
<dbl> <chr> <dbl> <chr> <int> <chr> <dbl> <dbl> <chr> <chr> <chr> <int> <chr>
1 11 Dallacker_Parents'_co… 1 1 1 individual 3 10 How many sugar cubes does or… unlim… nutriti… 32 Female
2 11 Dallacker_Parents'_co… 1 2 1 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 32 Female
3 11 Dallacker_Parents'_co… 1 3 1 individual 7 6.5 How many sugar cubes does a … unlim… nutriti… 32 Female
4 11 Dallacker_Parents'_co… 1 4 1 individual 1 16.5 How many sugar cubes does a … unlim… nutriti… 32 Female
5 11 Dallacker_Parents'_co… 1 5 1 individual 7 11 How many sugar cubes does a … unlim… nutriti… 32 Female
6 11 Dallacker_Parents'_co… 1 6 1 individual 5 2.5 How many sugar cubes does a … unlim… nutriti… 32 Female
7 11 Dallacker_Parents'_co… 1 1 2 individual 2 10 How many sugar cubes does or… unlim… nutriti… 29 Female
8 11 Dallacker_Parents'_co… 1 2 2 individual 10 11.5 How many sugar cubes does a … unlim… nutriti… 29 Female
9 11 Dallacker_Parents'_co… 1 3 2 individual 1.5 6.5 How many sugar cubes does a … unlim… nutriti… 29 Female
10 11 Dallacker_Parents'_co… 1 4 2 individual 2 16.5 How many sugar cubes does a … unlim… nutriti… 29 Female
This dataset contains 6 questions, and each row has a correct_answer column and an estimate column. For each question, I am trying to calculate the percentage of people who underestimated, who overestimated, and who answered correctly.
For example, for each of the 6 questions it would return something like this: 80 percent underestimated, 10 percent overestimated, and 10 percent answered correctly.
How can I do this? I'm stuck. Thanks in advance!
Here is the output of dput(head(DF, 10)):
structure(list(study_ID = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5), title = c("5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd", "5_Jayles_Debiasing_The_Crowd",
"5_Jayles_Debiasing_The_Crowd"), experiment = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1), question_ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
participant_ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), estimate_level = c("individual",
"individual", "individual", "individual", "individual", "individual",
"individual", "individual", "individual", "individual"),
estimate = c(2e+07, 4500000, 21075541, 2e+07, 1e+06, 1.1e+07,
2.5e+07, 8e+06, 1.6e+07, 9800000), correct = c(3.8e+07, 3.8e+07,
3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07, 3.8e+07,
3.8e+07), question = c("What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?",
"What is the population of Tokyo and its agglomeration?"),
type = c("unlimited", "unlimited", "unlimited", "unlimited",
"unlimited", "unlimited", "unlimited", "unlimited", "unlimited",
"unlimited"), category = c("demographics", "demographics",
"demographics", "demographics", "demographics", "demographics",
"demographics", "demographics", "demographics", "demographics"
), age = c("NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA"), gender = c("NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "NA", "NA")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
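The calculation described in the question could be sketched in base R roughly as follows. This is only one possible approach, not a definitive answer: it classifies each row by the sign of estimate minus correct_answer and then tabulates the shares per question_ID. The toy data frame below is invented purely for illustration, and the column name correct_answer is taken from the printed table (the dput sample names that column correct instead, so the name may need adjusting).

```r
# Toy data, invented for illustration; column names follow the printed table.
DF <- data.frame(
  question_ID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  estimate       = c(3, 10, 12, 10, 10, 11.5, 20, 5),
  correct_answer = c(10, 10, 10, 10, 11.5, 11.5, 11.5, 11.5)
)

# Classify each estimate: -1 = below, 0 = equal to, 1 = above the correct answer.
DF$result <- factor(
  sign(DF$estimate - DF$correct_answer),
  levels = c(-1, 0, 1),
  labels = c("under", "correct", "over")
)

# Count the outcomes per question, convert each row to proportions,
# and express them as percentages.
counts <- table(DF$question_ID, DF$result)
pct <- round(100 * prop.table(counts, margin = 1), 1)
print(pct)
```

Each row of pct corresponds to one question and sums to 100. Note that "correct" here means exact equality; for estimates such as populations, where an exact match is unlikely, a tolerance around correct_answer may be a more useful definition.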
Comment from ekoam (01.10.2020): dput(head(df, 10)). @dampfy