Задание Hive не работает должным образом в кластере cassandra, редуктор застревает

У меня есть кластер cassandra с 6 узлами datastax (3 cassandra- 3 analytics). Я использую улей для создания отчетов.

Проблема в том, что когда я запускаю задание улья с помощью count (*) или группировку по запросу, сопоставители завершаются, но редукторы застревают на определенном проценте. И они никогда не меняются.

(Все узлы имеют 60 ГБ данных. Семейство столбцов, в котором я запускаю куст, имеет около 40 ГБ на каждом узле и имеет 15 столбцов, каждый с размером данных 50,60 символов. Я использую настройки куста по умолчанию.)

Также доступна их информация по времени и размеру данных работы улья (обычный запрос). Я мог найти в сети.

Вот что происходит

Запрос очень простой - выберите количество (*) из таблицы; В нем 300 тыс. Строк.

Starting Job = job_201305210728_0005, Tracking URL = http://something:50030/jobdetails.jsp?jobid=job_201305210728_0005
Kill Command = ./bin/dse hadoop job  -Dmapred.job.tracker=10.70.41.62:8012 -kill job_201305210728_0005
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 1
2013-05-22 03:52:58,811 Stage-1 map = 0%,  reduce = 0%
2013-05-22 03:53:02,841 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 1.87 sec
2013-05-22 03:53:03,863 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 1.87 sec
2013-05-22 03:53:04,874 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 2.59 sec
2013-05-22 03:53:05,882 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 2.59 sec
2013-05-22 03:53:06,894 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 3.42 sec
2013-05-22 03:53:07,901 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 3.42 sec
2013-05-22 03:53:08,908 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 5.13 sec
2013-05-22 03:53:09,919 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 5.13 sec
2013-05-22 03:53:10,929 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 5.51 sec
2013-05-22 03:53:11,936 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 7.79 sec
2013-05-22 03:53:12,943 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 7.79 sec
2013-05-22 03:53:13,951 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 7.79 sec
2013-05-22 03:53:14,958 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 10.25 sec
2013-05-22 03:53:15,965 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 10.25 sec
2013-05-22 03:53:16,972 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 10.25 sec
2013-05-22 03:53:17,979 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:18,986 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:19,993 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:20,999 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:22,007 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:23,014 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:24,021 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:25,028 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:26,035 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:27,042 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:28,049 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:29,055 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:30,063 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:31,070 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:32,077 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:33,084 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:34,091 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:35,097 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:36,105 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:37,112 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:38,118 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:39,125 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:40,132 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:41,139 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:42,145 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:43,153 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:44,160 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:45,166 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:46,172 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:47,180 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:48,186 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:49,193 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:50,201 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:51,208 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:52,214 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:53,221 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:54,227 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:55,235 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:56,241 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:57,249 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec
2013-05-22 03:53:58,256 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 10.25 sec

and continues....

Прикрепляю журналы улья

 Counters plan="{"queryId":"root_20130523051919_5e968691-9119-47fa-9b45-f61a90e86452","queryType":null,"queryAttributes":{"queryString":"select count(*) from suborder_issue"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"14","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["GBY_2"],"adjacencyType":"CONJUNCTIVE"},{"node":"GBY_2","children":["RS_3"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"GBY_2","operatorType":"GROUPBY","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"RS_3","operatorType":"REDUCESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"},{"taskId":"Stage-1_REDUCE","taskType":"REDUCE","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"GBY_4","children":["SEL_5"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_5","children":["FS_6"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"GBY_4","operatorType":"GROUPBY","operatorAttributes":"null","operatorCounters":"}","done":"false","started":"true"},{"operatorId":"SEL_5","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"true"},{"operatorId":"FS_6","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1369303703813"
TaskProgress TASK_HADOOP_PROGRESS="2013-05-23 06:08:24,818 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 23.04 sec" TASK_NUM_REDUCERS="1" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask" TASK_NUM_MAPPERS="7" TASK_COUNTERS="org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0,Job Counters .Launched reduce tasks:1,Job Counters .SLOTS_MILLIS_MAPS:41820,Job Counters .Rack-local map tasks:8,Job Counters .Launched map tasks:8,File Input Format Counters .Bytes Read:0,FileSystemCounters.FILE_BYTES_WRITTEN:327828,FileSystemCounters.CFS_BYTES_READ:3527,org.apache.hadoop.hive.ql.exec.MapOperator$Counter.DESERIALIZE_ERRORS:0,Map-Reduce Framework.Map output materialized bytes:203,Map-Reduce Framework.Map input records:351783,Map-Reduce Framework.Reduce shuffle bytes:78,Map-Reduce Framework.Spilled Records:6,Map-Reduce Framework.Map output bytes:79,Map-Reduce Framework.Total committed heap usage (bytes):2057568256,Map-Reduce Framework.CPU time spent (ms):23040,Map-Reduce Framework.Map input bytes:351783,Map-Reduce Framework.SPLIT_RAW_BYTES:3527,Map-Reduce Framework.Combine input records:0,Map-Reduce Framework.Reduce input records:0,Map-Reduce Framework.Reduce input groups:0,Map-Reduce Framework.Combine output records:0,Map-Reduce Framework.Physical memory (bytes) snapshot:2163638272,Map-Reduce Framework.Reduce output records:0,Map-Reduce Framework.Virtual memory (bytes) snapshot:7556251648,Map-Reduce Framework.Map output records:6" TASK_ID="Stage-1" QUERY_ID="root_20130523051919_5e968691-9119-47fa-9b45-f61a90e86452" TASK_HADOOP_ID="job_201305220529_0009" TIME="1369303704818"
Counters plan="{"queryId":"root_20130523051919_5e968691-9119-47fa-9b45-f61a90e86452","queryType":null,"queryAttributes":{"queryString":"select count(*) from suborder_issue"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"14","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["GBY_2"],"adjacencyType":"CONJUNCTIVE"},{"node":"GBY_2","children":["RS_3"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"GBY_2","operatorType":"GROUPBY","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"RS_3","operatorType":"REDUCESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"},{"taskId":"Stage-1_REDUCE","taskType":"REDUCE","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"GBY_4","children":["SEL_5"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_5","children":["FS_6"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"GBY_4","operatorType":"GROUPBY","operatorAttributes":"null","operatorCounters":"}","done":"false","started":"true"},{"operatorId":"SEL_5","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"true"},{"operatorId":"FS_6","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}" TIME="1369303704818"

person Aashish Katta    schedule 21.05.2013    source источник
comment
Не могли бы вы поделиться примерами реальных запросов, которые застряли в прошлом? Это поможет найти сигнатуру отказа.   -  person Lukas Vermeer    schedule 22.05.2013
comment
Я добавил информацию о запросе и вакансии   -  person Aashish Katta    schedule 22.05.2013


Ответы (2)


Я могу сказать о двух вещах. У вас либо заканчивается память и вам нужно увеличить размер кучи, либо у вас заканчивается дисковое пространство на ваших узлах данных. Проверьте раздел журнала и убедитесь, что он не заполнен.

person Matt E    schedule 21.05.2013
comment
Спасибо. Но у меня достаточно дискового пространства (200 ГБ на каждом узле) и достаточно памяти на каждом узле. - person Aashish Katta; 21.05.2013

Единственный стопроцентный совет - внимательно заглядывать в журналы редуктора (они должны быть в кластере, вероятно, лучше всего начать с веб-интерфейса JobTracker). Количество возможных причин слишком велико.

person Roman Nikitchenko    schedule 22.05.2013