FCN deconvolution layer initialization - loss drops too fast

I am training a small FCN (10M weights on 12K images; see, e.g., Long et al., 2015). The architecture is as follows (starting from the FCN8s fc7 layer):

fc7->relu1->dropout->conv2048->conv1024->conv512->deconv1->deconv2->deconv3->deconv4->deconv5->crop->softmax_with_loss

When I initialized all the deconv layers with Gaussian weights, I got somewhat (though not always) reasonable results. Then I decided to do it properly and used the scripts provided by Shelhamer (e.g., https://github.com/zeakey/DeepSkeleton/blob/master/examples/DeepSkeleton/solve.py).
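
For reference, the bilinear initialization those scripts perform boils down to roughly the following (a pycaffe sketch: upsample_filt/interp follow the FCN reference surgery code; picking the deconv layers by the 'upscore'/'deconv' name substrings is an assumption about my layer names):

import numpy as np
import caffe

def upsample_filt(size):
    # 2D bilinear interpolation kernel with side length `size`
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def interp(net, layers):
    # write a channel-wise bilinear kernel into each Deconvolution layer
    for l in layers:
        m, k, h, w = net.params[l][0].data.shape
        assert m == k and h == w, 'expected square, channel-wise deconv: %s' % l
        filt = upsample_filt(h)
        net.params[l][0].data[...] = 0
        net.params[l][0].data[range(m), range(k), :, :] = filt

net = caffe.Net('mcn-train_finetune11_slow_bilinear.prototxt', caffe.TRAIN)
deconv_layers = [l for l in net.params if 'upscore' in l or 'deconv' in l]
interp(net, deconv_layers)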

The deconvolution layers look like this (the first one):

layer {
  name: "upscore2"
  type: "Deconvolution"
  bottom: "upsample"
  top: "upscore2"
  param {
    lr_mult: 2
  }
  convolution_param {
    # num_output: number of channels, in our case cow + background
    num_output: 2
    kernel_size: 8
    stride: 2
    bias_term: false
  }
}
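
A quick way to double-check that the bilinear weights actually ended up in this layer (a minimal pycaffe sketch; the layer name upscore2 is taken from the prototxt above, the snapshot file name from the log below):

import numpy as np
import caffe

net = caffe.Net('mcn-train_finetune11_slow_bilinear.prototxt',
                'mcn_finetune11_slow_bilinear_iter_1000.caffemodel',
                caffe.TEST)

w = net.params['upscore2'][0].data  # shape: (num_output, channels, kH, kW)
print('upscore2 weights:', w.shape, 'min', w.min(), 'max', w.max())
# all-zero weights here would mean the bilinear initialization was never applied
print('fraction of exactly-zero entries:', np.mean(w == 0))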

The result I get is really strange: the loss drops quickly (within 1000 iterations) and then stays around 1, but the model is completely useless on the test set. Any suggestions? I lowered the learning rate, but nothing seems to help. My solver configuration:

net: "mcn-train_finetune11_slow_bilinear.prototxt"
solver_mode: GPU
# REDUCE LEARNING RATE
base_lr: 1e-8
lr_policy: "fixed"
iter_size: 1
max_iter: 100000
# REDUCE MOMENTUM TO 0.5
momentum: 0.5
weight_decay: 0.016
test_interval: 1000
test_iter: 125
display: 1000
average_loss: 1000
type: "Nesterov"
snapshot: 1000
snapshot_prefix: "mcn_finetune11_slow_bilinear"
debug_info: false
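
For reference, a quick way to see what a snapshot predicts on a test image is to count the per-pixel argmax classes (a pycaffe sketch; the deploy prototxt name, the blob names 'data' and 'score', and the BGR mean subtraction are assumptions, not something taken from the files above):

import numpy as np
import caffe

net = caffe.Net('mcn-deploy.prototxt',                # assumed deploy file
                'mcn_finetune11_slow_bilinear_iter_3000.caffemodel',
                caffe.TEST)

im = caffe.io.load_image('test_image.jpg')            # HxWx3, RGB, values in [0, 1]
im = im[:, :, ::-1] * 255.0                           # convert to BGR, [0, 255]
im -= np.array([104.0, 117.0, 123.0])                 # assumed per-channel mean
net.blobs['data'].reshape(1, 3, im.shape[0], im.shape[1])
net.blobs['data'].data[0] = im.transpose(2, 0, 1)     # HxWxC -> CxHxW

net.forward()
pred = net.blobs['score'].data[0].argmax(axis=0)      # per-pixel class labels
# a histogram that is 100% background means the net collapsed to a single class
print('pixels per class:', np.bincount(pred.ravel(), minlength=2))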

PS: a short excerpt of the training log

I0723 08:38:56.772249 29191 solver.cpp:272] Solving MyCoolNetwork, MCN
I0723 08:38:56.772260 29191 solver.cpp:273] Learning Rate Policy: fixed
I0723 08:38:56.775032 29191 solver.cpp:330] Iteration 0, Testing net (#0)
I0723 08:39:02.331010 29191 blocking_queue.cpp:49] Waiting for data
I0723 08:39:18.075814 29191 solver.cpp:397]     Test net output #0: loss = 37.8394 (* 1 = 37.8394 loss)
I0723 08:39:18.799008 29191 solver.cpp:218] Iteration 0 (-2.90699e-35 iter/s, 22.0247s/1000 iters), loss = 42.4986
I0723 08:39:18.799057 29191 solver.cpp:237]     Train net output #0: loss = 42.4986 (* 1 = 42.4986 loss)
I0723 08:39:18.799067 29191 sgd_solver.cpp:105] Iteration 0, lr = 1e-08
I0723 08:46:12.581365 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:46:12.773717 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:14.609473 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_1000.caffemodel
I0723 08:51:15.245028 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_1000.solverstate
I0723 08:51:15.298612 29191 solver.cpp:330] Iteration 1000, Testing net (#0)
I0723 08:51:20.888267 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:21.194495 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:51:36.276700 29191 solver.cpp:397]     Test net output #0: loss = 1.18519 (* 1 = 1.18519 loss)
I0723 08:51:36.886041 29191 solver.cpp:218] Iteration 1000 (1.35488 iter/s, 738.075s/1000 iters), loss = 3.89015
I0723 08:51:36.887783 29191 solver.cpp:237]     Train net output #0: loss = 1.82311 (* 1 = 1.82311 loss)
I0723 08:51:36.887807 29191 sgd_solver.cpp:105] Iteration 1000, lr = 1e-08
I0723 08:53:34.997433 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 08:53:35.040670 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:00:35.779531 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:00:35.791441 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:31.710410 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_2000.caffemodel
I0723 09:03:32.383363 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_2000.solverstate
I0723 09:03:32.09 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:44.351140 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:03:52.166584 29191 solver.cpp:397]     Test net output #0: loss = 1.14507 (* 1 = 1.14507 loss)
I0723 09:03:52.777982 29191 solver.cpp:218] Iteration 2000 (1.35892 iter/s, 735.881s/1000 iters), loss = 2.60843
I0723 09:03:52.778029 29191 solver.cpp:237]     Train net output #0: loss = 3.07199 (* 1 = 3.07199 loss)
I0723 09:03:52.778038 29191 sgd_solver.cpp:105] Iteration 2000, lr = 1e-08
I0723 09:07:57.400295 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:07:57.448870 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:14:58.070508 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:14:58.100841 29200 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:15:48.708067 29191 solver.cpp:447] Snapshotting to binary proto file mcn_finetune11_slow_bilinear_iter_3000.caffemodel
I0723 09:15:49.358572 29191 sgd_solver.cpp:273] Snapshotting solver state to binary proto file mcn_finetune11_slow_bilinear_iter_3000.solverstate
I0723 09:15:49.411862 29191 solver.cpp:330] Iteration 3000, Testing net (#0)
I0723 09:16:05.268878 29203 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:16:05.502995 29202 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:16:08.177001 29191 solver.cpp:397]     Test net output #0: loss = 1.115 (* 1 = 1.115 loss)
I0723 09:16:08.767503 29191 solver.cpp:218] Iteration 3000 (1.35874 iter/s, 735.979s/1000 iters), loss = 2.57038
I0723 09:16:08.768218 29191 solver.cpp:237]     Train net output #0: loss = 2.33784 (* 1 = 2.33784 loss)
I0723 09:16:08.768534 29191 sgd_solver.cpp:105] Iteration 3000, lr = 1e-08
I0723 09:22:16.315538 29201 data_layer.cpp:73] Restarting data prefetching from start.
I0723 09:22:16.349555 29200 data_layer.cpp:73] Restarting data prefetching from start.

asked by Alex on 23.07.2017

Comments:

"Try looking at debug_info." - Shai, 23.07.2017
"You may find that for some layers the gradient is zero, which prevents the model from improving." - Shai, 23.07.2017
"How could this be overfitting if the validation loss dropped too? (see edit)" - Alex, 23.07.2017