tensorflow.python.framework.errors_impl.InvalidArgumentError: недопустимый аргумент: утверждение не выполнено:

Во время обучения с использованием TensorFlow Object Detection API от Google Colab я получил следующую ошибку (в следующем подробном описании есть две похожие ошибки… одна из них в конце):

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0528 21:13:21.113062 140292083513216 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 200000
I0528 21:13:21.113316 140292083513216 config_util.py:523] Maybe overwriting train_steps: 200000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0528 21:13:21.113430 140292083513216 config_util.py:523] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0528 21:13:21.113519 140292083513216 config_util.py:523] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0528 21:13:21.113614 140292083513216 config_util.py:523] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0528 21:13:21.113696 140292083513216 config_util.py:523] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0528 21:13:21.113776 140292083513216 config_util.py:533] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0528 21:13:21.114626 140292083513216 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0528 21:13:21.114744 140292083513216 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f97ed4dd128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0528 21:13:21.115245 140292083513216 estimator.py:212] Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f97ed4dd128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f97d328dbf8>) includes params argument, but params are not passed to Estimator.
W0528 21:13:21.115487 140292083513216 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f97d328dbf8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0528 21:13:21.116259 140292083513216 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0528 21:13:21.116456 140292083513216 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0528 21:13:21.116694 140292083513216 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0528 21:13:21.124795 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0528 21:13:21.162153 140292083513216 dataset_builder.py:84] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:101: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W0528 21:13:21.167545 140292083513216 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:101: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0528 21:13:21.167754 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
2020-05-28 21:13:22.910301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-28 21:13:22.953259: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:22.953875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-05-28 21:13:22.960996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:13:22.967688: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-28 21:13:22.977811: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-28 21:13:22.985131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-28 21:13:22.995549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-28 21:13:23.004617: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-28 21:13:23.025234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:13:23.025382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:23.026101: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:23.026693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0528 21:13:33.109247 140292083513216 deprecation.py:323] From /content/models/research/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0528 21:13:33.221111 140292083513216 deprecation.py:323] From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0528 21:13:39.145547 140292083513216 api.py:332] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0528 21:13:42.865469 140292083513216 deprecation.py:323] From /content/models/research/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:174: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
W0528 21:13:46.217640 140292083513216 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:174: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
INFO:tensorflow:Calling model_fn.
I0528 21:13:46.233859 140292083513216 estimator.py:1148] Calling model_fn.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0528 21:13:46.430602 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.101978 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.133970 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.165436 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.343221 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.377842 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.414346 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
W0528 21:13:49.456603 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19tensorflow-gpu==1.15.0Conv2dtensorflow-gpu==1.15.03x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.456816 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19tensorflow-gpu==1.15.0Conv2dssd_mobilenet_v2_coco3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.456997 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19tensorflow-gpu==1.15.0Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.457174 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19tensorflow-gpu==1.15.0Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0528 21:13:54.449208 140292083513216 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
I0528 21:14:00.871218 140292083513216 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0528 21:14:00.872715 140292083513216 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0528 21:14:04.557027 140292083513216 monitored_session.py:240] Graph was finalized.
2020-05-28 21:14:04.557485: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-28 21:14:04.562729: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000165000 Hz
2020-05-28 21:14:04.563012: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1771800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-28 21:14:04.563048: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-28 21:14:04.666903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.667672: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1770d80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-28 21:14:04.667705: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-05-28 21:14:04.668018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.668594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-05-28 21:14:04.668682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:14:04.668724: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-28 21:14:04.668747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-28 21:14:04.668769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-28 21:14:04.668796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-28 21:14:04.668819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-28 21:14:04.668842: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:14:04.668951: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.669555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.670109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-28 21:14:04.670229: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:14:04.671546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 21:14:04.671575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-05-28 21:14:04.671585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-05-28 21:14:04.671747: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.672416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.672994: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-05-28 21:14:04.673037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
I0528 21:14:09.605103 140292083513216 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0528 21:14:09.941666 140292083513216 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I0528 21:14:18.960145 140292083513216 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2020-05-28 21:14:36.916392: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 1074 of 2048
2020-05-28 21:14:46.905139: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 2026 of 2048
2020-05-28 21:14:46.910085: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:195] Shuffle buffer filled.
2020-05-28 21:14:47.284742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:14:53.420068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
INFO:tensorflow:loss = 12.133639, step = 0
I0528 21:14:56.692664 140292083513216 basic_session_run_hooks.py:262] loss = 12.133639, step = 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_Dataset_map_transform_and_pad_input_data_fn_3047}} assertion failed: [[0.748][0.758]] [[0.67][0.67]]
     [[{{node Assert/AssertGuard/else/_123/Assert}}]]
     [[IteratorGetNext]]
  (1) Invalid argument: {{function_node __inference_Dataset_map_transform_and_pad_input_data_fn_3047}} assertion failed: [[0.748][0.758]] [[0.67][0.67]]
     [[{{node Assert/AssertGuard/else/_123/Assert}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_8451]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main.py", line 114, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main.py", line 110, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [[0.748][0.758]] [[0.67][0.67]]
     [[{{node Assert/AssertGuard/else/_123/Assert}}]]
     [[IteratorGetNext]]
  (1) Invalid argument:  assertion failed: [[0.748][0.758]] [[0.67][0.67]]
     [[{{node Assert/AssertGuard/else/_123/Assert}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_8451]]
0 successful operations.
0 derived errors ignored.

Хотя я нашел похожие вопросы с похожим названием, но ошибки разные. Здесь дополнительно упоминается, что я использую tensorflow-gpu==1.15.0, а для точной настройки используется модель ssd_mobilenet_v2_coco.

Есть какие-нибудь подсказки, почему возникает эта ошибка?


person hafiz031    schedule 28.05.2020    source источник
comment
хорошо, позвольте мне поделиться с вами своими выводами. Я использую другую модель (ssd_inception_v2_coco), но ошибка у меня точно такая же. Я следую этому руководству (tensorflow-object-detection-api-tutorial.readthedocs.io/en/). Я удалил tenorflow-gpu 1.15 и установил 1.14, и он начал обучение. Иногда после шагов 200, иногда после шагов 1900 я все еще получаю ту же ошибку. Я проверил свои файлы tfrecord и удалил все файлы с изображениями шириной или высотой более 300 пикселей, но все равно получаю ту же ошибку.   -  person Ahmet Cetin    schedule 30.05.2020


Ответы (3)


У меня была точно такая же ошибка при создании моих собственных tfrecords для переобучения моей модели. Проблема заключалась в том, что высота одного из помеченных ящиков была отрицательной. Я бы рекомендовал проверить правильность ваших данных.

person Colegram    schedule 07.06.2020
comment
Не только негативы, но также я удалил все подозрительные примеры, которые могли создать проблемы, и, наконец, это сработало! Помимо негативов, я также проверил, не превышают ли заданные значения координат ящиков ширину или высоту самого изображения. Если это так, я удалил этот пример при создании файлов csv и, следовательно, построил tfrecords. Это означает, что нельзя включать поврежденные записи при создании tfrecords. - person hafiz031; 14.06.2020
comment
Можете ли вы помочь мне в этом stackoverflow.com/ вопросы / 68225332 /? - person user; 04.07.2021

Хорошо! официальный ответ может помочь. Я исправил эту проблему в два этапа. Во-первых, создавая CSV, вы должны убедиться, что нет недействительных записей. Я имею в виду, что нет недопустимого изображения и / или нет изображения с ограничивающими рамками за пределами изображения, то есть сначала проверьте, находятся ли xmin, ymin, xmax, ymax в разрешении изображения, и они не являются отрицательными. Также проверьте, являются ли width и height положительными.

Во-вторых, делая tf_example, я выполнил несколько дополнительных проверок, чтобы убедиться, что все координаты все еще находятся в пределах изображения. tfrecord хочет, чтобы координаты были масштабированы в [0, 1]. Хотя, по логике вещей, если мы делаем первый шаг, то проверять его заново не нужно. Но то, что я обнаружил, вероятно, связано с проблемами точности с плавающей запятой в этих масштабированных координаты иногда становились больше 1.0 или меньше 0.0 и снова создавали эту ошибку. Поэтому я дополнительно проверяю следующую функцию, чтобы убедиться, что каждая из записей действительна, прежде чем записывать их в tfrecord. В случае, если они > 1.0, я сделал их 1.0, а если < 0.0, я сделал их 0.0. Ниже приведен код:

def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        
        ########### ADDITIONAL CHECKS START HERE ###################

        xmn = row['xmin'] / width
        if xmn < 0.0:
            xmn = 0.0
        elif xmn > 1.0:
            xmn = 1.0
        xmins.append(xmn)

        xmx = row['xmax'] / width
        if xmx < 0.0:
            xmx = 0.0
        elif xmx > 1.0:
            xmx = 1.0
        xmaxs.append(xmx)

        ymn = row['ymin'] / height
        if ymn < 0.0:
            ymn = 0.0
        elif ymn > 1.0:
            ymn = 1.0
        ymins.append(ymn)

        ymx = row['ymax'] / height
        if ymx < 0.0:
            ymx = 0.0
        elif ymx > 1.0:
            ymx = 1.0
        ymaxs.append(ymx)

        ############ ADDITIONAL CHECKS END HERE ####################

        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

Также есть еще один угловой случай, который может быть причиной возникновения этой ситуации. Это связано с тем, как вы аннотировали ограничивающие рамки. Сначала я описываю правильный способ аннотирования. Остальное вы разберетесь сами. Если при рисовании этих блоков аннотаций вы перетаскиваете мышь от left-top к right-bottom, инструмент аннотатора рассматривает точку left-top как первую точку, то есть (xmin, ymin), и точку right-bottom, как вторую точку, то есть (xmax, ymax). Это совершенно нормально, потому что в этом случае автоматически выполняются условия xmin < xmax и ymin < ymax. Но что происходит, когда вы делаете что-то другое? Например, если вы перетащите мышь от right-bottom точки к left-top точке, инструмент аннотатора может принять точку right-bottom как (xmin, ymin), а точку left-top как (xmax, ymax). Что совершенно неверно. Поскольку в этом случае xmax становится меньше xmin, и такая же проблема возникает для ymax и ymin. Поэтому убедитесь, что ваше программное обеспечение-аннотатор способно справиться с подобными ситуациями, наблюдая за тем, как вы перетаскиваете мышь.

Итак, если вы обнаружите, что аннотации ограничивающих прямоугольников действительно имеют эту проблему, вы можете легко исправить CSV, обновив значения xmin, xmax, ymin и ymax следующим образом:

import numpy as np

xmin_new = np.min(xmin, xmax)
xmax_new = np.max(xmin, xmax)
ymin_new = np.min(ymin, ymax)
ymax_new = np.max(ymin, ymax)

Также обратите внимание, что я использовал разные переменные для получения новых значений, замена значений старых переменных (xmin, xmax, ymin, ymax) сделает дальнейшие вычисления неверными, так как при использовании np.max() или np.min() выражение ожидает предыдущие значения из них, а не значения, которые мы только что обновили в ходе этого процесса.

person hafiz031    schedule 17.10.2020
comment
Я сделал ту же ошибку. Придется исправить сейчас - person user3151256; 09.05.2021
comment
Спасибо тебе за это! Специально последняя часть на min max перепуталась. - person blue-zircon; 20.05.2021

Моя проблема заключалась в ограничивающих прямоугольниках с областью = 0 или площадью <0

width = xmax - xmin
height = ymax - ymin

area = width * height
if area == 0 or area < 0
  print(f"Bbx with area = {} not allowed")
person Ceachi Bogdan    schedule 17.07.2021