Устранение проблем с синхронизацией аудио/видео с RTP/RTCP, отправленными из mediasoup

Немного предыстории: я пытаюсь записать вызов webrtc через SFU mediasoup v2. Я использую метод mediasoup room.createRtpStreamer() для создания потока, который отражает RTP/RTCP в ffmpeg. Два стримера создаются для аудио и видео с интервалом ~30 мс друг от друга и начинают трансляцию. Затем FFmpeg раскручивается и начинает принимать. Почти уверен, что RTCP работает, поскольку ffmpeg всегда запускается с ключевого кадра, несмотря на то, что он запускается после того, как стример начинает вещание.

Проблема в том, что я сталкиваюсь с десинхронизацией аудио/видео с кажущимися случайными смещениями. Моя текущая теория заключается в том, что это смещение основано на том, сколько лет последнему ключевому кадру, который RTCP запрашивает для запуска потока. См. ниже информацию о конфигурации и выводе ffmpeg, но мой вопрос: какие аргументы ffmpeg я могу использовать для настройки временных меток видеокадра в соответствии с временными метками аудиокадра или наоборот? Я возился с -map 0:0,0:1 -map 0:1,0:1, но, похоже, это не то, что я ищу.

флаги ffmpeg:

'-y',
'-loglevel',
'debug',
'-dump',
'-protocol_whitelist',
'file,crypto,udp,rtp,data',
'-analyzeduration',
'20M',
'-probesize',
'20M',
'-i',
`data:text/plain;base64,${sdp.toString('base64')}`,
'-fflags',
'+genpts',
'-vcodec',
'copy',
'-acodec',
'aac',
'-bsf:v',
'h264_mp4toannexb',
'-start_number',
'0',
'-hls_list_size',
'2147480000',
'-hls_wrap',
'0',
'-hls_time',
'10',

SDP используется для ввода (шаблон):

v=0
o=- 0 0 IN IP4 <%=ip %>
s=title
c=IN IP4 <%=ip %>
m=audio <%=audioPort %> RTP/AVPF <%=audioPayload %>
a=sendrecv
a=rtcp-mux
a=rtpmap:<%=audioPayload %> opus/48000/2
a=fmtp:<%=audioPayload %> minptime=10; useinbandfec=1
m=video <%=videoPort %> RTP/AVPF <%=videoPayload %>
a=sendrecv
a=rtcp-mux
a=rtpmap:<%=videoPayload %> H264/90000
a=rtcp-fb:<%=videoPayload %> ccm fir
a=rtcp-fb:<%=videoPayload %> nack
a=rtcp-fb:<%=videoPayload %> nack pli
a=rtcp-fb:<%=videoPayload %> goog-remb
a=rtcp-fb:<%=videoPayload %> transport-cc
a=fmtp:<%=videoPayload %> level-asymmetry-allowed=1;packetization-mode=1

Вывод ffmpeg - искажен с некоторыми временными метками

1512775954585 - stderr: ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
  built with Apple LLVM version 9.0.0 (clang-900.0.38)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/3.4 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-ffplay --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
1512775954587 - stderr:   libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
1512775954587 - stderr:   libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Splitting the commandline.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-dump' ... matched as option 'dump' (dump each input packet) with argument '1'.
Reading option '-protocol_whitelist' ...1512775954589 - stderr:  matched as AVOption 'protocol_whitelist' with argument 'file,crypto,udp,rtp,data'.
Reading option '-analyzeduration' ...1512775954590 - stderr:  matched as AVOption 'analyzeduration' with argument '20M'.
Reading option '-probesize' ...1512775954590 - stderr:  matched as AVOption 'probesize' with argument '20M'.
Reading option '-i' ... matched as input url with argument 'data:text/plain;base64,dj0wCm89LSAwIDAgSU4gSVA0IDEyNy4wLjAuMQpzPWUwYzkyZmEwLWRhZDUtMTFlNy04Njg3LTA5MGRkYTk1YjFhNCBmb29ib2FyCmM9SU4gSVA0IDEyNy4wLjAuMQptPWF1ZGlvIDIwMDAwIFJUUC9BVlBGIDEwMAphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAwIG9wdXMvNDgwMDAvMgphPWZtdHA6MTAwIG1pbnB0aW1lPTEwOyB1c2VpbmJhbmRmZWM9MQptPXZpZGVvIDIwMDAyIFJUUC9BVlBGIDEwMQphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAxIEgyNjQvOTAwMDAKYT1ydGNwLWZiOjEwMSBjY20gZmlyCmE9cnRjcC1mYjoxMDEgbmFjawphPXJ0Y3AtZmI6MTAxIG5hY2sgcGxpCmE9cnRjcC1mYjoxMDEgZ29vZy1yZW1iCmE9cnRjcC1mYjoxMDEgdHJhbnNwb3J0LWNjCmE9Zm10cDoxMDEgbGV2ZWwtYXN5bW1ldHJ5LWFsbG93ZWQ9MTtwYWNrZXRpemF0aW9uLW1vZGU9MTtwcm9maWxlLWxldmVsLWlkPTQyZTAxZgo='.
Reading option '-fflags' ...1512775954591 - stderr:  matched as AVOption 'fflags' with argument '+genpts'.
Reading option '-vcodec' ... matched as option 'vcodec' (force video codec ('copy' to copy stream)) with argument 'copy'.
Reading option '-acodec' ... matched as option 'acodec' (force audio codec ('copy' to copy stream)) with argument 'aac'.
Reading option '-vsync' ... matched as option 'vsync' (video sync method) with argument '0'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:0,0:1'.
Reading option '-map' ... matched as option 'map' (set input stream mapping) with argument '0:1,0:1'.
Reading option '-bsf:v' ... matched as option 'bsf' (A comma-separated list of bitstream filters) with argument 'h264_mp4toannexb'.
Reading option '-start_number' ...1512775954591 - stderr:  matched as AVOption 'start_number' with argument '0'.
Reading option '-hls_list_size' ...1512775954591 - stderr:  matched as AVOption 'hls_list_size' with argument '2147480000'.
Reading option '-hls_wrap' ... matched as AVOption 'hls_wrap' with argument '0'.
Reading option '-hls_time' ...1512775954591 - stderr:  matched as AVOption 'hls_time' with argument '10'.
Reading option '/tmp/archive/e0c92fa0-dad5-11e7-8687-090dda95b1a4_10e1c990-dc70-11e7-888d-9f39ca0c79bc/1512775954465.m3u8' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option y (overwrite output files) with argument 1.
1512775954592 - stderr: Applying option loglevel (set logging level) with argument debug.
Applying option dump (dump each input packet) with argument 1.
Applying option vsync (video sync method) with argument 0.
Successfully parsed a group of options.
Parsing a group of options: input url data:text/plain;base64,dj0wCm89LSAwIDAgSU4gSVA0IDEyNy4wLjAuMQpzPWUwYzkyZmEwLWRhZDUtMTFlNy04Njg3LTA5MGRkYTk1YjFhNCBmb29ib2FyCmM9SU4gSVA0IDEyNy4wLjAuMQptPWF1ZGlvIDIwMDAwIFJUUC9BVlBGIDEwMAphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAwIG9wdXMvNDgwMDAvMgphPWZtdHA6MTAwIG1pbnB0aW1lPTEwOyB1c2VpbmJhbmRmZWM9MQptPXZpZGVvIDIwMDAyIFJUUC9BVlBGIDEwMQphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAxIEgyNjQvOTAwMDAKYT1ydGNwLWZiOjEwMSBjY20gZmlyCmE9cnRjcC1mYjoxMDEgbmFjawphPXJ0Y3AtZmI6MTAxIG5hY2sgcGxpCmE9cnRjcC1mYjoxMDEgZ29vZy1yZW1iCmE9cnRjcC1mYjoxMDEgdHJhbnNwb3J0LWNjCmE9Zm10cDoxMDEgbGV2ZWwtYXN5bW1ldHJ5LWFsbG93ZWQ9MTtwYWNrZXRpemF0aW9uLW1vZGU9MTtwcm9maWxlLWxldmVsLWlkPTQyZTAxZgo=.
Successfully parsed a group of options.
Opening an input file: data:text/plain;base64,dj0wCm89LSAwIDAgSU4gSVA0IDEyNy4wLjAuMQpzPWUwYzkyZmEwLWRhZDUtMTFlNy04Njg3LTA5MGRkYTk1YjFhNCBmb29ib2FyCmM9SU4gSVA0IDEyNy4wLjAuMQptPWF1ZGlvIDIwMDAwIFJUUC9BVlBGIDEwMAphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAwIG9wdXMvNDgwMDAvMgphPWZtdHA6MTAwIG1pbnB0aW1lPTEwOyB1c2VpbmJhbmRmZWM9MQptPXZpZGVvIDIwMDAyIFJUUC9BVlBGIDEwMQphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAxIEgyNjQvOTAwMDAKYT1ydGNwLWZiOjEwMSBjY20gZmlyCmE9cnRjcC1mYjoxMDEgbmFjawphPXJ0Y3AtZmI6MTAxIG5hY2sgcGxpCmE9cnRjcC1mYjoxMDEgZ29vZy1yZW1iCmE9cnRjcC1mYjoxMDEgdHJhbnNwb3J0LWNjCmE9Zm10cDoxMDEgbGV2ZWwtYXN5bW1ldHJ5LWFsbG93ZWQ9MTtwYWNrZXRpemF0aW9uLW1vZGU9MTtwcm9maWxlLWxldmVsLWlkPTQyZTAxZgo=.
1512775954592 - stderr: [NULL @ 0x7f81fd000000] Opening 'data:text/plain;base64,dj0wCm89LSAwIDAgSU4gSVA0IDEyNy4wLjAuMQpzPWUwYzkyZmEwLWRhZDUtMTFlNy04Njg3LTA5MGRkYTk1YjFhNCBmb29ib2FyCmM9SU4gSVA0IDEyNy4wLjAuMQptPWF1ZGlvIDIwMDAwIFJUUC9BVlBGIDEwMAphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAwIG9wdXMvNDgwMDAvMgphPWZtdHA6MTAwIG1pbnB0aW1lPTEwOyB1c2VpbmJhbmRmZWM9MQptPXZpZGVvIDIwMDAyIFJUUC9BVlBGIDEwMQphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAxIEgyNjQvOTAwMDAKYT1ydGNwLWZiOjEwMSBjY20gZmlyCmE9cnRjcC1mYjoxMDEgbmFjawphPXJ0Y3AtZmI6MTAxIG5hY2sgcGxpCmE9cnRjcC1mYjoxMDEgZ29vZy1yZW1iCmE9cnRjcC1mYjoxMDEgdHJhbnNwb3J0LWNjCmE9Zm10cDoxMDEgbGV2ZWwtYXN5bW1ldHJ5LWFsbG93ZWQ9MTtwYWNrZXRpemF0aW9uLW1vZGU9MTtwcm9maWxlLWxldmVsLWlkPTQyZTAxZgo=' for reading
1512775954593 - stderr: [data @ 0x7f81fca001a0] Content-type: text/plain
{"level":"info","time":"Dec 8, 2017 11:32 PM","message":"ffmpeg started"}
1512775954595 - stderr: [sdp @ 0x7f81fd000000] Format sdp probed with size=2048 and score=50
1512775954598 - stderr: [sdp @ 0x7f81fd000000] audio codec set to: opus
[sdp @ 0x7f81fd000000] audio samplerate set to: 48000
[sdp @ 0x7f81fd000000] audio channels set to: 2
1512775954639 - stderr: [sdp @ 0x7f81fd000000] video codec set to: h264
[sdp @ 0x7f81fd000000] RTP Packetization Mode: 1
[sdp @ 0x7f81fd000000] RTP Profile IDC: 42 Profile IOP: e0 Level: 1f
[udp @ 0x7f81fcb007e0] end receive buffer size reported is 65536
[udp @ 0x7f81fbe00180] end receive buffer size reported is 65536
[sdp @ 0x7f81fd000000] setting jitter buffer size to 500
[udp @ 0x7f81fbe00680] end receive buffer size reported is 65536
[udp @ 0x7f81fbe00740] end receive buffer size reported is 65536
[sdp @ 0x7f81fd000000] setting jitter buffer size to 500
[sdp @ 0x7f81fd000000] Before avformat_find_stream_info() pos: 479 bytes read:479 seeks:0 nb_streams:2
{"level":"info","time":"Dec 8, 2017 11:32 PM","message":"new active speaker","activePeer":"9f05d96a-9641-4c63-8f0e-486b98e48eb5"}
1512775954773 - stderr: [AVBSFContext @ 0x7f81fc8018e0] nal_unit_type: 7, nal_ref_idc: 3
[AVBSFContext @ 0x7f81fc8018e0] nal_unit_type: 8, nal_ref_idc: 3
[AVBSFContext @ 0x7f81fc8018e0] nal_unit_type: 5, nal_ref_idc: 3
1512775954774 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 7, nal_ref_idc: 3
1512775954774 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 8, nal_ref_idc: 3
1512775954774 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 5, nal_ref_idc: 3
1512775954774 - stderr: [h264 @ 0x7f8200000c00] Reinit context to 640x480, pix_fmt: yuv420p
1512775954801 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 1, nal_ref_idc: 3
1512775954939 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 7, nal_ref_idc: 3
1512775954939 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 8, nal_ref_idc: 3
1512775954940 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 5, nal_ref_idc: 3
1512775955003 - stderr: [h264 @ 0x7f8200000c00] nal_unit_type: 1, nal_ref_idc: 3
1512775955999 - stderr:     Last message repeated 3 times
[sdp @ 0x7f81fd000000] All info found
1512775955999 - stderr: [sdp @ 0x7f81fd000000] rfps: 29.750000 0.019566
[sdp @ 0x7f81fd000000] rfps: 29.833333 0.015263
[sdp @ 0x7f81fd000000] rfps: 29.916667 0.011503
1512775955999 - stderr: [sdp @ 0x7f81fd000000] rfps: 30.000000 0.008285
[sdp @ 0x7f81fd000000] rfps: 31.000000 0.011990
    Last message repeated 1 times
[sdp @ 0x7f81fd000000] rfps: 29.970030 0.009380
[sdp @ 0x7f81fd000000] After avformat_find_stream_info() pos: 479 bytes read:479 seeks:0 frames:98
1512775956000 - stderr: Input #0, sdp, from 'data:text/plain;base64,dj0wCm89LSAwIDAgSU4gSVA0IDEyNy4wLjAuMQpzPWUwYzkyZmEwLWRhZDUtMTFlNy04Njg3LTA5MGRkYTk1YjFhNCBmb29ib2FyCmM9SU4gSVA0IDEyNy4wLjAuMQptPWF1ZGlvIDIwMDAwIFJUUC9BVlBGIDEwMAphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAwIG9wdXMvNDgwMDAvMgphPWZtdHA6MTAwIG1pbnB0aW1lPTEwOyB1c2VpbmJhbmRmZWM9MQptPXZpZGVvIDIwMDAyIFJUUC9BVlBGIDEwMQphPXNlbmRyZWN2CmE9cnRjcC1tdXgKYT1ydHBtYXA6MTAxIEgyNjQvOTAwMDAKYT1ydGNwLWZiOjEwMSBjY20gZmlyCmE9cnRjcC1mYjoxMDEgbmFjawphPXJ0Y3AtZmI6MTAxIG5hY2sgcGxpCmE9cnRjcC1mYjoxMDEgZ29vZy1yZW1iCmE9cnRjcC1mYjoxMDEgdHJhbnNwb3J0LWNjCmE9Zm10cDoxMDEgbGV2ZWwtYXN5bW1ldHJ5LWFsbG93ZWQ9MTtwYWNrZXRpemF0aW9uLW1vZGU9MTtwcm9maWxlLWxldmVsLWlkPTQyZTAxZgo=':
1512775956000 - stderr:   Metadata:
    title           : e0c92fa0-dad5-11e7-8687-090dda95b1a4 fooboar
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0, 70, 1/48000: Audio: opus, 48000 Hz, stereo, fltp1512775956000 - stderr:
    Stream #0:1, 28, 1/90000: Video: h264 (Constrained Baseline), 1 reference frame, yuv420p(progressive, left), 640x480, 0/1, 30 tbr, 90k tbn, 180k tbc
1512775956000 - stderr: Successfully opened the file.
Parsing a group of options: output url /tmp/archive/e0c92fa0-dad5-11e7-8687-090dda95b1a4_10e1c990-dc70-11e7-888d-9f39ca0c79bc/1512775954465.m3u8.
Applying option vcodec (force video codec ('copy' to copy stream)) with argument copy.
Applying option acodec (force audio codec ('copy' to copy stream)) with argument aac.
Applying option map (set input stream mapping) with argument 0:0,0:1.
Applying option map (set input stream mapping) with argument 0:1,0:1.
Applying option bsf:v (A comma-separated list of bitstream filters) with argument h264_mp4toannexb.
Successfully parsed a group of options.
1512775956000 - stderr: Opening an output file: /tmp/archive/e0c92fa0-dad5-11e7-8687-090dda95b1a4_10e1c990-dc70-11e7-888d-9f39ca0c79bc/1512775954465.m3u8.
1512775956000 - stderr: Successfully opened the file.
1512775956001 - stderr: [AVBSFContext @ 0x7f81fcb020a0] The input looks like it is Annex B already
Stream mapping:
1512775956001 - stderr:   Stream #0:0 -> #0:0 [sync #0:1] (opus (native) -> aac (native))
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
  duration=0.000
  dts=0.000  pts=0.000
  size=82
1512775956001 - stderr: [SWR @ 0x7f820001e600] Using fltp internally between filters
1512775956002 - stderr: detected 8 logical cores
1512775956003 - stderr: [graph_0_in_0_0 @ 0x7f81fc90dfc0] Setting 'time_base' to value '1/48000'
[graph_0_in_0_0 @ 0x7f81fc90dfc0] Setting 'sample_rate' to value '48000'
1512775956003 - stderr: [graph_0_in_0_0 @ 0x7f81fc90dfc0] Setting 'sample_fmt' to value 'fltp'
[graph_0_in_0_0 @ 0x7f81fc90dfc0] Setting 'channel_layout' to value '0x3'
[graph_0_in_0_0 @ 0x7f81fc90dfc0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
[format_out_0_0 @ 0x7f81fc914da0] Setting 'sample_fmts' to value 'fltp'
[format_out_0_0 @ 0x7f81fc914da0] Setting 'sample_rates' to value '96000|88200|64000|48000|44100|32000|24000|22050|16000|12000|11025|8000|7350'
1512775956004 - stderr: [AVFilterGraph @ 0x7f81fbd02200] query_formats: 4 queried, 9 merged, 0 already done, 0 delayed
1512775956007 - stderr: [hls @ 0x7f81fd80f000] Opening '/tmp/archive/e0c92fa0-dad5-11e7-8687-090dda95b1a4_10e1c990-dc70-11e7-888d-9f39ca0c79bc/15127759544650.ts' for writing
[file @ 0x7f81fbf01ee0] Setting default whitelist 'file,crypto'
1512775956007 - stderr: [mpegts @ 0x7f81fd877800] muxrate VBR, pcr every 9000 pkts, sdt every 2147483647, pat/pmt every 2147483647 pkts
Output #0, hls, to '/tmp/archive/e0c92fa0-dad5-11e7-8687-090dda95b1a4_10e1c990-dc70-11e7-888d-9f39ca0c79bc/1512775954465.m3u8':
  Metadata:
    title           : e0c92fa0-dad5-11e7-8687-090dda95b1a4 fooboar
    encoder         : Lavf57.83.100
1512775956007 - stderr:     Stream #0:0, 0, 1/90000: Audio: aac (LC), 48000 Hz, stereo, fltp, delay 1024, 128 kb/s
    Metadata:
      encoder         : Lavc57.107.100 aac
    Stream #0:1, 0, 1/90000: Video: h264 (Constrained Baseline), 1 reference frame, yuv420p(progressive, left), 640x480 (0x0), 0/1, q=2-31, 30 tbr, 90k tbn, 90k tbc
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
1512775956007 - stderr:     Last message repeated 1 times
stream #0:
  keyframe=1
  duration=0.000
  dts=0.020  pts=0.020
  size=79
1512775956007 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
    Last message repeated 1 times
stream #0:
  keyframe=1
  duration=0.000
1512775956007 - stderr:   dts=0.040  pts=0.040
  size=75
1512775956014 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
  duration=0.000
  dts=0.0601512775956014 - stderr:   pts=0.060
  size=81
1512775956017 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
  duration=0.000
  dts=0.080  pts=0.080
  size=76
1512775956022 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
  duration=0.000
  dts=0.1001512775956022 - stderr:   pts=0.100
  size=79
1512775956023 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
1512775956023 - stderr:   duration=0.000
  dts=0.120  pts=0.120
  size=95
1512775956024 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
1512775956024 - stderr: stream #0:
  keyframe=1
  duration=0.000
  dts=0.140  pts=0.140
  size=93
1512775956025 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #0:
  keyframe=1
1512775956025 - stderr:   duration=0.000
  dts=0.160  pts=0.160
  size=94
1512775956026 - stderr: cur_dts is invalid (this is harmless if it occurs once at the start per stream)
stream #1:
  keyframe=1
1512775956026 - stderr:   duration=0.000
  dts=N/A  pts=N/A
  size=992
[hls @ 0x7f81fd80f000] Timestamps are unset in a packet for stream 1. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
1512775956026 - stderr: stream #0:
  keyframe=1
  duration=0.000
1512775956026 - stderr:   dts=0.180  pts=0.180
  size=108
1512775956027 - stderr: stream #1:
  keyframe=0
  duration=0.000
1512775956027 - stderr:   dts=0.002  pts=0.002
  size=3047
[hls @ 0x7f81fd80f000] pkt->duration = 0, maybe the hls segment duration will not precise
stream #0:
  keyframe=1
  duration=0.000
  dts=0.200  pts=0.200
  size=89
1512775956060 - stderr: stream #0:
  keyframe=1
  duration=0.000
  dts=0.220  pts=0.220
  size=73
stream #0:
  keyframe=1
  duration=0.000
  dts=0.240  pts=0.240
  size=78
stream #0:
  keyframe=1
  duration=0.000
  dts=0.260  pts=0.260

Обратите внимание, как первый кадр для потока № 1 (видео) начинается после ряда аудиокадров? В частности, он начинается с потока #0 dts/pts 0.18. В этой ситуации проблема синхронизации аудио/видео почти незаметна, но с кучей репродукций я определил, что смещение синхронизации аудио/видео всегда равно продолжительности аудиокадров, которые были отправлены до первого видеокадра ( иногда секунды). Я постоянно запускаю потоки RTP с разницей всего в десятки мс, поэтому я не могу контролировать эту дисперсию на стороне ввода.

После того, как приходят начальные аудиокадры, первый видеокадр имеет dts/pts около 0. Какую настройку ffmpeg я бы использовал для соответствующей настройки временных меток? Меня не волнует потеря начального звука без видео, поэтому работает любое решение, которое настраивает временные метки.


person artushin    schedule 09.12.2017    source источник
comment
Добавление -use_wallclock_as_timestamps к входным данным устраняет проблему при локальном тестировании, но я не буду называть это ответом, пока не проверю с некоторой деградацией сети.   -  person artushin    schedule 12.12.2017


Ответы (1)


Я мало знаю о ffmpeg и не уверен, что он способен синхронизировать отдельные входные аудио- и видеопотоки RTP. Когда браузер получает эти аудио/видеопотоки RTP, проблем с синхронизацией вообще не возникает. Фактически, mediasoup правильно устанавливает временную метку RTP и поля RTCP ntp.

Я предлагаю вам задать вопрос в списке рассылки ffmpeg.

person Iñaki Baz Castillo    schedule 10.12.2017