Google Speech-to-Text API: transcribing audio data with Python - decoding error

I am trying to transcribe audio data with the Google Speech-to-Text API, following this guide: https://cloud.google.com/speech-to-text/docs/async-recognize

I have a form that lets users record audio. I read the audio data from the form with this code:

f = request.files['audio_data'].read()

I have worked out how to save the audio data (f) to a GCS bucket and get a transcription by passing the file location to the function from the tutorial (also copied below). However, I would like to skip the step of creating and uploading a .wav file and have the API use the recorded audio data directly.

The audio data I have looks like this:

b'RIFF$\x00\x04\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80\xbb\x00\x00\x00\xee\x02\(...)x00\x00\x00\x00\x00'
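For reference, the first bytes of that dump are a standard RIFF/WAVE header, and the fields it carries can be read with `struct`. The sketch below is only an illustration: `parse_wav_fmt` is a hypothetical helper name, and it assumes the `fmt ` chunk directly follows the `WAVE` tag, which holds for the bytes shown above.

```python
import struct

# First 28 bytes of the recorded data, copied from the dump above.
header = b"RIFF$\x00\x04\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80\xbb\x00\x00"


def parse_wav_fmt(data):
    """Read format tag, channel count and sample rate from a WAV header.

    Hypothetical helper; assumes the 'fmt ' chunk starts at byte 12.
    """
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE" or data[12:16] != b"fmt ":
        raise ValueError("not a RIFF/WAVE header with a leading 'fmt ' chunk")
    # Little-endian: uint16 format tag, uint16 channels, uint32 sample rate.
    audio_format, channels, sample_rate = struct.unpack_from("<HHI", data, 20)
    return {
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
    }


info = parse_wav_fmt(header)
print(info)
```

For these bytes the sample rate field decodes to 48000 Hz, which may be worth comparing with the `sample_rate_hertz` value passed to the API.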

I get the following error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 24: invalid start byte
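The same message can be reproduced from the header bytes alone, which suggests (as an assumption, since the traceback is not shown) that somewhere the raw audio bytes are being treated as text and decoded as UTF-8. A minimal sketch, with `utf8_decode_error` as a hypothetical helper name:

```python
# First 28 bytes of the recorded data, copied from the dump above.
header = b"RIFF$\x00\x04\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80\xbb\x00\x00"


def utf8_decode_error(data):
    """Return the UnicodeDecodeError raised when decoding data as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = utf8_decode_error(header)
# Bytes 0..23 are plain ASCII; byte 24 (0x80) is the low byte of the sample
# rate field and is not a valid first byte of a UTF-8 sequence.
print(err)
```

Binary audio is not text, so no choice of encode/decode call is expected to fix this; the bytes need to reach the API unchanged.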

Here is the Python function:

import io


def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech

    client = speech.SpeechClient()

    with io.open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    # Note that transcription is limited to a 60 seconds audio file.
    # Use a GCS file for audio longer than 1 minute.
    audio = speech.RecognitionAudio(content=content)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )


    operation = client.long_running_recognize(config=config, audio=audio)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=90)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u"Transcript: {}".format(result.alternatives[0].transcript))
        print("Confidence: {}".format(result.alternatives[0].confidence))
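One possible way to skip the file step, sketched under assumptions rather than taken from the guide: `transcribe_bytes` is a hypothetical variant of the function above that accepts the recorded bytes instead of a path. It uses only the calls already present in `transcribe_file`. The default of 48000 is the sample rate that the header bytes quoted earlier decode to; whether the recording is 16-bit PCM, as LINEAR16 expects, cannot be seen from the truncated dump.

```python
def transcribe_bytes(content, sample_rate_hertz=48000, timeout=90):
    """Transcribe in-memory audio bytes (hypothetical variant of transcribe_file).

    Returns a list of (transcript, confidence) tuples.
    """
    # Reject paths and text early: the API needs the raw bytes unchanged.
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError("content must be raw audio bytes, not a path or str")

    # Imported inside the function, as in the original sample.
    from google.cloud import speech

    client = speech.SpeechClient()

    # Inline content is limited to short audio; longer audio needs a GCS URI.
    audio = speech.RecognitionAudio(content=bytes(content))

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,  # assumption: must match the recording
        language_code="en-US",
    )

    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)

    # The first alternative is the most likely one for each portion of audio.
    return [
        (result.alternatives[0].transcript, result.alternatives[0].confidence)
        for result in response.results
    ]
```

With the form code shown earlier, the call would then be `transcribe_bytes(request.files['audio_data'].read())`, with no `io.open` and no decoding step in between.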

I have tried several encoding/decoding suggestions, e.g. from this question: "UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte", but none of those solutions seem to work for me.

What am I doing wrong?

Markus.K 20.01.2021
