Can Google Docs transcribe an audio file?

This page demonstrates how to transcribe long audio files (longer than 1 minute) to text using asynchronous speech recognition.

Asynchronous speech recognition starts a long-running audio processing operation. Use asynchronous speech recognition to transcribe audio that is longer than 60 seconds. For shorter audio, synchronous speech recognition is faster and simpler. The upper limit for asynchronous speech recognition is 480 minutes.

Audio content can be sent directly to Speech-to-Text from a local file for asynchronous processing. However, the audio time limit for local files is 60 seconds. Attempting to transcribe local audio files that are longer than 60 seconds will result in an error. To use asynchronous speech recognition to transcribe audio longer than 60 seconds, you must have your data saved in a Google Cloud Storage bucket.
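The limits described above can be summarized in a small helper. This is a minimal sketch, not part of the API: the function name choose_recognition_method and its arguments are hypothetical, and it only encodes the 60-second and 480-minute limits stated on this page. The service itself remains the authority on what it accepts.

```python
SYNC_LIMIT_SECONDS = 60          # limit for local files and synchronous requests (from the text)
ASYNC_LIMIT_SECONDS = 480 * 60   # 480-minute upper limit for asynchronous requests (from the text)


def choose_recognition_method(duration_seconds: float, source: str) -> str:
    """Suggest 'recognize' or 'longrunningrecognize' for the given audio.

    `source` is either a local path or a Cloud Storage URI (gs://...).
    Raises ValueError when the audio cannot be processed as described.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > ASYNC_LIMIT_SECONDS:
        raise ValueError("audio exceeds the 480-minute asynchronous limit")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        # Short audio: synchronous recognition is faster and simpler.
        return "recognize"
    if not source.startswith("gs://"):
        # Longer audio must be stored in a Cloud Storage bucket first.
        raise ValueError("audio longer than 60 seconds must be in Cloud Storage")
    return "longrunningrecognize"
```

For example, a ten-minute recording at gs://my-bucket/meeting.flac (a made-up URI) maps to longrunningrecognize, while the same recording on local disk raises an error until it is uploaded.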

You can retrieve the results of the operation using the google.longrunning.Operations method. Results remain available for retrieval for 5 days (120 hours). You also have the option of uploading your results directly to a Google Cloud Storage bucket.

These samples use a Cloud Storage bucket to store the raw audio input for the long-running transcription process. For an example of a typical longrunningrecognize operation response, see the reference documentation.

Protocol

Refer to the speech:longrunningrecognize API endpoint for complete details.

To perform asynchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the gcloud CLI, setting up a project with a service account, and obtaining an access token, see the quickstart.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'config': {
    'language_code': 'en-US'
  },
  'audio':{
    'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
  }
}" "https://speech.googleapis.com/v1/speech:longrunningrecognize"

See the RecognitionConfig and RecognitionAudio reference documentation for more information on configuring the request body.
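The same POST request can be assembled without curl. The following is a minimal sketch using only the Python standard library. The function name build_longrunning_request is hypothetical, and the access token is assumed to have been obtained separately, for example with the gcloud command used in the curl example. The body mirrors the fields from that example.

```python
import json
import urllib.request

ENDPOINT = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_longrunning_request(gcs_uri, access_token, language_code="en-US"):
    """Build (but do not send) the POST request for longrunningrecognize."""
    body = {
        "config": {"language_code": language_code},
        "audio": {"uri": gcs_uri},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Sending the request needs network access and a valid token:
#   with urllib.request.urlopen(request) as reply:
#       operation_name = json.load(reply)["name"]
```

Keeping the request construction separate from the network call makes the body easy to inspect before anything is sent.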

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "7612202767953098924"
}

Here, name is the name of the long-running operation created for the request.

Wait for processing to complete. Processing time differs depending on your source audio. In most cases, you will get results in about half the duration of the source audio. You can get the status of your long-running operation by making a GET request to the https://speech.googleapis.com/v1/operations/your-operation-name endpoint. Replace your-operation-name with the name returned from your longrunningrecognize request.

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     "https://speech.googleapis.com/v1/operations/your-operation-name"

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "how old is the Brooklyn Bridge",
            "confidence": 0.96096134
          }
        ]
      },
      {
        "alternatives": [
          {
            ...
          }
        ]
      }
    ]
  }
}

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
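The polling described above can be sketched as a small loop. This is an illustration under stated assumptions: wait_for_operation and its parameters are hypothetical names, and fetch_operation stands for any callable that performs the GET request and returns the decoded JSON as a dictionary. The error check assumes a failed operation carries an error field, as long-running operations do in general.

```python
import time


def wait_for_operation(fetch_operation, interval_seconds=5.0,
                       timeout_seconds=600.0, sleep=time.sleep):
    """Call fetch_operation() repeatedly until the operation reports done."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            # A finished operation holds either a response or an error.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        sleep(interval_seconds)
```

Passing the fetch function and the sleep function as arguments keeps the loop independent of any HTTP library and lets it be exercised with canned responses.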

gcloud CLI

Refer to the recognize-long-running command for complete details.

To perform asynchronous speech recognition, use the Google Cloud CLI, providing the path of a local file or a Google Cloud Storage URL.

gcloud ml speech recognize-long-running \
    'gs://cloud-samples-tests/speech/brooklyn.flac' \
     --language-code='en-US' --async

If the request is successful, the server returns the ID of the long-running operation in JSON format.

{
  "name": OPERATION_ID
}

You can then get information about the operation by running the following command.

gcloud ml speech operations describe OPERATION_ID

You can also poll the operation until it completes by running the following command.

gcloud ml speech operations wait OPERATION_ID

After the operation completes, the operation returns a transcript of the audio in JSON format.

{
  "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
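A response like the one above can be reduced to plain text by taking the first alternative of each result. The following is a minimal sketch. The function name top_transcripts is hypothetical, and it operates on the JSON after decoding it into a dictionary.

```python
import json


def top_transcripts(response):
    """Return the transcript of the first alternative of every result."""
    lines = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        # Skip results whose first alternative carries no transcript.
        if alternatives and "transcript" in alternatives[0]:
            lines.append(alternatives[0]["transcript"])
    return lines


sample = json.loads("""
{
  "results": [
    {"alternatives": [
      {"confidence": 0.9840146, "transcript": "how old is the Brooklyn Bridge"}
    ]}
  ]
}
""")
print(" ".join(top_transcripts(sample)))  # prints: how old is the Brooklyn Bridge
```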

Python

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries.
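As a rough illustration of the client library route, the sketch below assumes the google-cloud-speech package is installed and that application default credentials are configured. The function name transcribe_gcs and the default timeout are choices made for this example, not values from this page. The configuration sets only the language code, as the curl example does.

```python
def transcribe_gcs(gcs_uri: str, language_code: str = "en-US",
                   timeout_seconds: float = 300.0) -> list[str]:
    """Transcribe an audio file stored in Cloud Storage asynchronously."""
    # Imported here so this module still loads where the
    # google-cloud-speech package is not installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(language_code=language_code)

    # Starts the long-running operation, then blocks until it finishes
    # or the timeout expires.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_seconds)

    # Keep the first alternative of each result.
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]
```

Calling transcribe_gcs("gs://cloud-samples-tests/speech/brooklyn.flac") would request a transcript of the sample file used elsewhere on this page. Audio in other formats may need additional configuration fields, such as the encoding and sample rate.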

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.

Speech-to-Text supports uploading your long-running recognition results directly to a Cloud Storage bucket. If you implement this feature with Cloud Storage Triggers, Cloud Storage uploads can trigger notifications that call Cloud Functions and remove the need to poll Speech-to-Text for recognition results.

To have your results uploaded to a Cloud Storage bucket, provide the optional TranscriptOutputConfig output configuration in your long-running recognition request.

Protocol

Refer to the longrunningrecognize API endpoint for complete details.

The following example shows how to send a POST request using curl, where the body of the request specifies the path to a Cloud Storage bucket. The results are uploaded to this location as a JSON file that stores SpeechRecognitionResult.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'config': {...},
  'output_config': {
     'gcs_uri':'gs://bucket/result-output-path.json'
  },
  'audio': {
    'uri': 'gs://bucket/audio-path'
  }
}" "https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize"



The LongRunningRecognizeResponse includes the path to the Cloud Storage bucket where the upload was attempted. If the upload was unsuccessful, an output error will be returned. If a file with the same name already exists, the upload writes the results to a new file with a timestamp as the suffix.

{
  ...
  "metadata": {
    ...
    "outputConfig": {...}
  },
  ...
  "response": {
    ...
    "results": [...],
    "outputConfig": {
      "gcs_uri":"gs://bucket/result-output-path"
    },
    "outputError": {...}
  }
}
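A finished operation of this shape can be checked for upload problems as follows. This is a minimal sketch with stated assumptions: describe_upload is a hypothetical name, the outputError object is assumed to carry a message field, and the URI key is looked up under both gcsUri and gcs_uri because the exact casing in the JSON is not confirmed here.

```python
def describe_upload(operation):
    """Summarize where the results were uploaded, or why the upload failed."""
    response = operation.get("response", {})
    output_config = response.get("outputConfig", {})
    # The key casing is an assumption; accept either form.
    uri = output_config.get("gcsUri") or output_config.get("gcs_uri")
    error = response.get("outputError")
    if error:
        # Fall back to the whole error object if it has no message field.
        return f"upload to {uri} failed: {error.get('message', error)}"
    return f"results uploaded to {uri}"
```

Because an existing file of the same name causes the results to be written under a timestamped name, the reported URI is best treated as the requested location rather than a guaranteed final path.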

Can Google Docs convert audio to text?

Yes, through Voice Typing. Activate Voice Typing in Google Docs, choose a language, then click the microphone and start speaking. Voice Typing transcribes whatever comes through your computer microphone.

How do I transcribe recorded audio to text in Google Docs?

To use voice typing as a transcription tool: Open a new Google Doc. Select Tools > Voice typing. If the language you're using is not shown, click on the link above the microphone icon and choose your language.

Can I transcribe an audio file?

You can always go the old-school route of transcribing it yourself, which could take you hours. Or you can opt to use a transcription service to convert audio files to text. There are plenty of free or low-cost options to choose from, and most of them work in minutes.

Can Google Docs convert mp3 to text?

Google Docs doesn't officially have a transcribing function. If you aren't looking for a high-quality transcription, you can try to use the voice typing feature to convert an audio file into text by following the steps below: Open Google Docs and select the “Tools” menu. Click “Voice typing.”