Google Cloud: Converting Speech to Text using Google Cloud Speech API


The Google Cloud Speech API enables easy integration of Google speech recognition technologies into developer applications. The Speech API allows you to send audio and receive a text transcription from the service.

What will we learn in this lab?

This lab covers the following topics:

  • Create an API key
  • Create a Speech API request
  • Call the Speech API

Create an API Key

Since you’ll be using curl to send a request to the Speech API, you’ll need to generate an API key to pass in your request URL.

To create an API key:

  1. Click Navigation menu > APIs & services > Credentials.
  2. Click Create credentials.
  3. In the drop-down menu, select API key.
  4. Copy the key you just generated and click Close.

Now that you have an API key, you will save it as an environment variable to avoid having to insert the value of your API key in each request.

To perform the next steps, connect over SSH to the instance provisioned for you.

  5. Open the Navigation menu and select Compute Engine. You should see a provisioned Linux instance.
  6. Click the SSH button. You will be brought to an interactive shell.

In the command line, enter the following, replacing <YOUR_API_KEY> with the key you just copied:

export API_KEY=<YOUR_API_KEY>

Remain in this SSH session for the rest of the lab.

Create your Speech API request

We will use a pre-recorded file that’s available on Cloud Storage: gs://cloud-samples-tests/speech/brooklyn.flac. You can listen to this file before sending it to the Speech API.

Create request.json in the SSH command line. You’ll use this file to build your request to the Speech API:

touch request.json

Now open request.json using your preferred command line editor.

Add the following to your request.json file, using the uri value of the sample audio file:

{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

The request body has a config and audio object.

In config, you tell the Speech API how to process the request:

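  • The encoding parameter tells the API which type of audio encoding you’re using for the audio you send. FLAC is the encoding type for .flac files (see the documentation on encoding types for more details).
  • The languageCode parameter tells the API which language the audio is spoken in, here en-US.

There are other parameters you can add to your config object. In the v1 API, languageCode is required; encoding can be omitted for FLAC and WAV files, whose headers already describe the format.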

In the audio object, you pass the API the uri of the audio file in Cloud Storage.
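
As an alternative to opening an editor, the request body can be written in a single step with a shell heredoc. This is a minimal sketch that assumes the same encoding, languageCode, and uri values described above and writes request.json in the current directory:

```shell
# Write the request body to request.json in the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside the JSON.
cat > request.json <<'EOF'
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}
EOF
```

Running cat request.json afterwards shows the file contents, so a typo can be caught before the request is sent.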

Now you’re ready to call the Speech API!

Call the Speech API

Pass your request body, along with the API key environment variable, to the Speech API with the following curl command (all in one single command line):

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"
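
A small wrapper can catch the two most common setup mistakes before any request leaves the machine: a missing API key and a missing request file. The sketch below is illustrative only. call_speech_api is a hypothetical helper name, and the URL is assumed to be the v1 recognize endpoint with the key passed as a query parameter:

```shell
# Hypothetical helper: check the setup, then send the request file to the Speech API.
# $1 is the request file (defaults to request.json).
# Returns 1 if API_KEY is unset or empty, 2 if the request file is missing.
call_speech_api() {
  speech_request_file="${1:-request.json}"

  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is not set; run: export API_KEY=<YOUR_API_KEY>" >&2
    return 1
  fi

  if [ ! -f "$speech_request_file" ]; then
    echo "Request file not found: $speech_request_file" >&2
    return 2
  fi

  # Same call as in the lab; the endpoint below is an assumption (v1 recognize).
  curl -s -X POST -H "Content-Type: application/json" \
    --data-binary "@${speech_request_file}" \
    "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"
}
```

Calling call_speech_api with no arguments sends request.json from the current directory and prints the JSON response.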

Your response should look something like this:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

This lab uses the recognize method, which performs synchronous recognition. The Speech API supports both synchronous and asynchronous speech to text transcription. In this example you sent it a complete audio file, but the API also offers streaming recognition, which transcribes speech while the user is still speaking.
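
The synchronous and asynchronous REST methods differ only in the method name at the end of the URL. The sketch below assumes the v1 method names recognize and longrunningrecognize; speech_url is a hypothetical helper, not part of any Google tool:

```shell
# Hypothetical helper: print the v1 REST URL for a given Speech API method.
# $1 is the method name, for example "recognize" or "longrunningrecognize".
speech_url() {
  printf 'https://speech.googleapis.com/v1/speech:%s?key=%s\n' "$1" "${API_KEY:-}"
}

# Example (not run here, since it needs network access and a valid key):
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data-binary @request.json "$(speech_url longrunningrecognize)"
```

An asynchronous request is expected to return an operation to poll rather than the transcript itself, which suits audio too long for a synchronous call.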

You created a Speech API request and then called the Speech API. Run the following command to save the response in a result.json file:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
"https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}" > result.json

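Once the response is saved, the transcript can be pulled out of result.json with standard tools. The sketch below uses sed and assumes the response is pretty-printed with one field per line, as in the sample response above. extract_transcripts is a hypothetical helper, and a JSON-aware tool such as jq would be more robust:

```shell
# Hypothetical helper: print every transcript string found in a saved response.
# $1 is the response file (defaults to result.json).
# Assumes each "transcript" field sits on its own line.
extract_transcripts() {
  sed -n 's/.*"transcript": *"\(.*\)".*/\1/p' "${1:-result.json}"
}
```

Calling extract_transcripts with no arguments reads result.json from the current directory and prints one transcript per line.
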

This concludes our lab. We created an API key, sent an audio file to the Speech API, and received a text transcription from the service.

Happy Learning !!!
