Salesforce Google Speech-to-Text Integration

Introduction

Turning meeting recordings into searchable, shareable text saves real time in a fast-paced business. With Google's Speech-to-Text API and Salesforce, you can automate manual transcription, which is laborious and error-prone. This guide walks step by step through setting up an end-to-end solution that creates transcripts directly within Salesforce, with code samples and best practices along the way.

Why Automate Transcripts?

Save Time: Instantly convert audio to text instead of manual note-taking.
Improve Accuracy: Leverage Google’s advanced AI for precise transcriptions.
Centralize Data: Store transcripts alongside Salesforce records for easy access.

Prerequisites

A Salesforce org with Custom Settings enabled.
A Google Cloud account with the Speech-to-Text API activated.
Basic familiarity with Apex and Salesforce configuration.

Step 1: Configure Google Speech-to-Text API

Enable the API in Google Cloud:
- Navigate to the Google Cloud Console.
- Enable the Speech-to-Text API under APIs & Services.
Generate an API Key:
- Go to Credentials > Create Credentials > API Key.
- Copy the key; you will store it in Salesforce in Step 2.

Step 2: Set Up Custom Settings in Salesforce

Store your Google API key and endpoint URL in a Salesforce Custom Setting, and allow callouts to Google through a Remote Site Setting:

Add the URL to Remote Site Settings
- Go to Setup, then Remote Site Settings
- Create a new remote site
  Remote Site Name: Google_API
  Remote Site URL: https://speech.googleapis.com
Create a Custom Setting named API_Setting__c with two text fields:
- Google_API_Key__c
  (value: YOUR-API-KEY)
- Google_speech_to_text_URL__c
  (value: https://speech.googleapis.com/v1/speech:recognize)
Insert a record with your API key and URL.
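
The settings above can then be read from Apex when building the callout URL. The following is a minimal sketch, not the original implementation: it assumes API_Setting__c is a hierarchy Custom Setting with org-wide default values, and the class name SpeechSettings and the method buildEndpoint are hypothetical. Passing the key in the "key" query parameter is the usual way to authenticate with a Google API key.

```apex
public with sharing class SpeechSettings {
    // Hypothetical exception type for missing configuration.
    public class SpeechConfigException extends Exception {}

    // Assumption: API_Setting__c is a hierarchy Custom Setting whose
    // org-wide defaults hold the two fields created in this step.
    public static String buildEndpoint() {
        API_Setting__c cfg = API_Setting__c.getOrgDefaults();
        return buildEndpoint(cfg.Google_speech_to_text_URL__c, cfg.Google_API_Key__c);
    }

    // Appends the API key as the "key" query parameter, URL-encoded.
    public static String buildEndpoint(String baseUrl, String apiKey) {
        if (String.isBlank(baseUrl) || String.isBlank(apiKey)) {
            throw new SpeechConfigException('Google Speech-to-Text URL or API key is not configured.');
        }
        return baseUrl + '?key=' + EncodingUtil.urlEncode(apiKey, 'UTF-8');
    }
}
```

If you created a list Custom Setting instead, replace getOrgDefaults() with API_Setting__c.getInstance('record name').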

Step 3: Upload Audio Files to Salesforce

Follow the user manual steps to:

Create an Account
Upload a .wav audio file (under 1 minute, the limit for the synchronous speech:recognize endpoint) via the Files related list.

Step 4: Integrate Google’s API Using Apex

The Apex class handles the integration. Here’s a breakdown of key components:

1. Fetching and Encoding the Audio File
The getBase64EncodedFile method retrieves the latest version of the uploaded file and encodes it to Base64:
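
A minimal sketch of such a method follows. It is a reconstruction under assumptions, not the original code: the class name AudioFileReader is hypothetical, and "latest" is read here as the newest version of the most recently created file linked to the record.

```apex
public with sharing class AudioFileReader {
    // Returns the Base64 text of the newest version of the most recently
    // created file attached to the record, or null when nothing is attached.
    public static String getBase64EncodedFile(Id recordId) {
        List<ContentDocumentLink> links = [
            SELECT ContentDocumentId
            FROM ContentDocumentLink
            WHERE LinkedEntityId = :recordId
            ORDER BY ContentDocument.CreatedDate DESC
            LIMIT 1
        ];
        if (links.isEmpty()) {
            return null;
        }
        Id documentId = links[0].ContentDocumentId;
        List<ContentVersion> versions = [
            SELECT VersionData
            FROM ContentVersion
            WHERE ContentDocumentId = :documentId
              AND IsLatest = true
            LIMIT 1
        ];
        // VersionData is a Blob; base64Encode turns it into the text Google expects.
        return versions.isEmpty() ? null : EncodingUtil.base64Encode(versions[0].VersionData);
    }
}
```

Base64 text is roughly a third larger than the binary file, so large recordings can run into the Apex heap limit.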

2. Calling Google’s API
The transcribeAudio method constructs the API request with language and encoding settings:
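
The request can be sketched as below. This is an illustration under assumptions rather than the original class: SpeechClient, buildRequestBody, and SpeechCalloutException are hypothetical names, the language is hardcoded to en-US, and the encoding is set to LINEAR16, which matches uncompressed .wav audio. The endpoint argument is expected to already contain the API key.

```apex
public with sharing class SpeechClient {
    // Hypothetical exception type for non-200 responses.
    public class SpeechCalloutException extends Exception {}

    // Builds the JSON body for speech:recognize: a config object with
    // encoding and language, plus the Base64 audio content.
    public static String buildRequestBody(String base64Audio, String languageCode) {
        Map<String, Object> body = new Map<String, Object>{
            'config' => new Map<String, Object>{
                // sampleRateHertz is omitted on purpose; for WAV files the
                // API can read it from the file header.
                'encoding' => 'LINEAR16',
                'languageCode' => languageCode
            },
            'audio' => new Map<String, Object>{ 'content' => base64Audio }
        };
        return JSON.serialize(body);
    }

    // Sends the audio to Google and returns the raw JSON response body.
    public static String transcribeAudio(String endpoint, String base64Audio) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(endpoint);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setTimeout(120000); // 120 seconds is the maximum Apex allows
        req.setBody(buildRequestBody(base64Audio, 'en-US'));

        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() != 200) {
            throw new SpeechCalloutException(
                'Speech-to-Text returned ' + res.getStatusCode() + ': ' + res.getBody()
            );
        }
        return res.getBody();
    }
}
```

Make the callout before any DML in the same transaction, because Apex rejects callouts while uncommitted work is pending.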

3. Parsing the Response
The extractTranscript method processes Google’s JSON response and concatenates the transcript:
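
One way to write this parsing step is sketched below; the class name TranscriptParser is hypothetical. It relies on the documented response shape, in which results is a list, each result carries a list of alternatives, and the first alternative is the most likely transcript. Joining the segments with a single space is a choice made for this sketch.

```apex
public with sharing class TranscriptParser {
    // Concatenates the top alternative of every result into one string.
    // Returns an empty string when the response has no results
    // (for example, when no speech was recognized).
    public static String extractTranscript(String responseBody) {
        Map<String, Object> root = (Map<String, Object>) JSON.deserializeUntyped(responseBody);
        List<Object> results = (List<Object>) root.get('results');
        if (results == null) {
            return '';
        }
        List<String> parts = new List<String>();
        for (Object result : results) {
            List<Object> alternatives =
                (List<Object>) ((Map<String, Object>) result).get('alternatives');
            if (alternatives == null || alternatives.isEmpty()) {
                continue;
            }
            String text = (String) ((Map<String, Object>) alternatives[0]).get('transcript');
            if (String.isNotBlank(text)) {
                parts.add(text.trim());
            }
        }
        return String.join(parts, ' ');
    }
}
```

For a response such as {"results":[{"alternatives":[{"transcript":"hello world"}]},{"alternatives":[{"transcript":" how are you"}]}]}, this returns "hello world how are you".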

4. Store the Transcription
Save the generated transcript in Salesforce so it sits alongside the record the audio was uploaded to.
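
The text does not say where the transcript is stored, so the sketch below makes an assumption: it saves the transcript as a .txt file attached to the same record as the audio. The class name TranscriptStore and the method saveTranscript are hypothetical. Writing to a long text field on the record would work equally well.

```apex
public with sharing class TranscriptStore {
    // Saves the transcript as a text file linked to the given record
    // and returns the Id of the new ContentVersion.
    public static Id saveTranscript(Id recordId, String title, String transcript) {
        ContentVersion file = new ContentVersion(
            Title = title,
            PathOnClient = title + '.txt',
            VersionData = Blob.valueOf(transcript),
            // Setting FirstPublishLocationId links the file to the record
            // so it appears in that record's Files related list.
            FirstPublishLocationId = recordId
        );
        insert file;
        return file.Id;
    }
}
```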

Error Handling & Best Practices

Security: Declare the Apex class with sharing so it respects the running user's record-level access.
Validation: Check audio file size and format before upload.
Error Logging: Wrap the callout and parsing in try-catch blocks so exceptions are written to the Salesforce debug logs.
Rate Limits: Google's API has quotas; monitor usage in the Cloud Console.
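
The validation point can be sketched as a small check on the file's extension and size before it is encoded. The class name AudioValidator and the 4 MB ceiling are assumptions made for this sketch; the ceiling is chosen to leave room for Base64 expansion within the synchronous Apex heap limit, so adjust it to your own limits.

```apex
public with sharing class AudioValidator {
    // Hypothetical ceiling: Base64 adds about a third to the file size,
    // and synchronous Apex has a 6 MB heap.
    public static final Integer MAX_BYTES = 4 * 1024 * 1024;

    // True only for non-empty .wav files at or below the size ceiling.
    public static Boolean isSupported(String fileExtension, Integer sizeInBytes) {
        return fileExtension != null
            && fileExtension.equalsIgnoreCase('wav')
            && sizeInBytes != null
            && sizeInBytes > 0
            && sizeInBytes <= MAX_BYTES;
    }
}
```

The inputs map to the FileExtension and ContentSize fields on ContentVersion, for example AudioValidator.isSupported(version.FileExtension, version.ContentSize).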

Sample API Request & Response

Challenges & Resolutions
Speech-to-Text Methods Comparison in GCSTT

Conclusion

By integrating Google’s Speech-to-Text API with Salesforce, you can automate transcript generation, streamline workflows, and ensure critical meeting insights are never lost. The provided Apex code and setup steps offer a scalable solution that’s secure and easy to maintain. Ready to get started? Follow the steps above, and transform your audio files into actionable text today! For advanced configurations or troubleshooting, refer to the Google Speech-to-Text documentation.

References

https://cloud.google.com/speech-to-text/docs/transcribe-api (How to use Cloud Speech-to-Text, Google Speech-to-Text documentation)

https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_callouts_remote_site_settings.htm

For additional questions, please reach out to support@astreait.com.
