
Speech-to-text AI Examples

Use the OpenAI Whisper model, served either from Hugging Face Inference Endpoints or from the OpenAI API, to transcribe speech to text.

Generative AI models can transcribe audio to text with high accuracy, entering a field where niche providers have traditionally offered expensive services. The medical and legal fields are well positioned to benefit from these new capabilities.

This example shows two simple ways of transcribing an audio file to text:

- Using the Whisper model from Hugging Face Inference Endpoints, which allows deployments over AWS PrivateLink
- Using the public API from OpenAI
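As a rough illustration of the first approach, the sketch below posts raw audio bytes to a deployed endpoint using only the Python standard library. It is not the robot's actual code: the function names are hypothetical, and it assumes the endpoint accepts the audio as the request body with a bearer token and replies with JSON containing a `text` field.

```python
import json
import urllib.request


def build_transcription_request(endpoint_url: str, api_token: str,
                                audio: bytes,
                                content_type: str = "audio/flac"):
    """Build a POST request that carries raw audio bytes to the endpoint."""
    return urllib.request.Request(
        endpoint_url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            # Assumption: the endpoint picks its decoder from this header.
            "Content-Type": content_type,
        },
    )


def transcribe(endpoint_url: str, api_token: str, audio_path: str) -> str:
    """Send the file and return the transcribed text (makes a network call)."""
    with open(audio_path, "rb") as audio_file:
        request = build_transcription_request(
            endpoint_url, api_token, audio_file.read()
        )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response body is JSON shaped like {"text": "..."}.
        return json.loads(response.read())["text"]
```

Keeping the request construction separate from the network call makes the headers easy to inspect without a live endpoint.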

Setup

The example includes small FLAC and M4A source files and uses Robocorp Control Room's Vault to store the access credentials. The required Vaults and keys for each use case are:

Hugging Face Inference Endpoints
- Vault named Huggingface
- Key named whisper-url that has the URL of a deployed inference endpoint (which you need to create)
- Key named api-token that has the API token from your Hugging Face account

OpenAI
- Vault named OpenAI
- Key named key that has the API key
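The Vault and key names listed above could be read along these lines. This is a minimal sketch, not the robot's own code: `load_credentials` is a hypothetical helper that accepts any getter returning a mapping for a Vault name, so it can be tried without Control Room access.

```python
from typing import Callable, Mapping


def load_credentials(get_secret: Callable[[str], Mapping[str, str]]) -> dict:
    """Collect the values listed above using any Vault-like getter."""
    huggingface = get_secret("Huggingface")
    openai_secret = get_secret("OpenAI")
    return {
        "whisper_url": huggingface["whisper-url"],
        "hf_api_token": huggingface["api-token"],
        "openai_key": openai_secret["key"],
    }


# Inside a robot, the getter could come from rpaframework
# (assumed to be installed in the robot's environment):
#   from RPA.Robocorp.Vault import Vault
#   credentials = load_credentials(Vault().get_secret)
```

A missing Vault or key surfaces as an error from the getter or a `KeyError`, which points directly at the entry that still needs to be created.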

Ideas for further development

Both examples cover only a basic use case and can be expanded. Feel free to open a PR! Here are some ideas:

- Handle inputs over 25 MB
- Improve reliability by passing the correct spelling of names and terms in the prompt
- Expand file format support
- Add exception handling
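As a starting point for the size limit and exception handling ideas, a pre-flight check might look like the sketch below. The names are hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption; splitting oversized audio into smaller parts is left out.

```python
import os

# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


class AudioTooLargeError(ValueError):
    """Raised when a file exceeds the upload limit."""


def check_audio_file(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise if missing or too large."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path}")
    size = os.path.getsize(path)
    if size > limit:
        raise AudioTooLargeError(
            f"{path} is {size} bytes; the limit is {limit} bytes"
        )
    return size
```

Calling this before any upload lets the robot fail early with a clear message instead of waiting for the API to reject the request.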

Technical information

Last updated

16 August 2023

License

Apache License 2.0

Dependencies

openai
