Try Live STT Models
Upload your own audio file (up to 100MB) and get an instant transcript from Google STT v1 Standard. No login required.
Try Google STT v1 Standard on your audio
Drop a file below. We have pre-selected Google STT v1 Standard for you.
Input Source
Click or drag audio file here
Supports MP3, M4A, WAV, OGG
Max file size: 100MB
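The format and size limits above can be checked before a file is uploaded. The following is a minimal sketch, not the site's actual validation: the helper name `check_upload` is hypothetical, and it assumes "100MB" means 100 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the upload widget text above; the helper name is hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg"}
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive extension match
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

This only inspects the file name and size; it does not verify that the file contents match the extension.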
Configuration
Advanced Options
Enables raw parameter access for Google STT v1 Standard. Disables universal options.
Provides a hint for the minimum and maximum number of expected speakers to improve diarization accuracy.
Boosts the recognition probability of specific words or phrases, such as proper nouns or domain-specific terms. Provide one phrase per line.
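As an illustration of the two hints above, the sketch below turns a speaker-count range and a "one phrase per line" text box into a config fragment. The field names follow Google Cloud Speech-to-Text v1 (`diarization_config`, `speech_contexts`); how this page actually maps its options to those fields is an assumption, and `parse_phrases` and `build_hints` are hypothetical helper names.

```python
def parse_phrases(text: str) -> list[str]:
    """Split a 'one phrase per line' text box into a clean phrase list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_hints(min_speakers: int, max_speakers: int, phrase_text: str) -> dict:
    """Build the diarization and word-boost part of a recognition config."""
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        # Speaker-count range used as a diarization hint.
        "diarization_config": {
            "enable_speaker_diarization": True,
            "min_speaker_count": min_speakers,
            "max_speaker_count": max_speakers,
        },
        # Phrases whose recognition probability should be boosted.
        "speech_contexts": [{"phrases": parse_phrases(phrase_text)}],
    }
```

Blank lines and surrounding whitespace in the phrase box are dropped, so a trailing newline does not produce an empty phrase.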
Note: Files and transcripts are not stored on our servers and are used only to complete your request. More features are coming.
Technical Specifications
Configurable Parameters
These universal options are mapped to provider-specific features.
- language (string): Primary language of the audio
Capabilities
- Diarization
- Diarization_config
- Profanity_filter
- Punctuation
- Word_boost
Native Configuration
These are the provider's native API parameters, shown exactly as exposed by the vendor.
- encoding (default: LINEAR16): The encoding of the audio data sent in the request.
- sample_rate_hertz (default: 16000): Sample rate in Hertz of the audio data sent.
- audio_channel_count (default: 1): The number of channels in the input audio data.
- enable_separate_recognition_per_channel: This field must be set to true if you want to separately recognize each channel.
- max_alternatives (default: 1): Maximum number of recognition hypotheses to be returned.
- profanity_filter: If set to true, the server will attempt to filter out profanities.
- enable_word_time_offsets (default: true): If true, the top result includes a list of words and the start and end time offsets.
- enable_automatic_punctuation: If true, adds punctuation to recognition result hypotheses.
- enable_spoken_punctuation: If true, replaces spoken punctuation with the corresponding symbols (e.g. "how are you question mark" -> "how are you?").
- enable_spoken_emojis: If true, replaces spoken emojis with the corresponding Unicode characters (e.g. "smiling face emoji" -> "🙂").
- model (default: default): Which model to select for the given request. Select the model best suited to your domain.
- use_enhanced: Set to true to use an enhanced model for speech recognition.
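To show how these native parameters might be assembled into a request, here is a rough sketch that merges the listed defaults with user overrides and emits the lowerCamelCase field names used by the JSON form of the v1 API. It does not call any service. The helper names (`to_camel`, `build_config`) are hypothetical, and `language_code` is added as an assumption because the v1 API needs a language, which this page exposes as the universal `language` option.

```python
import json

# Defaults copied from the parameter list above; parameters without a
# listed default are omitted unless the caller sets them.
LISTED_DEFAULTS = {
    "encoding": "LINEAR16",
    "sample_rate_hertz": 16000,
    "audio_channel_count": 1,
    "max_alternatives": 1,
    "enable_word_time_offsets": True,
    "model": "default",
}

# Every parameter named in the list above.
KNOWN_PARAMS = set(LISTED_DEFAULTS) | {
    "enable_separate_recognition_per_channel",
    "profanity_filter",
    "enable_automatic_punctuation",
    "enable_spoken_punctuation",
    "enable_spoken_emojis",
    "use_enhanced",
}


def to_camel(name: str) -> str:
    """Convert snake_case to lowerCamelCase (sample_rate_hertz -> sampleRateHertz)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_config(language_code: str, **overrides) -> dict:
    """Merge listed defaults with overrides and return JSON-style keys."""
    unknown = set(overrides) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {**LISTED_DEFAULTS, **overrides, "language_code": language_code}
    return {to_camel(key): value for key, value in merged.items()}


# Example: turn on punctuation and the profanity filter for US English.
print(json.dumps(build_config("en-US", enable_automatic_punctuation=True,
                              profanity_filter=True), indent=2))
```

Rejecting unknown keys up front keeps typos such as `sample_rate_hz` from being silently sent to the provider.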
About Google STT v1 Standard
Google Cloud Speech-to-Text v1 standard model for general-purpose transcription.
Pricing
A detailed pricing breakdown will be available here shortly. For now, please refer to the provider's official website.