Try Live STT Models

Test Google STT v1 Standard Live

Upload your own audio file (up to 100MB) and get an instant transcript from Google STT v1 Standard. No login required.

Try Google STT v1 Standard on your audio

Drop a file below. We have pre-selected Google STT v1 Standard for you.

Input Source

Click or drag audio file here

Supports MP3, M4A, WAV, OGG

Max file size: 100MB

Configuration

Active Providers

Google STT v1 Standard

Input Language

Processing Options

Normalize Audio

Auto-convert to 16kHz mono WAV for best results.
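For reference, a conversion of this kind can be reproduced locally with ffmpeg. The sketch below is only an illustration: it assumes ffmpeg is installed, the function names and file paths are hypothetical, and the page does not say which tool it uses internally.

```python
import subprocess
from typing import List


def build_normalize_command(src: str, dst: str) -> List[str]:
    """Return an ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file (MP3, M4A, WAV, OGG)
        "-ac", "1",           # downmix to a single channel (mono)
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (LINEAR16)
        dst,
    ]


def normalize(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_normalize_command(src, dst), check=True)
```

Calling `normalize("meeting.m4a", "meeting.wav")` would write a WAV file that matches the `encoding`, `sample_rate_hertz`, and `audio_channel_count` defaults listed under Native Configuration.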

Advanced Options

Use Native Configuration Mode

Enables raw parameter access for Google STT v1 Standard. Disables universal options.

Speaker Diarization

Identifies different speakers in the audio and labels their speech.

Speaker Count Hint

Min Speakers

Max Speakers

Provides a hint for the minimum and maximum number of expected speakers to improve diarization accuracy.
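As a rough sketch, the two hints could be validated and turned into a diarization config like this. The field names follow Google's v1 `SpeakerDiarizationConfig` as commonly documented; the helper name and the way this page forwards its hints are assumptions.

```python
from typing import Optional


def build_diarization_config(min_speakers: Optional[int] = None,
                             max_speakers: Optional[int] = None) -> dict:
    """Build a diarization config dict from optional speaker count hints."""
    for name, value in (("min_speakers", min_speakers),
                        ("max_speakers", max_speakers)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers cannot exceed max_speakers")

    config = {"enable_speaker_diarization": True}
    # Only send the hints the user actually provided.
    if min_speakers is not None:
        config["min_speaker_count"] = min_speakers
    if max_speakers is not None:
        config["max_speaker_count"] = max_speakers
    return config
```

For a two-person interview, `build_diarization_config(2, 2)` pins both hints to the same value; leaving both unset lets the provider pick its own defaults.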

Custom Vocabulary

Boosts the recognition probability of specific words or phrases, such as proper nouns or domain-specific terms. Provide one phrase per line.
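The one-phrase-per-line input could be parsed roughly as follows. This is a minimal sketch: the helper names are hypothetical, and the assumption is that phrases end up in the v1 `speech_contexts` field, which accepts a list of phrase hints.

```python
from typing import List


def parse_vocabulary(text: str) -> List[str]:
    """Split the text box content into unique, non-empty phrases, keeping order."""
    phrases: List[str] = []
    seen = set()
    for line in text.splitlines():
        phrase = line.strip()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases


def build_speech_contexts(text: str) -> List[dict]:
    """Wrap the phrases in a speech_contexts list; empty input yields no context."""
    phrases = parse_vocabulary(text)
    return [{"phrases": phrases}] if phrases else []
```

Blank lines and repeated entries are dropped so the request carries each phrase once.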

Filter Profanity

Detects and masks profane words in the transcript.

Smart Formatting

Not supported by all selected providers.

Converts transcribed numbers, dates, and currency into a more readable format (e.g., "twenty dollars" becomes "$20").

Automatic Punctuation

Automatically inserts punctuation like periods, commas, and question marks into the transcript.

Note: Files and transcripts are not stored on our servers and are used only to complete your request. More features are coming.

Technical Specifications

Configurable Parameters

These universal options are mapped to provider-specific features.

language (string)

Language

Primary language of the audio

Capabilities

Diarization
Diarization_config
Profanity_filter
Punctuation
Word_boost
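To illustrate what such a mapping might look like, the sketch below translates the universal options listed above into native v1 field names. The mapping, the function name, and the `en-US` fallback are all assumptions made for illustration; the page does not publish its actual translation layer.

```python
def to_native(options: dict) -> dict:
    """Translate universal options into a Google STT v1 style config dict."""
    # "en-US" is an assumed fallback, not a documented default of this page.
    native = {"language_code": options.get("language", "en-US")}

    if options.get("diarization"):
        diarization = {"enable_speaker_diarization": True}
        # Optional speaker count hints, passed through unchanged.
        diarization.update(options.get("diarization_config", {}))
        native["diarization_config"] = diarization
    if options.get("profanity_filter"):
        native["profanity_filter"] = True
    if options.get("punctuation"):
        native["enable_automatic_punctuation"] = True
    if options.get("word_boost"):
        native["speech_contexts"] = [{"phrases": list(options["word_boost"])}]
    return native
```

Options that are absent or falsy are left out of the result, so the provider's own defaults apply.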

Native Configuration

These are the provider's native API parameters, shown exactly as exposed by the vendor.

encoding (Default: LINEAR16)

The encoding of the audio data sent in the request.

sample_rate_hertz (Default: 16000)

Sample rate in Hertz of the audio data sent.

audio_channel_count (Default: 1)

The number of channels in the input audio data.

enable_separate_recognition_per_channel

This field must be set to true if you want to separately recognize each channel.

max_alternatives (Default: 1)

Maximum number of recognition hypotheses to be returned.

profanity_filter

If set to true, the server will attempt to filter out profanities.

enable_word_time_offsets (Default: true)

If true, the top result includes a list of words and the start and end time offsets.

enable_automatic_punctuation

If true, adds punctuation to recognition result hypotheses.

enable_spoken_punctuation

If true, replaces spoken punctuation with the corresponding symbols (e.g. "how are you question mark" -> "how are you?").

enable_spoken_emojis

If true, replaces spoken emojis with the corresponding Unicode characters (e.g. "smiling face emoji" -> "🙂").

model (Default: default)

Which model to select for the given request. Select the model best suited to your domain.

use_enhanced

Set to true to use an enhanced model for speech recognition.
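The parameters above can be collected into a single config, sketched here as a plain dict. The defaults are the ones shown on this page; the helper name is hypothetical, flags listed without a default are simply omitted unless set, and the channel count check reflects an assumption about how per-channel recognition is meant to be used.

```python
# Defaults as listed on this page.
NATIVE_DEFAULTS = {
    "encoding": "LINEAR16",
    "sample_rate_hertz": 16000,
    "audio_channel_count": 1,
    "max_alternatives": 1,
    "enable_word_time_offsets": True,
    "model": "default",
}

# Boolean flags listed without a default; omitted unless explicitly set.
OPTIONAL_FLAGS = (
    "enable_separate_recognition_per_channel",
    "profanity_filter",
    "enable_automatic_punctuation",
    "enable_spoken_punctuation",
    "enable_spoken_emojis",
    "use_enhanced",
)


def build_native_config(**overrides) -> dict:
    """Merge overrides into the page defaults, rejecting unknown parameters."""
    unknown = set(overrides) - set(NATIVE_DEFAULTS) - set(OPTIONAL_FLAGS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    config = dict(NATIVE_DEFAULTS)
    config.update(overrides)
    # Assumption: per-channel recognition only makes sense with 2+ channels.
    if (config.get("enable_separate_recognition_per_channel")
            and config["audio_channel_count"] < 2):
        raise ValueError("per-channel recognition needs audio_channel_count >= 2")
    return config
```

For example, `build_native_config(audio_channel_count=2, enable_separate_recognition_per_channel=True)` describes a stereo recording in which each channel is transcribed on its own.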

About Google STT v1 Standard

Google Cloud Speech-to-Text v1 standard model for general-purpose transcription

Pricing

A detailed pricing breakdown will be available here shortly. For now, please refer to the provider's official website.

View Google STT v1 Standard official documentation