This How-To describes how to perform basic dictation operations with the nVoq.API.
It is a great place to start if you are new to our API, speech recognition, or just
want to brush up on your web services skills before diving in to integrate your
application with our services.
The sections below walk you through the steps required to post audio to our system and
convert that audio to text in a batch operation, in which the audio is uploaded completely before
transcription begins. This is a common use case for asynchronous or "batch" dictation,
where medical professionals record their audio and submit it for processing once the
entire recording is complete. The transcribed text is then inserted into the medical record
system when transcription finishes.
Before You Begin
API User Account
If your organization has not already been in contact with our Sales team, please complete this short form on the Developer Registration Page
and we will reach out to you regarding a user account and development with our APIs.
Once you have an account, you must change your password before the account can be used for API calls.
Audio Format
The nVoq API supports Ogg Vorbis, WebM, MPEG-4, and PCM encoded audio sampled at 8 kHz or 16 kHz. Audio sampled at 16 kHz produces better
accuracy and is highly recommended over 8 kHz.
The nVoq API is a RESTful web services API and therefore does not constrain you to any specific platform or
programming language. We provide sample code below for shell scripting (bash), C#, and Java.
Follow along and run this code in your environment. If you prefer C++, Go, or some other language,
that's great! Just adapt the code below to your language's web services functionality and you should be good to go.
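Because the sample rate affects accuracy, it can be worth checking a WAV file before uploading it. The following is a minimal sketch, not part of the nVoq API: the function name `wav_sample_rate` is hypothetical, and it assumes a canonical RIFF/WAVE header in which the "fmt " chunk directly follows the 12-byte RIFF header, so the sample rate is stored at byte offset 24 as a 4-byte little-endian integer. Files with extra chunks before "fmt " would need a real parser.

```shell
# Print the sample rate (in Hz) stored in a WAV file's header.
# Assumption: canonical layout, with the sample rate at byte offset 24
# as a 4-byte little-endian unsigned integer.
wav_sample_rate() {
  # Read the four bytes as unsigned decimals, e.g. "128 62 0 0" for 16000.
  set -- $(od -An -t u1 -j 24 -N 4 "$1")
  # Fewer than four bytes means the file is missing or too short.
  [ "$#" -eq 4 ] || return 1
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

For example, `wav_sample_rate myAudioFile.wav` should print 16000 for the recommended format and 8000 for the lower-rate alternative.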
Let's Go!
Step 1: Set Up
To make the examples easier to read, long URLs and parameters are replaced with more legible variables.
This section of code defines those variables and sets the values needed to execute the code in the sections that follow.
#!/bin/bash
#server info
serverInfo="test.nvoq.com"
#credentials
user="yourUserName"
password="yourPassword"
# What type of audio to upload
# contentType can be audio/x-wav or audio/ogg
# audioFormat must be pcm-16khz, pcm-8khz, ogg, webm, or mp4
contentType="audio/x-wav"
audioFormat="pcm-16khz"
audioFile="myAudioFile.wav"
public class Program {
    // Path to a 16-bit, 16 kHz mono PCM WAV file, or a 16 kHz Ogg file
    private String myAudioFilePath = "./";
    // Your username
    private String username = "";
    // Your password
    private String password = "";
    //
    // To use an API key in your software
    // private String apiKey = "eyJ0eXA ... iOiJKV1QiLCJh";
    //
    // Server URL
    private String baseUrl = "https://test.nvoq.com";
    // "audio/ogg" when using Ogg format
    // "audio/x-wav" when using WAV format
    private String audioContentType = "audio/x-wav";
    // "pcm-16khz" when using WAV format, "ogg" when using Ogg format,
    // "webm" when using WebM, or "mp4" when using MPEG-4
    private String audioFormat = "pcm-16khz";
}
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;

namespace nVoqHttpApiCSharp
{
    /*** This class will eventually contain all the code for transcribing
         audio. We will fill out the methods in the sections that follow,
         starting with the configuration settings below. ***/
    class Program
    {
        /**** Begin configuration settings ****/
        // Path to a 16-bit, 16 kHz mono PCM WAV file, or a 16 kHz Ogg file
        const string AudioFilePath = @""; // e.g. C:\audio.wav or C:\audio.ogg
        // Your username
        const string Username = "";
        // Your password
        const string Password = "";
        // To use an API key in your software
        // const string apiKey = "eyJ0eXA ... iOiJKV1QiLCJh";
        //
        // Server URL
        const string BaseUrl = "https://test.nvoq.com";
        // "audio/ogg" when using Ogg format
        // "audio/x-wav" when using WAV format
        const string AudioContentType = "audio/x-wav";
        // "pcm-16khz" when using WAV format, "ogg" when using Ogg format,
        // "webm" when using WebM, or "mp4" when using MPEG-4
        const string AudioFormat = "pcm-16khz";
        /**** End configuration settings ****/

        /**** More code will go here ****/
    }
}
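In the shell variant of this setup, `contentType` and `audioFormat` have to agree with the file being uploaded. One way to keep them in step is to derive both from the file name. The sketch below is illustrative only: `set_audio_settings` is a hypothetical helper, it covers just the WAV and Ogg pairings listed in the configuration comments above, and it assumes any `.wav` file is 16 kHz PCM.

```shell
# Hypothetical helper: set contentType and audioFormat from the file name.
# Assumption: .wav files are 16 kHz PCM; only WAV and Ogg are handled here.
set_audio_settings() {
  case "$1" in
    *.wav) contentType="audio/x-wav"; audioFormat="pcm-16khz" ;;
    *.ogg) contentType="audio/ogg";   audioFormat="ogg" ;;
    *)     echo "Unsupported audio file: $1" >&2; return 1 ;;
  esac
}
```

Calling `set_audio_settings "${audioFile}"` right after `audioFile` is assigned would replace the two hand-written assignments.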
Step 2: Upload Audio
As you might have guessed, audio must be provided to the dictation service in order to convert it to text.
For this batch processing example,
the entire audio file is uploaded prior to starting transcription. This section outlines how to use the nVoq.API to upload an audio file. If you don't have an audio file readily available,
you can download one here.
# Using the test parameters defined above,
# upload the audio. This returns a URL for accessing the audio.
audioLocation=$(curl -v -X POST -u "${user}:${password}" --header \
"Content-Type:${contentType}" --data-binary "@${audioFile}" \
"https://${serverInfo}/SCFileserver/audio" 2>&1 | \
collect_location)
/*** The following method uploads audio to the nVoq
     servers from a file as specified in the method parameter 'audioFile'.
     Add this method to your Program class. ***/
private static string UploadAudio(string audioFile)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl, "/SCFileserver/audio");
    if (!File.Exists(audioFile))
        throw new Exception("Could not locate audio file: " + audioFile);
    byte[] audioBytes = File.ReadAllBytes(audioFile);
    request.ContentType = AudioContentType;
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(audioBytes, 0, audioBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    // A successful upload answers 201 Created and carries the audio URL
    // in the Location header.
    if (response.StatusCode != HttpStatusCode.Created)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return response.Headers.Get("Location");
}
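The C# method above treats anything other than 201 Created as an error, while the shell snippet does not look at the status at all. A comparable check in shell might look like the sketch below. `last_http_status` is a hypothetical helper, and the usage shown in the comments relies on curl's `-D -` option, which writes the response headers to standard output.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# final status code. The last status line wins, so an interim
# "100 Continue" that precedes the real response is ignored.
last_http_status() {
  tr -d '\r' | awk '/^HTTP\// { code = $2 } END { print code }'
}

# Possible usage with the variables from Step 1 (not executed here):
#   headers=$(curl -s -o /dev/null -D - -X POST -u "${user}:${password}" \
#     --header "Content-Type:${contentType}" --data-binary "@${audioFile}" \
#     "https://${serverInfo}/SCFileserver/audio")
#   status=$(printf '%s\n' "${headers}" | last_http_status)
#   [ "${status}" = "201" ] || { echo "Upload failed: ${status}" >&2; exit 1; }
```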
Step 3: Start Dictation
This section outlines how to use the nVoq.API to start and perform a dictation, converting the audio you uploaded in the previous section
into text.
# Using the test parameters defined above, create the dictation to begin the transcription process
dictationLocation=$(curl -v -X POST -u "${user}:${password}" -d \
"streaming=false&profile=${user}&jobDelayInMillis=100&\
audio-url=${audioLocation}&audio-format=${audioFormat}&priority=42&\
audio-url-complete=true&client-observer-id=${RANDOM}" \
"https://${serverInfo}/SCDictation/rest/nvoq/factory/dictations/" 2>&1 | \
collect_location)
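The snippet above pipes curl's verbose output through `collect_location`, which is not defined anywhere on this page. A minimal sketch of such a helper follows. It is an assumption about what the helper does, based on the C# code reading the `Location` response header: `curl -v` prints response headers prefixed with `< `, so the function looks for the first line of that shape and prints the URL. It would need to be defined before the curl call that uses it.

```shell
# Hypothetical definition of collect_location: read `curl -v` output on
# stdin and print the value of the first Location response header.
collect_location() {
  # Strip carriage returns, then match "< Location: <url>" in any letter case.
  # All input is consumed so curl never writes to a closed pipe.
  tr -d '\r' | awk '!found && tolower($1 $2) == "<location:" { print $3; found = 1 }'
}
```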
/*** This method starts the dictation, using the
     audio location from the audio upload operation above. The Dictation Factory
     returns a reference to the dictation as a 'Location', which is a unique
     URL that allows access to this dictation.
     Add this method to your Program class. ***/
private static string StartDictation(string audioLocation)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl,
        "/SCDictation/rest/sayit/factory/dictations");
    Dictionary<string, string> form = new Dictionary<string, string>();
    form["profile"] = Username;
    form["audio-url"] = audioLocation;
    form["audio-format"] = AudioFormat;
    form["client-observer-id"] = "owner";
    form["client-observer-expire-seconds"] = "3600";
    request.ContentType = "application/x-www-form-urlencoded";
    byte[] formBytes = Encoding.ASCII.GetBytes(ToXwwwFormUrlEncoded(form));
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(formBytes, 0, formBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.Created)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return response.Headers.Get("Location");
}
Step 4: Get Results
nVoq's dictation platform provides high-performance transcription, delivering text back to users in near real time.
But with fast networks, audio can typically be uploaded much faster than real time. So, in this batch transcription example,
once we have uploaded the audio, we need to check back to see whether the transcription is finished. Check at reasonable
time intervals (e.g., every few seconds) until the dictation is complete. At that point, the client application can retrieve the text
that was generated by the dictation engine.
# The dictation operation is now started. Using its location,
# we poll until the 'done' status is no longer 'false'.
completed="false"
while [ "${completed}" == "false" ]; do
completed=$(curl -s -u "${user}:${password}" "${dictationLocation}/done")
sleep 2
done
# Status is completed; get the dictation text
curl -u "${user}:${password}" "${dictationLocation}/text"
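The loop above keeps polling for as long as the service answers `false`, with no upper bound. In a real client it is usually safer to cap the number of attempts. The sketch below shows one way to do that. `poll_until_true` is a hypothetical helper, and it assumes the `/done` resource answers with the literal text `true` once transcription has finished, which matches the boolean parsing in the C# sample.

```shell
# Run a check command repeatedly until it prints "true".
# Usage: poll_until_true MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command prints "true", 1 if attempts run out.
poll_until_true() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if [ "$("$@")" = "true" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Possible usage with the variables from the earlier steps (not executed here):
#   poll_until_true 150 2 curl -s -u "${user}:${password}" "${dictationLocation}/done" \
#     || { echo "Dictation did not finish in time" >&2; exit 1; }
```

With 150 attempts at two-second intervals, the client gives up after roughly five minutes; both numbers are placeholders to tune for your recordings.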
/*** This method checks to see if the dictation is done
     using the dictation location. It returns a boolean indicating if the dictation
     is complete. Add this method to your Program class. ***/
private static bool IsDictationDone(string dictationLocation)
{
    HttpWebRequest request = BuildRequest("GET", dictationLocation, "/done");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.OK)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return Boolean.Parse(ReadStreamAsUtf8String(response.GetResponseStream()));
}

/*** This method retrieves the transcribed text using the dictation location. This
     method can be called after the dictation status is complete. Add this method
     to your Program class. ***/
private static string GetDictationText(string dictationLocation)
{
    HttpWebRequest request = BuildRequest("GET", dictationLocation, "/text");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.OK)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return ReadStreamAsUtf8String(response.GetResponseStream());
}
Full Sample Code
Below is the full sample code. Copy and paste the entire contents of the code below into your favorite editor and
save it locally on your machine. Modify the URLs and username/password according to your credentials and
system access. Then, run the program and enjoy all the excitement of securely converting audio to text via
the nVoq.API platform.
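As a rough guide, the shell snippets from Steps 1 through 4 can be stitched into a single script along the following lines. This is a sketch assembled from the snippets on this page, not an official listing. The function names, the `CURL` override (handy for dry runs with a stub), the `pollSeconds` variable, and the body of `collect_location` are all assumptions added for illustration.

```shell
#!/bin/bash
# Sketch only: combines the shell snippets from Steps 1-4.
# CURL may be pointed at a stub to dry-run the flow (an illustrative hook).
CURL=${CURL:-curl}

# Step 1: configuration (same values as the set-up snippet)
serverInfo="test.nvoq.com"
user="yourUserName"
password="yourPassword"
contentType="audio/x-wav"
audioFormat="pcm-16khz"

# Hypothetical helper: print the first Location header in `curl -v` output.
collect_location() {
  tr -d '\r' | awk '!found && tolower($1 $2) == "<location:" { print $3; found = 1 }'
}

# Step 2: upload the audio file named in $1; prints the audio URL.
upload_audio() {
  "$CURL" -v -X POST -u "${user}:${password}" \
    --header "Content-Type:${contentType}" --data-binary "@$1" \
    "https://${serverInfo}/SCFileserver/audio" 2>&1 | collect_location
}

# Step 3: create the dictation for the audio URL in $1; prints the dictation URL.
start_dictation() {
  "$CURL" -v -X POST -u "${user}:${password}" -d \
    "streaming=false&profile=${user}&jobDelayInMillis=100&audio-url=$1&audio-format=${audioFormat}&priority=42&audio-url-complete=true&client-observer-id=${RANDOM}" \
    "https://${serverInfo}/SCDictation/rest/nvoq/factory/dictations/" 2>&1 | collect_location
}

# Step 4: poll the dictation URL in $1 while it reports "false", then print the text.
fetch_text() {
  while [ "$("$CURL" -s -u "${user}:${password}" "$1/done")" = "false" ]; do
    sleep "${pollSeconds:-2}"
  done
  "$CURL" -s -u "${user}:${password}" "$1/text"
}

# Run all steps for the audio file named in $1.
transcribe() {
  audioLocation=$(upload_audio "$1")
  [ -n "$audioLocation" ] || { echo "Upload failed" >&2; return 1; }
  dictationLocation=$(start_dictation "$audioLocation")
  [ -n "$dictationLocation" ] || { echo "Could not start dictation" >&2; return 1; }
  fetch_text "$dictationLocation"
}
```

After filling in the credentials, calling `transcribe myAudioFile.wav` at the end of the script would print the transcribed text.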
If you have any questions, please contact us at support@nvoq.com.