HTTP Dictation

Introduction

This How-To describes how to perform basic dictation operations with the nVoq.API. It is a great place to start if you are new to our API or to speech recognition, or if you just want to brush up on your web services skills before integrating your application with our services. The sections below walk you through the steps required to post audio to our system and convert that audio to text in a batch operation, in which the audio is uploaded completely before transcription begins. This is a common use case for asynchronous or "batch" dictation, where medical professionals record their audio and submit it for processing once the entire recording is complete. The transcribed text is then inserted into the medical record system once the audio has been fully transcribed.

Before You Begin

API User Account

If your organization has not already been in contact with our Sales team, please complete this short form on the Developer Registration Page and we will reach out to you regarding a user account and development with our APIs.

Once you have an account, you must change your password before the account can be used for API calls.

Audio Format

The nVoq API supports Ogg Vorbis, WebM, MPEG-4, and PCM encoded audio sampled at 8 kHz or 16 kHz. Audio sampled at 16 kHz produces better accuracy and is highly recommended over 8 kHz.

[More Information On Audio Formats]

Start your IDE

The nVoq API is a RESTful web services API and therefore does not constrain you to any specific platform or programming language. We provide sample code below for shell scripting (bash), C#, and Java. Follow along and run this code in your environment. But, if you prefer C++, Go, or some other language, that's great! Just adapt the code below to your language's web services functionality and you should be good to go.

Let's Go!

Choose your programming language...

Step 1: Set Up

To make the examples easier to read, long URLs and parameters are replaced with more legible variables. This section of code defines those variables and sets the values needed to execute the code in the sections that follow.

#!/bin/bash
#server info
serverInfo="test.nvoq.com"

#credentials
user="yourUserName"
password="yourPassword"
  
# What type of audio to upload
# contentType can be audio/x-wav or audio/ogg
# audioFormat must be pcm-16khz, pcm-8khz, ogg, webm, or mp4

contentType="audio/x-wav"
audioFormat="pcm-16khz"
audioFile="myAudioFile.wav"
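The audioFormat value needs to agree with the sample rate of the file you upload. As a rough sketch (not part of the nVoq API), the hypothetical helpers below read the sample rate from a WAV file and pick the matching value. They assume a canonical RIFF/WAVE header, where the sample rate sits in bytes 24-27 with the least significant byte first, and they assume the lowercase pcm-16khz / pcm-8khz spellings used in the sample variables. WAV files that carry extra chunks before the format chunk will not be read correctly.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the nVoq API.

# Print the sample rate stored in a canonical 44-byte WAV header.
wav_sample_rate() {
    # Bytes 24-27 hold the rate, least significant byte first;
    # od prints them as four unsigned decimal numbers.
    set -- $(od -An -tu1 -j24 -N4 "$1")
    [ "$#" -eq 4 ] || return 1
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# Map the sample rate to the audioFormat value used in this walkthrough.
audio_format_for_wav() {
    rate=$(wav_sample_rate "$1") || return 1
    case "$rate" in
        16000) echo "pcm-16khz" ;;
        8000)  echo "pcm-8khz" ;;
        *)     echo "unsupported sample rate: ${rate} Hz" >&2; return 1 ;;
    esac
}
```

For example, audioFormat=$(audio_format_for_wav "${audioFile}") would derive the value from the file itself instead of hard-coding it.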

public class Program{
	
    // Path to 16-bit 16-kHz Mono PCM WAV File, or 16-kHz Ogg File
    private String myAudioFilePath = "./";
    // Your username
    private String username = "";
    // Your password
    private String password = "";
    //
    // To use an API key in your software
    // private String apiKey = "eyJ0eXA ... iOiJKV1QiLCJh";
    //
    // Server URL
    private String baseUrl = "https://test.nvoq.com";
    // "audio/ogg" when using Ogg format
    // "audio/x-wav" when using WAV format
    private String audioContentType = "audio/x-wav";
    // "pcm-16khz" when using WAV format, "ogg" when using Ogg format,
    // "webm" when using WebM, or "mp4" when using MPEG-4
    private String audioFormat = "pcm-16khz";
}

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;

namespace nVoqHttpApiCSharp
{

	/*** This class will eventually contain all the code for transcribing
	     audio.  We will fill out the methods in the sections that follow
	     starting with the configuration settings below.  ***/
	     
    class Program
    {
        /**** Begin configuration settings ****/
        
        // Path to 16-bit 16-kHz Mono PCM WAV File, or 16-kHz Ogg File
        const string AudioFilePath = @""; // e.g. C:\audio.wav or C:\audio.ogg
        // Your username
        const string Username = "";
        // Your password
        const string Password = "";
        // To use an API key in your software
        // const string apiKey = "eyJ0eXA ... iOiJKV1QiLCJh";
        //
        // Server URL
        const string BaseUrl = "https://test.nvoq.com";
        // "audio/ogg" when using Ogg format
        // "audio/x-wav" when using WAV format
        const string AudioContentType = "audio/x-wav";
        // "pcm-16khz" when using WAV format, "ogg" when using Ogg format,
        // "webm" when using WebM, or "mp4" when using MPEG-4
        const string AudioFormat = "pcm-16khz";
        
        /**** End configuration settings   ****/

        /**** More code will go here ****/
    }
}

Step 2: Upload Audio

As you might have guessed, audio must be provided to the dictation service in order to convert it to text. For this batch processing example, the entire audio file is uploaded prior to starting transcription. This section outlines how to use the nVoq.API to upload an audio file. If you don't have an audio file readily available, you can download one here.

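# Using the test parameters defined above, upload the audio.
# The service returns a URL for accessing the audio in the Location
# response header. collect_location, also used in Step 3, must print
# that header value from curl's verbose output.

audioLocation=$(curl -v -X POST -u "${user}:${password}" \
   --header "Content-Type: ${contentType}" --data-binary "@${audioFile}" \
   "https://${serverInfo}/SCFileserver/audio" 2>&1 | collect_location)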
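The dictation request in Step 3 pipes curl's verbose output through collect_location, which is not defined on this page. A minimal sketch of such a helper follows. It assumes the usual curl -v layout, in which response header lines start with "< " and end with a carriage return, and it simply prints the value of the first Location header it finds. The same helper can capture the audio URL returned by the upload, since the Java and C# samples read that URL from the Location header as well.

```shell
#!/bin/sh
# collect_location: hypothetical helper (an assumption, not nVoq code).
# Reads "curl -v" output on stdin and prints the first Location header value.
collect_location() {
    # Strip carriage returns, then keep only the text after "< Location: ".
    # HTTP/2 responses use a lowercase header name, so accept both spellings.
    tr -d '\r' | sed -n 's/^< [Ll]ocation: *//p' | head -n 1
}
```

Define the function near the top of the script, before the first curl call that uses it. Because curl writes its verbose trace to stderr, the calling command needs 2>&1 before the pipe.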

// Requires imports: java.io.OutputStream, java.net.HttpURLConnection, java.net.URL,
// java.nio.file.Files, java.nio.file.Paths, java.util.Base64
private String uploadAudio(String audioFileName) {
   String url = baseUrl + "/SCFileserver/audio";

   try {
      // Read the whole audio file into memory for the batch upload
      byte[] postData = Files.readAllBytes(Paths.get(audioFileName));

      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      con.setDoOutput(true);
      String credentials = username + ":" + password;
      String basicAuth = "Basic " +
         Base64.getEncoder().encodeToString(credentials.getBytes());
      con.setRequestProperty("Authorization", basicAuth);
      //API Key would be used as follows:
      //String auth = "Bearer " + username + ":" + apiKey;
      //con.setRequestProperty ("Authorization", auth);
      con.setRequestMethod("POST");
      con.setRequestProperty("Content-Type", audioContentType);

      // Send the audio bytes; the stream is closed automatically
      try (OutputStream wr = con.getOutputStream()) {
         wr.write(postData);
      }
      // The URL of the uploaded audio is returned in the Location header
      return con.getHeaderField("Location");
   } catch (Exception e) {
      System.out.println(e.toString());
   }
   return "see error message above";
}

/*** The following method uploads audio to the nVoq
     servers from a file as specified in the method parameter 'audioFile'.
     Add this method to your Program class. ***/
     
private static string UploadAudio(string audioFile)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl, "/SCFileserver/audio");
    if (!File.Exists(audioFile))
        throw new Exception("Could not locate audio file: " + audioFile);
    byte[] audioBytes = File.ReadAllBytes(audioFile);
    request.ContentType = AudioContentType;
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(audioBytes, 0, audioBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.Created)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return response.Headers.Get("Location");
}

Step 3: Start Dictation

This section outlines how to use the nVoq.API to start and perform a dictation, converting the audio you uploaded in the previous section into text.

# Using the test parameters defined above, create the dictation to begin
# the transcription process. Each form field is passed as its own option
# so that no stray whitespace ends up in the request body; curl joins
# the fields with '&'.

dictationLocation=$(curl -v -X POST -u "${user}:${password}" \
   -d "streaming=false" -d "profile=${user}" -d "jobDelayInMillis=100" \
   --data-urlencode "audio-url=${audioLocation}" \
   -d "audio-format=${audioFormat}" -d "priority=42" \
   -d "audio-url-complete=true" -d "client-observer-id=${RANDOM}" \
   "https://${serverInfo}/SCDictation/rest/nvoq/factory/dictations/" 2>&1 | \
      collect_location)

private String startDictation(String audioLocation){
   String url = baseUrl + "/SCDictation/rest/nvoq/factory/dictations/";

   byte[] postData;
   try {
      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      con.setDoOutput(true);
      String credentials = username + ":" + password;
      String basicAuth = "Basic " + 
         new String(Base64.getEncoder().encode(credentials.getBytes()));
      con.setRequestProperty ("Authorization", basicAuth);
      con.setRequestMethod("POST");
      con.setRequestProperty("Content-Type",
         "application/x-www-form-urlencoded; charset=UTF-8");
      Map<String, String> arguments = new HashMap<>();
      arguments.put("streaming", "false");
      arguments.put("profile", username); 
      arguments.put("audio-url",audioLocation);
      arguments.put("audio-format", audioFormat);
      arguments.put("priority", "42");
      arguments.put("audio-url-complete", "true");
      arguments.put("client-observer-id", "someObserverID");
      // URL-encode each key and value, then join the pairs with '&'
      StringJoiner sj = new StringJoiner("&");
      for (Map.Entry<String, String> entry : arguments.entrySet())
            sj.add(URLEncoder.encode(entry.getKey(), "UTF-8") + "=" 
            + URLEncoder.encode(entry.getValue(), "UTF-8"));
      byte[] out = sj.toString().getBytes(StandardCharsets.UTF_8);
      int length = out.length;
      con.setFixedLengthStreamingMode(length);
      //http.connect();
      try(OutputStream os = con.getOutputStream()) {
         os.write(out);
      }
      String location = con.getHeaderField("Location");

      return location;
   } catch (Exception e) {
      System.out.println(e.toString());
   }
   return "see error message above";
}

/*** This method starts the dictation, using the
     audio location from the audio upload operation above.  The Dictation Factory
     returns a reference to the dictation as a 'Location' which is a unique
     URL that allows access to this dictation.
     Add this method to your Program class. ***/
     
private static string StartDictation(String audioLocation)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl, 
       "/SCDictation/rest/sayit/factory/dictations");
    Dictionary<string, string> form = new Dictionary<string, string>();
    form["profile"] = Username;
    form["audio-url"] = audioLocation;
    form["audio-format"] = AudioFormat;
    form["client-observer-id"] = "owner";
    form["client-observer-expire-seconds"] = "3600";
    request.ContentType = "application/x-www-form-urlencoded";
    byte[] formBytes = Encoding.ASCII.GetBytes(ToXwwwFormUrlEncoded(form));
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(formBytes, 0, formBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.Created)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return response.Headers.Get("Location");
}

Step 4: Get Results

nVoq's dictation platform provides high-performance transcription, delivering text back to users in near real time. With fast networks, however, audio can typically be uploaded much faster than real time. So, in this batch transcription example, once we have uploaded the audio and started the dictation, we need to check back to see whether the transcription is finished. Check at reasonable intervals (e.g., every few seconds) until the dictation is complete. At that point, the client application can retrieve the text that was generated by the dictation engine.

# The dictation operation is now started.  Using its location
# we check for the status to be 'completed'.
  
completed="false"
while [ "${completed}" == "false" ]; do
    completed=$(curl -s -u ${user}:${password} ${dictationLocation}/done)
    sleep 2
done

# status is completed, get the dictation text

curl -u ${user}:${password} ${dictationLocation}/text
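The loop above keeps polling until the service returns something other than false, so it never gives up if a dictation stalls, and it would also stop early on an empty or error response. One possible refinement, sketched below with hypothetical helper names, is to wait for an explicit true and to cap the number of attempts. The 150 attempts and 2-second interval in the usage comment are arbitrary illustrative values, not nVoq recommendations.

```shell
#!/bin/sh
# wait_for_done: hypothetical helper, not part of the nVoq API.
# Runs the given check command until it prints "true", sleeping between
# attempts, and gives up after MAX_ATTEMPTS checks.
# Usage: wait_for_done MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for_done() {
    max_attempts=$1
    interval=$2
    shift 2
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        # "$@" is the status check, e.g. check_done below
        if [ "$("$@")" = "true" ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# check_done: the same status request the walkthrough makes, as a function.
check_done() {
    curl -s -u "${user}:${password}" "${dictationLocation}/done"
}

# Example usage (commented out so that sourcing this file makes no requests):
# if wait_for_done 150 2 check_done; then
#     curl -u "${user}:${password}" "${dictationLocation}/text"
# else
#     echo "dictation did not complete in time" >&2
# fi
```

Passing the check as a command keeps the retry logic separate from the HTTP call, so the same loop can be exercised with a stub instead of a live server.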

private boolean isDone(String aDictationLocation) {
   String url = aDictationLocation + "/done";

   try {
      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      String credentials = username + ":" + password;
      String basicAuth = "Basic " + 
         new String(Base64.getEncoder().encode(credentials.getBytes()));
      con.setRequestProperty("Authorization", basicAuth);
      con.setRequestMethod("GET");
      // Read the response body, which is the text "true" or "false"
      StringBuilder sb = new StringBuilder();
      try (BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream()))) {
         String inputLine;
         while ((inputLine = in.readLine()) != null) {
            sb.append(inputLine);
         }
      }
      System.out.println(sb);

      return sb.toString().contains("true");
   } catch (Exception e) {
      System.out.println(e.toString());
   }
   return false;
}

private String getText(String aDictationURL) {
   String url = aDictationURL + "/text";

   try {
      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      String credentials = username + ":" + password;
      String basicAuth = "Basic " + 
         new String(Base64.getEncoder().encode(credentials.getBytes()));
      con.setRequestProperty("Authorization", basicAuth);
      con.setRequestMethod("GET");
      // Read the transcribed text from the response body
      StringBuilder sb = new StringBuilder();
      try (BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream()))) {
         String inputLine;
         while ((inputLine = in.readLine()) != null) {
            sb.append(inputLine);
         }
      }
      return sb.toString();
   } catch (Exception e) {
      System.out.println(e.toString());
   }

   return null;
}

/*** This method checks to see if the dictation is done
     using the dictation location.  It returns a boolean indicating if the dictation
     is complete.  Add this method to your Program class. ***/
  
private static bool IsDictationDone(string dictationLocation)
{
    HttpWebRequest request = BuildRequest("GET", dictationLocation, "/done");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.OK)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return Boolean.Parse(ReadStreamAsUtf8String(response.GetResponseStream()));
}

/*** This method retrieves the transcribed text using the dictation location.  This
     method can be called after the dictation status is complete. Add this method
     to your Program class. ***/
     
private static string GetDictationText(string dictationLocation)
{
    HttpWebRequest request = BuildRequest("GET", dictationLocation, "/text");
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.OK)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return ReadStreamAsUtf8String(response.GetResponseStream());
}

Full Sample Code

Below is the full sample code. Copy and paste the entire contents of the code below into your favorite editor and save locally on your machine. Modify the URLs and username/password according to your credentials and system access. Then, run the program and enjoy all the excitement of securely converting audio to text via the nVoq.API platform.

If you have any questions, please contact us at support@nvoq.com.
