Matching

Introduction

Matching is a simpler operation than continuous speech recognition. Rather than streaming audio and receiving text, or uploading large audio files for batch transcription, matching identifies what a user said from a limited set of choices. This constrained vocabulary is useful when an application needs to capture a command from a user and that command is one of a relatively small number of choices. For example, if a medical record system has five navigation tabs, the acceptable commands might be "open tab one", "open tab two", and so on. Because the set of choices is constrained, the application can program a specific response to each discrete spoken command.

Before You Begin

API User Account

If your organization has not already been in contact with our Sales team, please complete this short form on the Developer Registration Page and we will reach out to you regarding a user account and development with our APIs.

Once you have an account, you must change your password before the account can be used for API calls.

Audio Format

The nVoq Matching API supports G711 (muLaw) audio with the following properties:

Sample Rate: 8000 Hz
Sample Size: 8 bit
Channels: 1 (mono)
Frame Size: 1
Frame Rate: 8000
Big Endian: False (no endianness)
Signed / Unsigned: Signed
HTTP Format Alias: uLaw
HTTP Content-Type: audio/x-wav
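
If you are not sure whether a WAVE file matches the properties above, you can inspect its header before uploading. The following is a minimal sketch, not part of the nVoq samples: the class name MuLawWavCheck and its method are hypothetical, and it relies only on the standard RIFF/WAVE layout, where format tag 7 identifies mu-law audio.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: checks a WAVE header for 8 kHz, 8-bit, mono mu-law audio.
final class MuLawWavCheck {
    // WAVE format tag for G.711 mu-law in the RIFF "fmt " chunk
    static final int WAVE_FORMAT_MULAW = 7;

    static boolean isMuLaw8kMono(byte[] wav) {
        if (wav.length < 12) return false;
        if (!tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) return false;
        // RIFF stores all multi-byte numbers in little-endian order
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12;
        // Walk the chunk list until the "fmt " chunk is found
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4);
            if (size < 0) return false;
            if (id.equals("fmt ")) {
                if (size < 16 || pos + 8 + 16 > wav.length) return false;
                int format = buf.getShort(pos + 8) & 0xFFFF;
                int channels = buf.getShort(pos + 10) & 0xFFFF;
                int sampleRate = buf.getInt(pos + 12);
                int bitsPerSample = buf.getShort(pos + 22) & 0xFFFF;
                return format == WAVE_FORMAT_MULAW && channels == 1
                        && sampleRate == 8000 && bitsPerSample == 8;
            }
            // Chunk data is padded to an even number of bytes
            long next = (long) pos + 8 + size + (size & 1);
            if (next > wav.length) return false;
            pos = (int) next;
        }
        return false;
    }

    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java MuLawWavCheck <file.wav>");
            return;
        }
        byte[] wav = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isMuLaw8kMono(wav) ? "format looks OK" : "unexpected format");
    }
}
```

The check only reads the header fields; it does not validate the audio data itself.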

Start your IDE

The nVoq API is a RESTful web services and WebSocket API, so it does not constrain you to any specific platform or programming language. We provide sample code below for shell scripting (bash), C#, Java, and JavaScript. Follow along and run this code in your environment. If you prefer C++, Go, or some other language, that's great! Just adapt the code below to your language's web services functionality and you should be good to go.

Let's Go!


Step 1: Create Grammar

To define the matching choices, upload a list of those choices to the server. There are two ways to do this: upload a plain list of words or, if the set of possible spoken phrases is more complex, upload an XML-based grammar. In either case, the result is a grammar location reference, returned in the Location header of the HTTP response, that you use when performing matching operations. If you upload a file, it must use Unix line endings (line feed only, not carriage return plus line feed).
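
If your word list was saved on Windows, it may contain carriage returns. One way to handle this is to normalize the line endings before posting the bytes. The sketch below is an illustration only: the LineEndings class is hypothetical, and it assumes the grammar file is UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: converts CRLF (and stray CR) line endings to LF.
final class LineEndings {
    static String toUnix(String text) {
        // Replace Windows endings first, then any remaining lone carriage returns
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Reads a grammar file (assumed UTF-8) and returns LF-only bytes ready to POST
    static byte[] readAsUnix(String fileName) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        return toUnix(text).getBytes(StandardCharsets.UTF_8);
    }
}
```

In the Java sample, the result of LineEndings.readAsUnix(grammarFileName) could be used as postData in place of the raw file bytes.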

#!/bin/bash

#Helper function to collect results from
#the location header in the HTTP response
collect_location()
{
  # curl prints the header name in lowercase over HTTP/2, so accept both spellings
  sed -n -e 's/^.*[Ll]ocation: //p' | tr -d ' \r'
}

#server info
serverInfo="test.nvoq.com"
  
#credentials
user="yourUserName"
password="yourPassword"

#Content type must be:
# - "text/plain" for word list
# - "text/xml" for speech xml grammar


# Upload the list of words --
# the server creates a grammar from this list and returns the URL
grammar_url=$(curl -v -u "${user}:${password}" -X POST \
--header "Content-Type:text/plain" --data-binary "@matchingwords.txt" \
"https://${serverInfo}/scgrammar/NUANCE?recognizerLocale=en-US" \
2>&1 | collect_location)


####################################################
# matchingwords.txt contains the following lines
#
# red
# green
# blue
# yellow
# orange
#
####################################################

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.util.*;

public class Program{
   
    // Path to 8-bit 8-kHz Mono uLaw wave file
    private String myAudioFilePath = "./";
   
   
    // Your username
    private String username = "yourUsername";
    // Your password
    private String password = "yourPassword";
    // Server URL
    private String baseUrl = "https://test.nvoq.com";
    // "audio/ogg" when using Ogg format
    // "audio/x-wav" when using WAVE format
    private String audioContentType = "audio/x-wav";
    //text/plain for list of words
    //text/xml for grammar
    private String grammarContentType="text/plain";

    
    private String createGrammar(String grammarFileName) {

    //service creates a Nuance matching server compatible grammar
    String url = baseUrl + "/scgrammar/NUANCE";

    byte[] postData;
    try {
         postData = Files.readAllBytes(Paths.get(grammarFileName));

         URL myurl = new URL(url);
         HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
         con.setDoOutput(true);
         String credentials = username + ":" + password;
         String basicAuth = "Basic " + 
            new String(Base64.getEncoder().encode(credentials.getBytes()));
         con.setRequestProperty ("Authorization", basicAuth);
         con.setRequestMethod("POST");
         con.setRequestProperty("Content-Type", grammarContentType);

         DataOutputStream wr = new DataOutputStream(con.getOutputStream());
         wr.write(postData);
         wr.flush();
         String location = con.getHeaderField("Location");

         return location;
      } catch (Exception e) {
         System.out.println(e.toString());
      }
      return "see error message above";
    }
    
//--------------------------------------------------
// matchingwords.txt contains the following lines
//
// red
// green
// blue
// yellow
// orange
//
//--------------------------------------------------


<!-- ===================================================== -->
<!-- Matching JavaScript How-To.  The script below         -->
<!-- performs a matching operation over HTTP.              -->
<!-- ===================================================== -->
<!-- REMEMBER TO CONSIDER THE IMPACT OF CORS               -->
<!-- You must disable it in your browser or                -->
<!-- contact your nVoq representative to have              -->
<!-- your domain added to the allowed list.                -->
<!-- ===================================================== -->

<html>
<meta charset="UTF-8">
<body>
   <!-- Simple audio file upload input -->
   <p>nVoq API Matching HowTo</p>
   
   <p>Choose the grammar file to upload.</p>
   <input type="file" id="grammarFileInput" />
   <br/>
   <label id="grammarURLLabel">Grammar URL: --</label>
   <br/>
   <p>Choose the audio file to upload.</p>
   <input type="file" id="audioFileInput" />
   <br />
   <label id="audioURLLabel">Audio URL: --</label>
   <br />
   <br/>
   <input type="button" id="match" value="Perform Match"/>
   <br/>
   
   <textarea rows="25" cols="75" id="results">Matching results will appear here</textarea>

   <script>

   var text = "";
   var status = "WORKING";
   var connected = false;
   var username = "yourUsername"
   var password = "yourPassword"
   
   var audioURL = "";
   var grammarURL = "";

   function readGrammarFile(evt) {
      //Retrieve the first (and only) File from the FileList object
      var f = evt.target.files[0];

      if (f) {
         var r = new FileReader();
         r.onload = function(e) {
            var authString = "Basic " + btoa(username + ":" + password);            
            var xhr = new XMLHttpRequest();
            xhr.open('POST', "https://test.nvoq.com/scgrammar/NUANCE", true);
            xhr.setRequestHeader("Content-Type","text/plain");
            xhr.setRequestHeader("Authorization",authString);
            xhr.onreadystatechange = processRequest;
            xhr.send(r.result);
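            function processRequest(message) {
                  // Wait until the response is complete before reading the header
                  if (xhr.readyState !== XMLHttpRequest.DONE) return;
                  grammarURL = xhr.getResponseHeader("Location");
                  document.getElementById('grammarURLLabel').innerHTML = "Grammar URL: " + grammarURL;
            }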
         }
         r.readAsText(f);
      } else {
         alert("Failed to load file");
      }

   }

   //...

   document.getElementById('grammarFileInput').addEventListener('change',
            readGrammarFile, false);
            
   //...
//--------------------------------------------------
// matchingwords.txt contains the following lines
//
// red
// green
// blue
// yellow
// orange
//
//--------------------------------------------------
</script>

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;

namespace nVoqHttpApiCSharp
{
    class Program
    {
        /**** Begin configuration settings ****/

        // Path to 8-bit 8-kHz Mono uLaw File
        const string AudioFilePath = @"c:\path\to\your\matchingaudio.wav";
        const string GrammarFilePath = @"c:\path\to\your\matchingwords.txt";
        // Your username
        const string Username = "yourUserName";
        // Your password
        const string Password = "yourPassword";
        // Server URL
        const string BaseUrl = "https://test.nvoq.com";
        const string AudioContentType = "audio/x-wav";
        
        //text/plain for list of words
        //text/xml for grammar
        const String GrammarContentType = "text/plain";

        private static string createGrammar(string grammarFile)
        {
            HttpWebRequest request = BuildRequest("POST", BaseUrl, "/scgrammar/NUANCE");
            if (!File.Exists(grammarFile))
                throw new Exception("Could not locate grammar file: " + grammarFile);
            byte[] grammarBytes = File.ReadAllBytes(grammarFile);
            request.ContentType = GrammarContentType;
            using (Stream requestStream = request.GetRequestStream())
                requestStream.Write(grammarBytes, 0, grammarBytes.Length);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            if (response.StatusCode != HttpStatusCode.Created)
                throw new Exception("Unexpected HTTP response: " + response.StatusCode);
            return response.Headers.Get("Location");
        }
        
//--------------------------------------------------
// matchingwords.txt contains the following lines
//
// red
// green
// blue
// yellow
// orange
//
//--------------------------------------------------

Step 2: Upload Audio

Each implementation below uploads audio according to the platform specifics. If you don't have an audio file readily available, you can download one here.
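
The C# samples verify that the server answered with 201 Created before reading the Location header, while the Java samples read the header directly. If you want a similar check in Java, a small helper along these lines may be useful. This is a sketch under assumptions: the LocationCheck class and its method name are hypothetical and are not part of the nVoq sample code.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: validates an upload response before using its Location header.
final class LocationCheck {
    static String requireCreatedLocation(int statusCode, String location) {
        // The upload is expected to answer 201 Created, as the C# sample checks
        if (statusCode != HttpURLConnection.HTTP_CREATED) {
            throw new IllegalStateException("Unexpected HTTP response: " + statusCode);
        }
        if (location == null || location.trim().isEmpty()) {
            throw new IllegalStateException("Response did not include a Location header");
        }
        return location.trim();
    }
}
```

With an open HttpURLConnection named con, the call would be LocationCheck.requireCreatedLocation(con.getResponseCode(), con.getHeaderField("Location")). Failing early here avoids passing a null URL on to the matching step.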


#Upload the Audio --
#Server returns a reference to the audio location
audio_url=$(curl -v -X POST -u "${user}:${password}" \
  --header "Content-Type:audio/x-wav" \
  --data-binary "@matchingaudio.wav" "https://${serverInfo}/SCFileserver/audio" 2>&1 \
  | collect_location)

private String uploadAudio(String audioFileName) {

   String url = baseUrl + "/SCFileserver/audio";

   byte[] postData;
   try {
      postData = Files.readAllBytes(Paths.get(audioFileName));

      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      con.setDoOutput(true);
      String credentials = username + ":" + password;
      String basicAuth = "Basic " + new String(Base64.getEncoder().encode(credentials.getBytes()));
      con.setRequestProperty ("Authorization", basicAuth);
      con.setRequestMethod("POST");
      con.setRequestProperty("Content-Type", audioContentType);

      DataOutputStream wr = new DataOutputStream(con.getOutputStream());
      wr.write(postData);
      wr.flush();
      String location = con.getHeaderField("Location");

      return location;
   } catch (Exception e) {
      System.out.println(e.toString());
   }
   return "see error message above";
}

<script>
     //...
   function readAudioFile(evt) {
         //Retrieve the first (and only) File from the FileList object
         var f = evt.target.files[0];

         if (f) {
            var r = new FileReader();
            r.onload = function(e) {               
               var authString = "Basic " + btoa(username + ":" + password); 
               var xhr = new XMLHttpRequest();
               xhr.open('POST', "https://test.nvoq.com/SCFileserver/audio", true);
               xhr.setRequestHeader("Content-Type","audio/x-wav");
               xhr.setRequestHeader("Authorization",authString);
               xhr.onreadystatechange = processRequest;
               xhr.send(r.result);
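               function processRequest(message) {
                     // Wait until the response is complete before reading the header
                     if (xhr.readyState !== XMLHttpRequest.DONE) return;
                     audioURL = xhr.getResponseHeader("Location");
                     document.getElementById('audioURLLabel').innerHTML = "Audio URL: " + audioURL;
               }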
            }
            r.readAsArrayBuffer(f);
         } else {
            alert("Failed to load file");
         }
   }
   document.getElementById('audioFileInput').addEventListener('change',
      readAudioFile, false);
   //...
   </script>

private static string UploadAudio(string audioFile)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl, "/SCFileserver/audio");
    if (!File.Exists(audioFile))
        throw new Exception("Could not locate audio file: " + audioFile);
    byte[] audioBytes = File.ReadAllBytes(audioFile);
    request.ContentType = AudioContentType;
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(audioBytes, 0, audioBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.Created)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return response.Headers.Get("Location");
}

Step 3: Execute Matching Operation

Now that the grammar and audio are available, ask the server to look for a match, as defined in the grammar, within the audio.


curl -v -X POST -u "${user}:${password}" \
 -d "audio-url=${audio_url}" \
 -d "grammar-url=${grammar_url}" \
 -d reco-type=NUANCE \
 -d n-best=1 \
 -d recognizerLocale=en-US \
 -d confidence-threshold=50 \
 -d external-id=42 \
 "https://${serverInfo}/matchingwebservice/form"

private String performMatchingOperation(String grammarLocation, String audioLocation){

   String url = baseUrl + "/matchingwebservice/form";

   byte[] postData;
   try {
      URL myurl = new URL(url);
      HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
      con.setDoOutput(true);
      String credentials = username + ":" + password;
      String basicAuth = "Basic " + new String(Base64.getEncoder().encode(credentials.getBytes()));
      con.setRequestProperty ("Authorization", basicAuth);
      con.setRequestMethod("POST");
      con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
      Map<String, String> arguments = new HashMap<>();
      arguments.put("streaming", "false");
      arguments.put("grammar-url", grammarLocation);
      arguments.put("audio-url",audioLocation);
      arguments.put("external-id", "42");
      arguments.put("reco-type", "NUANCE");
      arguments.put("recognizerLocale", "en-US");
      arguments.put("n-best", "1");
      StringJoiner sj = new StringJoiner("&");
      for(Map.Entry<String, String> entry : arguments.entrySet())
          sj.add(URLEncoder.encode(entry.getKey(), "UTF-8") + "=" 
               + URLEncoder.encode(entry.getValue(), "UTF-8"));
      byte[] out = sj.toString().getBytes(StandardCharsets.UTF_8);
      int length = out.length;
      con.setFixedLengthStreamingMode(length);
      //http.connect();
      try(OutputStream os = con.getOutputStream()) {
          os.write(out);
      }
      String location = con.getHeaderField("Location");
      BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
      StringBuffer sb = new StringBuffer();
      String inputLine;
      while ((inputLine = in.readLine()) != null)
         sb.append(inputLine);
      in.close();
      return sb.toString();
   } catch (Exception e) {
      System.out.println(e.toString());
   }
   return "see error message above";
}

public static void main(String[] args){
   Program prog = new Program();
   String grammarLocation = prog.createGrammar("matchingwords.txt");
   System.out.println(grammarLocation);
   String audioLocation = prog.uploadAudio("matchingaudio.wav");
   System.out.println(audioLocation);
   String matchingOutput = prog.performMatchingOperation(grammarLocation, audioLocation);
   System.out.println(matchingOutput);
}

<script>
     function performMatchingOperation(event){
       var authString = "Basic " + btoa(username + ":" + password);
       
       var xhr = new XMLHttpRequest();
       xhr.open('POST', "https://test.nvoq.com/matchingwebservice/form", true);
       xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
       xhr.setRequestHeader("Authorization",authString);
       
       var params = "streaming=false&" + 
                    "grammar-url=" + encodeURIComponent(grammarURL) + "&" +
                    "audio-url=" + encodeURIComponent(audioURL) +"&" +
                    "external-id=42&" +
                    "recognizerLocale=en-US&" +
                    "confidence-threshold=50&" +
                    "reco-type=NUANCE&" +
                    "n-best=1";
                    
       xhr.onreadystatechange = processRequest;
       xhr.send(params);
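       function processRequest(message) {
             // Only show the result once the full response has arrived
             if (xhr.readyState !== XMLHttpRequest.DONE) return;
             console.log(xhr.responseText);
             document.getElementById('results').value = xhr.responseText;
       }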

      
   }
   document.getElementById('match').addEventListener('click',
           performMatchingOperation, false);
   //...
</script>

private static string performMatch(String grammarLocation, String audioLocation)
{
    HttpWebRequest request = BuildRequest("POST", BaseUrl, "/matchingwebservice/form");
    Dictionary<string, string> form = new Dictionary<string, string>();
    form["streaming"] = "false";
    form["grammar-url"] = grammarLocation;
    form["audio-url"] = audioLocation;
    form["external-id"] = "42";
    form["reco-type"] = "NUANCE";
    form["recognizerLocale"] = "en-US";
    form["n-best"] = "1";
    request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
    byte[] formBytes = Encoding.UTF8.GetBytes(ToXwwwFormUrlEncoded(form));
    using (Stream requestStream = request.GetRequestStream())
        requestStream.Write(formBytes, 0, formBytes.Length);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode != HttpStatusCode.OK)
        throw new Exception("Unexpected HTTP response: " + response.StatusCode);
    return ReadStreamAsUtf8String(response.GetResponseStream());
}
static void Main(string[] args)
{
    // Allow only TLS 1.1 and TLS 1.2 for HTTPS connections (PCI compliance)
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;

    Console.WriteLine("Creating grammar...");
    string grammarLocation = createGrammar(GrammarFilePath);
    Console.WriteLine("Grammar created. Location: " + grammarLocation);

    Console.WriteLine("Uploading audio...");
    string audioLocation = UploadAudio(AudioFilePath);
    Console.WriteLine("Audio uploaded. Location: " + audioLocation);

    Console.WriteLine("Performing the match...");
    string matchResults = performMatch(grammarLocation, audioLocation);
    Console.WriteLine("results: " + matchResults);
}

Full Sample Code

Below is the full sample code. Copy and paste the entire contents of the code below into your favorite editor and save it locally on your machine. Modify the URLs and username/password according to your credentials and system access. Then run the program and enjoy all the excitement of securely converting audio to text via the nVoq.API platform.

If you have any questions, please reach out to support@nvoq.com.
