
API Developer

Dictation Terms

Alternate Words This feature presents alternative words or phrases for sections of a dictated transcript that the dictation engine did not clearly understand. See an example of Alternate Words in Mobile Voice.
[Figure: Alternate Words Sample]
Buffer Size of Audio Data The 300ms recommendation applies to the size of the WebSockets dictation audio chunk that the customer sends to our dictation server. nVōq provides no setting or system configuration on our side (API or server) that determines the size of the audio chunk received; it is set solely on the ISV developer side. If the customer sends audio chunks of 8mb or 5kb, the dictation will be translated much more slowly than if they send 300ms (4500kb) audio chunks.

This is configured in the client application, not by nVōq. Our recommended chunk size is 300ms (or 4500kb) for the best dictation translation performance.
Chunking This is the process of reading segments of audio data off of the sound card and sending them to the server. A segment could be 100s of milliseconds up to 1 second long (or whatever makes sense).
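As a rough illustration of chunking on the client side, the sketch below reads fixed-duration segments from an audio source and hands each one to a sender. It is only a sketch under stated assumptions: the audio format (16 kHz, 16-bit, mono PCM), the names `chunk_size_bytes` and `stream_chunks`, and the `send` callback are all hypothetical and are not part of any nVōq API. The byte count of a 300ms chunk depends entirely on the audio format the client actually uses.

```python
import io
from typing import BinaryIO, Callable


def chunk_size_bytes(chunk_ms: int, sample_rate: int = 16000,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Bytes in one chunk of uncompressed PCM of the given duration.

    The default format (16 kHz, 16-bit, mono) is an assumption for this sketch.
    """
    return sample_rate * sample_width * channels * chunk_ms // 1000


def stream_chunks(source: BinaryIO, send: Callable[[bytes], None],
                  chunk_ms: int = 300) -> int:
    """Read chunk_ms-sized segments from source and pass each to send().

    `send` is a hypothetical stand-in for whatever writes to the WebSocket.
    Returns the number of chunks sent; the last chunk may be shorter.
    """
    size = chunk_size_bytes(chunk_ms)
    sent = 0
    while True:
        chunk = source.read(size)
        if not chunk:  # end of the audio source
            break
        send(chunk)
        sent += 1
    return sent


if __name__ == "__main__":
    one_second_of_silence = io.BytesIO(bytes(chunk_size_bytes(1000)))
    received = []
    count = stream_chunks(one_second_of_silence, received.append)
    print(count, [len(c) for c in received])
```

In a real client the `source` would be the sound card capture stream rather than an in-memory buffer, and `send` would write a binary message to the open WebSocket.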
Hypothesis Text See "Hypothesis Text and Stable Text explained."
Inverse Text Normalization (ITN) This is the process of converting recognized text into the expected written format and structure. A core speech recognizer produces a spoken-form token sequence, which is converted to written form through inverse text normalization (ITN). ITN includes formatting entities like numbers, dates, times, and addresses.
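To make the spoken-form to written-form idea concrete, here is a deliberately tiny sketch that rewrites number words from zero to ninety-nine as digits. It is a toy for illustration only and makes no claim about how nVōq's ITN works; the function name `itn_numbers` and the word tables are assumptions of this example, and a real ITN stage also handles dates, times, addresses, and much more.

```python
# Toy inverse text normalization: spoken-form number words -> digits (0-99 only).
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def itn_numbers(tokens):
    """Return a new token list with simple number words replaced by digits."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok in TENS:
            value = TENS[tok]
            # "twenty" followed by "five" combines into 25.
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            if nxt in UNITS and 1 <= UNITS[nxt] <= 9:
                value += UNITS[nxt]
                i += 1
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            out.append(tokens[i])  # not a number word: keep as-is
        i += 1
    return out


if __name__ == "__main__":
    spoken = "take twenty five milligrams two times daily".split()
    print(" ".join(itn_numbers(spoken)))
```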
Stable Text See "Hypothesis Text and Stable Text explained."
Word Marker Use word markers to match audio segments with words in order to make corrected transcripts or extract audio pieces with the associated text. For more information, view the word marker definition on the nVōq support site.

Dictation Related Terms

Command & Control This is voice navigation of the application. For example, a radiologist says "Load my CT Right Ankle template" and that kicks off an action on the desktop or within the app. Or they say "Go To History of Present Illness Section" and the application focus changes to that field. The application can accomplish this in one of two ways: (1) use the nVōq grammar-based Matching Service (this usually requires another invocation method in the app, like a separate push button for commands), or (2) use the nVōq dictation services APIs and perform keyword spotting for the command phrase within the application.
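The second approach, keyword spotting on dictated text inside the application, might look roughly like the sketch below. Everything here is hypothetical: the command patterns, the action names `load_template` and `focus_section`, and the function `spot_command` are invented for illustration, and the transcript is assumed to arrive as a plain string from the dictation service.

```python
import re

# Hypothetical command phrases mapped to application action names.
COMMANDS = [
    (re.compile(r"^load my (?P<name>.+) template$", re.IGNORECASE), "load_template"),
    (re.compile(r"^go to (?P<name>.+) section$", re.IGNORECASE), "focus_section"),
]


def spot_command(transcript: str):
    """Return (action, argument) if the transcript is a command phrase, else None."""
    text = transcript.strip().rstrip(".")  # drop a trailing period if one was added
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            return action, match.group("name")
    return None  # ordinary dictation, not a command


if __name__ == "__main__":
    print(spot_command("Load my CT Right Ankle template"))
    print(spot_command("Go To History of Present Illness Section."))
    print(spot_command("The patient reports mild swelling."))
```

Anchoring the patterns to the whole utterance keeps ordinary dictation that merely contains words like "load" or "section" from being treated as a command; an application could loosen this if commands are spoken mid-sentence.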
HTTP vs WebSockets HTTP and WebSocket are both communication protocols used in client-server communication. HTTP is unidirectional: the client sends a request and the server sends a response. WebSocket is a bidirectional, full-duplex protocol used in the same client-server scenario; unlike HTTP, its URLs start with ws:// or wss://. It is a stateful protocol, which means the connection between client and server stays alive until it is terminated by either party (client or server). Here are some key differences between HTTP and WebSockets:
  1. HTTP is a unidirectional communication protocol, whereas WebSockets is bidirectional.
  2. Whenever a request is made through HTTP, the client (browser) opens a connection and closes it once the response from the server is received. A WebSocket connection stays open until the client or the server terminates it.
  3. HTTP is a poor fit when the connection needs to stay open for a long time, whereas WebSockets are designed for that case.
  4. HTTP works poorly with applications that make very frequent requests, because the repeated requests can overload the server. WebSockets suit chat, trading, and other frequent-update applications.
  5. HTTP uses the http or https scheme to send a request (like http://www.google.com), whereas WebSockets uses the ws or wss scheme (like ws://google.com).

The connection process is shown in the diagram below:

[Diagram: Web Dictation Process]
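For readers who want to see how a WebSocket connection begins, the sketch below builds the HTTP upgrade request that opens it and computes the `Sec-WebSocket-Accept` value the server is expected to return, as defined in RFC 6455. The host `example.com` and path `/stream` are placeholders, not nVōq endpoints, and the helper names are assumptions of this example. In practice a WebSocket client library performs this handshake for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Build the HTTP/1.1 request that asks the server to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",  # blank line terminates the HTTP headers
    ]
    return "\r\n".join(lines)


def expected_accept(key: str) -> str:
    """Value the server must send back in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    sample_key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
    print(build_upgrade_request("example.com", "/stream", sample_key))
    print(expected_accept(sample_key))
```

After the server replies with status 101 and the matching accept value, the same connection stays open and both sides can send messages at any time, which is the bidirectional behavior described above.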
Topic This is a key component of creating a transcript from a dictation. It takes the audio, translates it into phonemes, translates the phonemes into words, translates the words into sentences, and finalizes the transcript using the ITN process (dictionary, corpus, vocabulary, substitutions, and sentence modeling).

  


2024 nVōq Inc. All rights reserved.