
API Developer

Top 10 Gotchas

  1. Poor audio capture
    • Think through how your users will dictate to your application, and what device they will need/use for input.
    • For instance, if you are developing a web-based application that will run on a laptop or desktop, we recommend USB headsets for optimal audio quality (built-in array mics tend to pick up background noise).

  2. Audio chunk size too large or too small – set your audio chunk size to 300 milliseconds, or just slightly higher.
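At 16-bit, 16 kHz mono PCM (the format from gotcha #3), a 300 ms chunk works out to 9,600 bytes. A minimal sketch of the arithmetic, with the format parameters as defaults:

```typescript
// Bytes per chunk of PCM audio: duration × sample rate × bytes/sample × channels.
function chunkSizeBytes(
  ms: number,
  sampleRate = 16000,   // 16 kHz, per the accepted PCM-WAV format
  bytesPerSample = 2,   // 16-bit samples
  channels = 1          // mono
): number {
  return Math.round((ms / 1000) * sampleRate) * bytesPerSample * channels;
}

const CHUNK_BYTES = chunkSizeBytes(300); // 9600 bytes for a 300 ms chunk
```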

  3. Sending incorrect audio format to our servers
    • nVoq dictation servers will only accept:
      • .ogg Opus (WebM) 16-bit, mono (0.5 Quality or higher)
      • .ogg vorbis 16-bit, mono (0.5 Quality or higher)
      • PCM-WAV 16-bit, 16-kHz, mono
    • Keep in mind that if you are sourcing audio from a browser or smartphone, it will most likely be recorded at 44.1 kHz or 48 kHz. This means you will need to downsample the audio to 16 kHz before sending it to our platform.
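A naive downsampling sketch using linear interpolation. This is illustrative only: production code should apply a low-pass filter before decimating to avoid aliasing, or use a proper resampling library.

```typescript
// Linear-interpolation resampler (sketch only: no anti-alias filtering).
function downsample(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  const ratio = fromRate / toRate;              // e.g. 44100 / 16000 = 2.75625
  const outLen = Math.floor(input.length / ratio);
  const out = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = i * ratio;                      // fractional source position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac; // interpolate neighbors
  }
  return out;
}
```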

  4. Submitting audio with incorrect nVoq.SPS credentials (i.e., incorrect Account Login or Environment)
    • Make sure you are using proper nVoq.SPS credentials when sending audio to the platform.
    • Your nVoq.SPS credentials will be provided to you by your nVoq Partner Success Manager (ISV).

  5. Submitting audio with an nVoq.SPS account that lacks appropriate authorization – make sure the account you are submitting audio with has the Dictation role. This can be verified in the Administration Console.

  6. Lack of error handling (e.g., no resubmission on failed socket open)
    • Include appropriate error handling in your application to gracefully handle situations when a resource is unavailable.
    • Also, if you find a resource is unavailable, retry the submission a fixed number of times before erroring out.
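One way to sketch bounded resubmission; the helper name, attempt count, and delay are illustrative defaults, not part of the nVoq API:

```typescript
// Retry an async operation a bounded number of times before giving up.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err; // e.g. the WebSocket failed to open
      if (attempt < maxAttempts) {
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // surface the error to the user only after the final attempt
}
```

In practice the operation passed in would be your socket-open or audio-submission call.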

  7. Improper WebSocket closure – Ensure that you close all open WebSockets once your dictation is complete. This will reduce the chance of exceeding the maximum number of open WebSockets per user account (which is currently set to 3).
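A small client-side tracker can enforce the cap before the server rejects you. The cap of 3 comes from the gotcha above; the `Closable` interface is a stand-in for the real WebSocket type so the sketch stays self-contained.

```typescript
// Client-side guard for the per-account WebSocket cap (currently 3).
interface Closable { close(): void; }

class SocketTracker {
  private openSockets = new Set<Closable>();
  constructor(private readonly maxOpen = 3) {}

  // Register a socket when a dictation starts; refuse to exceed the cap.
  register(ws: Closable): void {
    if (this.openSockets.size >= this.maxOpen) {
      throw new Error(`Socket cap of ${this.maxOpen} reached; close a dictation first`);
    }
    this.openSockets.add(ws);
  }

  // Call this as soon as a dictation completes so the slot is freed.
  release(ws: Closable): void {
    ws.close();
    this.openSockets.delete(ws);
  }
}
```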

  8. Improper usage of audio/text mark-up (replacement of Hypothesis with Stable text) – If you display Hypothesis text within your application, use our Text with Mark-up feature so that you can easily identify the beginning and end of the Hypothesis text stream when replacing it with Stable text.
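A sketch of the replacement step. The `<hyp>` markers here are placeholders, not nVoq's actual mark-up; consult the Text with Mark-up documentation for the real delimiters.

```typescript
// Placeholder delimiters (illustrative; substitute the real nVoq mark-up).
const HYP_START = "<hyp>";
const HYP_END = "</hyp>";

// Replace the currently displayed Hypothesis span with Stable text.
function replaceHypothesis(display: string, stable: string): string {
  const start = display.indexOf(HYP_START);
  const end = display.indexOf(HYP_END);
  if (start === -1 || end === -1) return display + stable; // no hypothesis shown
  return display.slice(0, start) + stable + display.slice(end + HYP_END.length);
}
```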

  9. Buffer audio collection while the WebSocket is opening – WebSockets can take a few milliseconds to set up, so buffer any audio the end user speaks while the WebSocket is opening, then stream the buffered audio once the WebSocket is fully established.

  10. And don’t forget to unmute your mic before testing your application!

 


© 2024 nVoq Inc. All rights reserved.