Transcription Services

Overview of Transcription Services within QMS

QMS releases from 7.3 onward integrate with the Nuance Transcription Engine (NTE) to transcribe recordings. Call Recording Profiles can be configured to transcribe calls automatically once they complete, and On Demand recordings can be transcribed automatically as well. Recordings can also be transcribed later from the client interface using the Transcribe Now feature.

NTE supports almost all languages currently supported by QMS and can be configured to auto-detect the language spoken in the call. Like QMS, NTE supports highly available deployments and includes a load balancer to optimize server efficiency. The feature requires both QMS and Nuance licenses.

This document outlines the general configuration and setup of transcription services within QMS. It does not replace “Using Nuance Transcription Engine” or the other Nuance documentation that should be used to set up NTE.

QMS with Transcription Topology

Figure 1: QMS with Nuance Transcription Engine

A standard QMS deployment includes at least one Call Recording (CR) Service, Data Service, and Media Processing Service (MPS). Some deployments include multiple instances of each service. Raw recording files are stored locally on the CR Service and are processed by the MPS once the call is complete. Depending on the configuration, the MPS stores the finished recording locally or in a remote storage location.

If the recording is marked for transcription, the MPS also creates a .wav and a .dat file that are sent to NTE; NTE accepts only .wav audio for transcription. The MPS sends the transcription request, along with the recorded audio, to the NTE Front End Controller, and NTE notifies the MPS once the file has been fully received. The MPS then continues with other processing until NTE sends a callback notification that the transcription is complete.

Transcription can take a significant amount of time. The most accurate setting processes audio at a one-to-one ratio with the length of the call, so a 10-minute call can take 10 minutes to transcribe.

Once NTE has finished the transcription, the MPS receives a callback with the processed transcription data. The MPS then replaces the .wav and .dat files stored for the transcription with a .txt file containing the transcription data. This data includes not only the call text but also speaker information, the timing of phrases within the call, and accuracy (confidence) data for the recognition.
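The request lifecycle described above can be summarized as a small state machine. The sketch below is illustrative only: the `State` names, the `TranscriptionJob` class, and the file-naming scheme are hypothetical and do not reflect QMS internals. They simply encode the order of events in the text: .wav/.dat created, request sent, receipt confirmed, callback received, and .wav/.dat replaced by .txt.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical lifecycle states for one transcription request."""
    PREPARED = auto()   # MPS has written the .wav and .dat files
    SENT = auto()       # request and audio sent to the NTE Front End Controller
    RECEIVED = auto()   # NTE confirmed the file was fully received
    COMPLETED = auto()  # callback arrived with the transcription data


# Each state may only advance to the next one, in the order the text describes.
NEXT = {
    State.PREPARED: State.SENT,
    State.SENT: State.RECEIVED,
    State.RECEIVED: State.COMPLETED,
}


class TranscriptionJob:
    def __init__(self, recording_id: str) -> None:
        self.recording_id = recording_id
        self.state = State.PREPARED
        # Hypothetical naming: intermediate files share the recording id as stem.
        self.files = {f"{recording_id}.wav", f"{recording_id}.dat"}

    def advance(self, new_state: State) -> None:
        if NEXT.get(self.state) is not new_state:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        if new_state is State.COMPLETED:
            # The .wav/.dat pair is replaced by the .txt transcription file.
            self.files = {f"{self.recording_id}.txt"}


job = TranscriptionJob("call-0001")
for step in (State.SENT, State.RECEIVED, State.COMPLETED):
    job.advance(step)
print(sorted(job.files))  # ['call-0001.txt']
```

Rejecting out-of-order transitions mirrors the point that the MPS only swaps in the .txt file after NTE's callback, never before the file has been received.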

Figure 2: Highly Available QMS with NTE

A Highly Available (HA) deployment of QMS is more complex. The Call Recording Service and Data Service are set up as nodes, with each service running on separate server instances. SQL must also be set up with Merge Replication, and data storage must be mirrored in some fashion to ensure no data loss. An MPS pool of two or more MPS instances services the HA deployment.

The MPS connects to and interacts with NTE in the same way as in a standard deployment. The NTE environment is also set up to be highly available; refer to the Nuance documentation for how to set it up. In both the HA and non-HA cases, QMS interfaces with a front-end service, so the QMS configuration is the same.

Transcription Licensing

Transcription requires a Transcription license within QMS. Contact your EngHouse Interactive representative for a new license key that includes Transcription licenses. Each user whose recorded calls need to be transcribed requires a Transcription license.

Figure 3: QMS Licensing

In addition to QMS licensing, a Nuance license server and appropriate Nuance Transcription Engine licenses are required as part of the NTE setup.

User Configuration

Once Transcription licenses are available, assign them to the users who need transcription. In the user edit screen, the Transcription license now appears under the Analytics heading; checking it allows recordings created by that user to be transcribed. The Multi-User Edit functionality and the Active Directory Import can also be used to select or create users and assign them Transcription licenses.

Figure 4: User Edit Licensing

Media Processing Service Configuration

To set up the NTE connection, navigate to the Services section of the client administration. All Media Processing Services that connect to NTE should appear in the services list.

Figure 5: Services List showing Pooled MPS

The NTE Transcription settings can be configured independently for each MPS, unless the MPS is part of an MPS pool. Pooled Media Processing Services need to function and operate the same across the pool, so changing a transcription setting within any of these pooled MPS configurations will propagate the changes to the rest of the pool.

To change settings, open the MPS Edit tab and find the section called Nuance Transcription Engine Settings. First, check the Enable Transcription checkbox; this displays the remaining fields. The next field is the NTE Host Address, the IP address or hostname of the NTE Front End Controller. Every MPS must have unrestricted network access to the NTE Host Address, and firewalls must allow all ports listed in the Nuance documentation. The NTE Host Port must match the NTE configuration; it defaults to 8080, and this port must also be open through the firewall.

With these settings, QMS can connect to NTE and send it transcription requests for processing. The NTE Callback URL is the location to which NTE sends the transcription response once processing is complete. This field takes a standard URL, and either http or https can be specified. The host and port of the URL should be those of the MPS server that services the request; the MPS listens on this port for the transcription callback from NTE.
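The callback URL rules above (http or https, host and port of the servicing MPS) are easy to check before saving. The sketch below is a hypothetical helper, not part of QMS; the example URL path `/nte/callback` and the hostnames are placeholders. It also assumes that an omitted port means the scheme's standard port, which the text does not state.

```python
from urllib.parse import urlsplit


def callback_listen_port(callback_url: str, mps_host: str) -> int:
    """Check an NTE Callback URL against the rules above; return the port MPS listens on."""
    parts = urlsplit(callback_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme must be http or https, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("callback URL has no host")
    # urlsplit lowercases .hostname, so compare case-insensitively.
    if parts.hostname != mps_host.lower():
        raise ValueError(f"host {parts.hostname!r} is not this MPS ({mps_host!r})")
    # Assumption: with no explicit port, the scheme's standard port is used.
    # (.port itself raises ValueError for a non-numeric or out-of-range port.)
    return parts.port or (443 if parts.scheme == "https" else 80)


print(callback_listen_port("https://mps01.example.local:8443/nte/callback",
                           "mps01.example.local"))  # 8443
```

The returned port is the one that must be reachable from NTE through the firewall, since NTE initiates the callback connection to the MPS.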

The final section is the language selection. The list includes all languages supported by NTE, but the administrator should select only the languages actually in use. Do not select all languages and rely on language identification: NTE's identification quality degrades as the number of candidate languages grows. Typical deployments select only two or three languages, and the selection should match the NTE licenses and language packs that are installed.

Figure 6: QMS Media Processing Service Configuration

Next to the NTE Server List are Edit and Add buttons. Clicking either opens a window where the NTE Server details are configured: the NTE Host Address and Port used to connect to the NTE Server, and the NTE Callback URL, which works as described above. The final piece of information is the Host Address and Port of the FlexNet License Administrator that the NTE server is connected to. Providing this information allows QMS to connect to the FlexNet License Administrator and gather license alerts, such as when licenses are running low or close to expiration.

Figure 7: QMS Media Processing Service Configuration

Once NTE is properly configured and saved, the MPS should establish a connection to the NTE server and wait for transcription requests; the MPS log indicates whether the connection to NTE succeeded. If the MPS is part of a pool, the following dialog appears, indicating that the changes will be propagated to all Media Processing Services in the pool, and the administrator must confirm to submit the change. The propagated settings are Enable/Disable Transcription, NTE Host Address, NTE Host Port, and the protocol and port of the NTE Callback URL; each MPS substitutes its own hostname or IP address into its NTE Callback URL.

Figure 8: Pooled MPS Configuration Warning
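The propagation rule for pooled MPS instances can be sketched as a function that derives one member's settings from the edited MPS. This is a hypothetical model, not the QMS implementation: the `NteSettings` fields mirror the propagated settings listed above, while keeping the callback URL path and ignoring IPv6 literal hosts are assumptions of the sketch.

```python
from dataclasses import dataclass, replace
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class NteSettings:
    enabled: bool       # Enable Transcription
    nte_host: str       # NTE Host Address (Front End Controller)
    nte_port: int       # NTE Host Port
    callback_url: str   # NTE Callback URL


def settings_for_member(edited: NteSettings, member_host: str) -> NteSettings:
    """Derive one pool member's NTE settings from the MPS that was edited.

    Enable flag, NTE host and NTE port are copied unchanged. The callback URL
    keeps the edited protocol and port but uses the member's own host, so NTE
    calls back the MPS that sent the request. Keeping the path is an assumption;
    IPv6 literal hosts are not handled in this sketch.
    """
    parts = urlsplit(edited.callback_url)
    netloc = member_host if parts.port is None else f"{member_host}:{parts.port}"
    url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return replace(edited, callback_url=url)


edited = NteSettings(True, "nte-fec.example.local", 8080,
                     "https://mps01.example.local:8443/nte/callback")
print(settings_for_member(edited, "mps02.example.local").callback_url)
# https://mps02.example.local:8443/nte/callback
```

Substituting the local host is what lets every pool member share one configuration while NTE still returns each transcription to the MPS that requested it.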

Call Recording Profile Configuration

If calls should be transcribed automatically once they complete, enable transcription in the appropriate Call Recording Profile. Each Call Recording Profile has a Transcription Settings section; first select Enable Transcription for the profile, which reveals two more options. The first is the Transcription Operating Mode, an NTE setting with three values: Accurate, Fast, and Warp. The faster the mode, the lower the transcription accuracy. Accurate gives the best accuracy, but NTE processes the recording at 1x speed, so a 10-minute call takes about 10 minutes to process. Fast is somewhat less accurate but runs at 3x speed, so the same recording is done in about 3 minutes, 20 seconds. Warp runs at 10x speed, so the recording could be completed in about 1 minute, but the transcription may contain more errors. Because this setting is configured per Call Recording Profile, some queues can be set up with better accuracy than others.
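The processing-time arithmetic for each Operating Mode can be expressed directly. This minimal sketch uses only the speed factors given above; the function name is hypothetical, and real turnaround will be longer because queueing, file transfer, and language identification are not modeled.

```python
from datetime import timedelta

# Speed relative to real time for each Transcription Operating Mode (from the text).
SPEED_FACTOR = {"Accurate": 1, "Fast": 3, "Warp": 10}


def estimated_processing_time(call_length: timedelta, mode: str) -> timedelta:
    """Approximate NTE processing time: call length divided by the mode's speed.

    Queueing, file transfer and language identification are not modeled.
    """
    return call_length / SPEED_FACTOR[mode]


for mode in SPEED_FACTOR:
    print(mode, estimated_processing_time(timedelta(minutes=10), mode))
# Accurate 0:10:00
# Fast 0:03:20
# Warp 0:01:00
```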

The second option is the language selection. This list contains only the languages selected in the MPS Service Configuration described above. Select the language(s) used by this profile and save the profile.
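The language guidance for the MPS and for Call Recording Profiles can be captured as two small checks. These are hypothetical helpers, not QMS functions; the "more than three" threshold is an assumption derived from the advice that typical deployments select two or three languages, and plain language names stand in for whatever identifiers NTE actually uses.

```python
from __future__ import annotations


def mps_language_warnings(selected: set[str], installed: set[str]) -> list[str]:
    """Flag an MPS language selection that conflicts with the guidance above."""
    problems = []
    missing = selected - installed
    if missing:
        problems.append(f"not installed/licensed on NTE: {', '.join(sorted(missing))}")
    # Assumed threshold: typical deployments select only two or three languages.
    if len(selected) > 3:
        problems.append(f"{len(selected)} languages selected; identification may degrade")
    return problems


def unsupported_profile_languages(requested: set[str], mps_selected: set[str]) -> set[str]:
    """Languages a profile asks for that are not selected on the MPS (so not offered)."""
    return requested - mps_selected


print(mps_language_warnings({"English", "French"}, {"English", "French", "Spanish"}))  # []
print(unsupported_profile_languages({"English", "German"}, {"English", "French"}))  # {'German'}
```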

Figure 9: Call Recording Profile Transcription Settings

When a call starts for a user in a Call Recording Profile that has transcription enabled, the evaluation of whether the call is recorded also determines whether it is transcribed; if so, the recording is marked for transcription. Once the call completes, QMS processes the recording to create the call recording media, creates the .wav media needed for transcription, and sends the request to the NTE Front End Controller.
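The call-start decision can be sketched as a function returning whether a call is recorded and whether it is marked for transcription. It combines the profile setting described here with the per-user Transcription license requirement from the licensing section. The names are hypothetical, and `will_record` stands in for the profile's normal recording evaluation (for example, the percentage of calls to record), which this sketch does not model.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class CallRecordingProfile:
    transcription_enabled: bool


def evaluate_call(profile: CallRecordingProfile | None, will_record: bool,
                  has_transcription_license: bool) -> tuple[bool, bool]:
    """Return (record, transcribe) at call start.

    `will_record` stands in for the profile's normal recording evaluation,
    which is not modeled here. A call is only marked for transcription if it
    is recorded, the profile enables transcription, and the user is licensed.
    """
    transcribe = (
        will_record
        and profile is not None
        and profile.transcription_enabled
        and has_transcription_license
    )
    return will_record, transcribe


print(evaluate_call(CallRecordingProfile(True), True, False))  # (True, False)
```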

On Demand Transcription Configuration

Recordings made On Demand, without a Call Recording Profile, can also be transcribed automatically. The options for this are in the General settings of the client administration: the General settings tab includes an On Demand Transcription Settings section where you can Enable Transcription and set the Transcription Operating Mode. These settings behave as described in the Call Recording Profile configuration.

This section does not include a language selection, however, because the language should depend on the agent on the call, not on the administrator initiating the On Demand recording. QMS therefore checks whether the agent is a member of a Call Recording Profile and, if so, uses that profile's language settings. If it cannot determine a language this way, it attempts to identify the language using all languages selected for the deployment in the MPS configuration.
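The On Demand language fallback reduces to a two-step rule. In this minimal sketch the function name is hypothetical, and it assumes the agent's profile languages have already been looked up (passing `None` when the agent belongs to no profile); how QMS chooses among several profiles is not covered here.

```python
from __future__ import annotations


def on_demand_languages(agent_profile_languages: list[str] | None,
                        mps_languages: list[str]) -> list[str]:
    """Candidate languages for an On Demand transcription.

    Uses the agent's Call Recording Profile languages if the agent belongs to a
    profile; otherwise returns every language selected on the MPS, leaving NTE
    to identify the spoken language.
    """
    if agent_profile_languages:
        return list(agent_profile_languages)
    return list(mps_languages)


print(on_demand_languages(["English"], ["English", "French", "Spanish"]))  # ['English']
print(on_demand_languages(None, ["English", "French"]))  # ['English', 'French']
```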

Figure 10: On Demand Transcription Settings

Transcription with High Availability

The High Availability Deployment Guide.pdf contains more information on how to set up an HA QMS environment. HA requires redundancy in every component. Both Call Recorders and Data Servers are set up as nodes, each with a primary and secondary server. In addition, at least two Media Processing Services are pooled and available to each of the nodes. It is possible to have more than one MPS pool servicing different nodes, but each pool should always contain more than one MPS. Figure 2 above outlines this arrangement.

The Nuance_Transcription_Engine_Guide_4.1.pdf document describes how to achieve HA in the Nuance environment. The Nuance Front End Controller acts as a single point of entry through which the QMS servers send and receive transcription requests, and behind it a number of highly available Nuance configurations are possible.

Searching Transcriptions

After QMS and NTE are properly set up and configured, the system is ready to create transcriptions. Transcriptions created automatically through a Call Recording Profile do not appear immediately on the search recordings page in the client. They are available only after the transcription is complete, which can lag several minutes behind the call itself. Once a transcription is complete, the recording's icon changes from a normal call icon to a transcribed call icon.

Figure 11: Recording Search with Transcribed Calls

Transcriptions within the Recording Edit Tab

Once the transcription is complete, opening the recording shows both the call audio and, just below it, the transcribed data. The Transcription Media window displays the data received from NTE, including an indication of who is speaking. NTE typically assigns each speaker a simple number. Most calls have only two speakers, but conference calls and similar scenarios may have more. NTE uses voice biometrics to determine who is speaking, and QMS assigns a color to each speaker to help differentiate them.

NTE also segments the conversation into phrases, creating a new segment when the speaker changes or when it detects a pause in speech. Each segment includes a timestamp, so the administrator can see exactly when in the call the phrase occurs. This timestamp appears in the Transcription Media window along with the transcribed text.

In addition, there will be a sound icon in front of each phrase segment. Clicking this icon will initiate play of the call audio at that exact point in the conversation.
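How the Transcription Media window presents segments can be approximated as follows. This is a hypothetical model: the `Segment` fields, the color palette, and the `[mm:ss]` rendering are assumptions for illustration, not the QMS transcript format. It assigns each numbered speaker a color in order of first appearance and shows each segment's timestamp, which is also the point where playback would start when the sound icon is clicked.

```python
from __future__ import annotations
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Segment:
    start: float   # seconds from the start of the call
    speaker: int   # NTE's numeric speaker label
    text: str


# Hypothetical palette; QMS's actual colors are not documented here.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def speaker_colors(segments: list[Segment]) -> dict[int, str]:
    """Assign each speaker a color in order of first appearance (palette repeats)."""
    colors: dict[int, str] = {}
    palette = cycle(PALETTE)
    for seg in segments:
        if seg.speaker not in colors:
            colors[seg.speaker] = next(palette)
    return colors


def render(segments: list[Segment]) -> list[str]:
    """One line per segment; the [mm:ss] stamp is also where playback would start."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] Speaker {seg.speaker}: {seg.text}")
    return lines


segments = [Segment(0.0, 1, "Thank you for calling."),
            Segment(2.5, 2, "Hi, I have a question."),
            Segment(65.2, 1, "Let me check that for you.")]
print(speaker_colors(segments))  # {1: '#1f77b4', 2: '#ff7f0e'}
print("\n".join(render(segments)))
# [00:00] Speaker 1: Thank you for calling.
# [00:02] Speaker 2: Hi, I have a question.
# [01:05] Speaker 1: Let me check that for you.
```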

Figure 12: Recording Edit with Transcription Media

Speaker and Language Identification and Segmentation

NTE provides QMS with speaker identification and segmentation of the recording. The Nuance speech analyzer uses speech biometrics to determine which speaker is talking. As the conversation moves back and forth, NTE segments it into phrases that can be transcribed and read. This segmentation is not always perfect: NTE may split one speaker's continuous speech into several smaller phrase segments instead of a single segment for the whole phrase.

In addition, when multiple languages are available, NTE performs language identification before any other analysis to determine which language is spoken in the recording, so that it can analyze the recording with the correct language model.

Different languages may be segmented differently in the final transcription. We have found that Asian languages may be split into smaller segments, or may even have everything spoken by one caller throughout the recording grouped into a single segment. These are issues with the Nuance language model; there may be parameters within the NTE framework that help in those situations, but they are outside the control of QMS.

Transcription Media Window

The Transcription Media window includes the exact location of the transcription file. NTE cannot transcribe with 100% accuracy, and accuracy decreases with the faster Operating Modes, but in general the transcription is often quite good. When NTE provides an error description, it is displayed within the Transcription Media window.

Figure 13: Transcription ERROR

If there is an error, or if the call has not been transcribed, QMS offers the option to start a transcription using the Transcribe Now feature. To find it, open the Recording Edit tab and hover over the call audio. If the administrator is licensed and the call is eligible for transcription, the Transcribe Now button is available.

Figure 14: Transcribe Now button

After Transcribe Now is clicked, QMS initiates a transcription request. This process converts the .opus or .mp3 call audio back into .wav format and then sends the request to NTE. Because transcription can take a significant amount of time, up to the length of the call, the following dialog appears once the request is initiated; the Transcription Media window is not available until the transcription is complete.

Figure 15: Transcribe Now Initiated Dialog

The lower right of the Transcription Media window also has an icon showing two windows; clicking it opens the entire transcription in a separate window, letting the administrator see more of the transcription text at once.

Figure 16: Transcription Window

Transcriptions within Evaluations

The transcription data will also be displayed and available within the Recording Evaluation tab. The Transcription Media within the evaluation operates in the same manner as described within the Recording Edit tab.

Figure 17: Transcription Media in an Evaluation

Transcription Scenarios

The base assumption in the scenarios below is that the QMS and NTE systems have been correctly configured per the information listed above and are properly communicating with each other.

Scenario 1 – Standard Call Profile Transcription

An agent who has Transcription and Call Recording licenses is part of a Call Recording Profile with Transcription enabled. The Operating Mode is set to Fast and this is an English-language queue, so only English is selected. The profile records all calls, so when a call comes in, QMS immediately starts recording, and the recording continues until a hang-up is detected. The MPS then begins processing the recording, converting it to an .mp3 or .opus file. Because the call should be transcribed, the MPS also creates a .wav and a .dat file for the transcription engine, connects to NTE, and sends it the file along with information about the transcription request. NTE responds that the files were received. At this point no transcription exists yet, so the recording search does not show one, but it indicates that a transcription is pending. In this case the transcription takes approximately 1/3 the length of the call. Once NTE completes the transcription, it sends a callback to the requesting MPS, which downloads the transcription file. The transcription is stored with the recording as a text file and is accessible on the recording's edit page.

Scenario 2 – Standard OnDemand Transcription

In this case, the agent is not assigned to a Call Recording Profile. Alternatively, the agent could be assigned to a profile that records 0% of calls, or any percentage under which the current call is not recorded. An administrator with an On Demand recording license and permissions, monitoring the call in the Real Time view, starts recording the agent's call. Once the call ends or the administrator stops the recording, the MPS processes the recording file and checks the On Demand Transcription settings to determine whether the call should be transcribed. Here, On Demand Transcription is enabled on the General Settings tab and the Operating Mode is set to Warp. Language determination is more complex in this case because the person initiating the recording is not the agent. QMS first checks whether the agent is associated with a Call Recording Profile; if so, that profile's language settings are used. Otherwise, QMS uses all deployment languages set on the MPS Service tab and relies on NTE to detect the language. Because the Operating Mode is Warp, a response should come back in about 1/10 the length of the call.

Scenario 3 – Disabled Transcription

This scenario can arise in two ways. In the first, the agent is associated with a Call Recording Profile that has Transcription disabled: the recording is created and processed normally, but no transcription is produced. In the second, the recording was created On Demand and transcription is disabled on the General Settings tab; again, the recording is created but not transcribed. This does not necessarily prevent the recording from being transcribed, however, as the TranscribeNow feature can be used at a later time.

Scenario 4 – TranscribeNow

The TranscribeNow feature allows an eligible recording to be transcribed after the fact. The administrator initiating TranscribeNow must have the appropriate license and permissions; if they do and the recording is not already transcribed, the TranscribeNow button is available in the recording's wav player on the edit screen. Clicking TranscribeNow does several things. First, QMS locates the recording and the recorder that made it, then finds the MPS associated with that recorder. If the recording is not available, or no MPS configured to connect to NTE is available, TranscribeNow fails.

Assuming an MPS is found, it first converts the recording back to .wav format so it can be sent to NTE, and it also creates the required .dat file. The Operating Mode and Language are determined much as described earlier, in the following order of priority:

If the recording’s agent user is associated with a Call Recording Profile with Transcription Enabled, then the Operating Mode and Language setting from that profile will be used in the TranscribeNow request.
If the recording’s agent is not associated with a Call Recording Profile, the transcription will be treated like an on demand transcription and utilize the settings contained in the General Settings tab. If Transcription is Enabled, the Operating Mode will be what is selected on the settings page. The Language identification will use all languages selected on the MPS Server configuration and NTE will detect the correct language.
If both the Call Recording Profile and On Demand Transcription settings are disabled, QMS chooses a default Operating Mode: either the last mode set on the General Settings tab before transcription was disabled, or a default value of ‘Fast’. The Language identification uses all languages selected in the MPS Server configuration, and NTE detects the correct language.
If the MPS configuration has disabled Transcription or the NTE system is otherwise not configured properly or available, the TranscribeNow will fail.
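The priority list above can be sketched as a single resolution function. This is an illustration under stated assumptions, not QMS code: the class and function names are hypothetical, and an agent whose profile has Transcription disabled is treated like an agent with no profile (the list does not spell out that case). Rules 2 and 3 collapse into one line because both use the mode stored on the General Settings tab (current or last saved), falling back to ‘Fast’, with all MPS languages.

```python
from __future__ import annotations
from dataclasses import dataclass

DEFAULT_MODE = "Fast"


@dataclass
class ProfileTranscription:
    enabled: bool
    mode: str
    languages: list[str]


@dataclass
class OnDemandTranscription:
    enabled: bool
    mode: str | None  # Operating Mode on the General Settings tab (last saved), if any


class TranscribeNowError(Exception):
    """TranscribeNow cannot proceed."""


def resolve_transcribe_now(profile: ProfileTranscription | None,
                           on_demand: OnDemandTranscription,
                           mps_enabled: bool,
                           mps_languages: list[str]) -> tuple[str, list[str]]:
    """Return (Operating Mode, candidate languages) following the priority list."""
    if not mps_enabled:
        raise TranscribeNowError("transcription is disabled on the MPS")
    # 1. Agent's Call Recording Profile with Transcription enabled.
    if profile is not None and profile.enabled:
        return profile.mode, list(profile.languages)
    # 2. On Demand settings enabled: use the mode selected there.
    # 3. Both disabled: last mode saved before disabling, else the default.
    # In both cases NTE identifies the language from all MPS languages.
    return on_demand.mode or DEFAULT_MODE, list(mps_languages)


mps = ["English", "French"]
print(resolve_transcribe_now(None, OnDemandTranscription(False, None), True, mps))
# ('Fast', ['English', 'French'])
```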

Transcription and Retention

Transcription files use the same settings as the recording they belong to. Completed transcriptions are stored as .txt files in the same locations as the recording files, and if encryption is enabled, these files are encrypted. If a retention setting moves or deletes a recording file, its transcription file is moved or deleted along with it, and archived recordings also archive their transcription files.

Transcription License Alerts

The Nuance FlexNet License Administrator (FLA) tool that hosts the NTE licenses also provides threshold and other alerts when those licenses are about to expire or run out. Nuance provides two types of licenses for the NTE server. The first type is instance licenses, which are tied to a specific NTE host. If the NTE host runs a Front End Controller, an English Transcriber, and a French Transcriber, it requires three instance licenses.

The second type is consumable licenses, which expire either monthly or yearly and are based on transcription minutes. The number of transcription minutes consumed equals the length of the source audio, regardless of the Operating Mode. For example, transcribing a 7-minute call consumes 7 consumable minutes, so a balance of 2000 minutes drops to 1993. Consumable minutes can be usable for any language or tied to a specific language such as English or French.
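The contrast between billing and processing time is worth making explicit: the Operating Mode changes how fast NTE works, not how many consumable minutes are used. This minimal sketch uses the speed factors from the Call Recording Profile section; the function name is hypothetical, and how Nuance rounds partial minutes is not described here, so lengths are used as-is.

```python
from __future__ import annotations

# Operating Mode speed factors (Accurate 1x, Fast 3x, Warp 10x).
SPEED_FACTOR = {"Accurate": 1, "Fast": 3, "Warp": 10}


def transcribe_call(balance: float, call_minutes: float, mode: str) -> tuple[float, float]:
    """Return (consumable minutes left, approximate processing minutes)."""
    if mode not in SPEED_FACTOR:
        raise ValueError(f"unknown Operating Mode: {mode!r}")
    remaining = balance - call_minutes              # billed on source audio length only
    processing = call_minutes / SPEED_FACTOR[mode]  # mode changes speed, not cost
    return remaining, processing


print(transcribe_call(2000, 7, "Warp"))  # (1993, 0.7)
```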

FLA includes an Administration tab with an Alert Configuration option, where you can configure the thresholds and types of alerts FLA should send. FLA itself displays these alerts only on its Dashboard, but Nuance also provides a SOAP interface to the alerts that QMS can consume.


Figure 18: FLA Alert Configuration

QMS polls FLA every hour for license alerts. When FLA generates a new alert, QMS pulls in the data and sends an email notification to the notification destination configured in the MPS configuration. The email has the subject “Nuance Transcription Engine Licensing Alert”, includes the ID of the MPS that sent it, and lists each alert; a single email may contain multiple alerts.

Figure 19: Nuance Transcription Engine Licensing Alert
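The "only new alerts" behavior of the hourly poll can be sketched as a deduplicating email builder. Fetching alerts over FLA's SOAP interface is out of scope and not shown. The `LicenseAlert` type, its `alert_id` field, the body layout, and the sample alert texts are hypothetical; only the subject line comes from the text.

```python
from __future__ import annotations
from dataclasses import dataclass

SUBJECT = "Nuance Transcription Engine Licensing Alert"


@dataclass(frozen=True)
class LicenseAlert:
    alert_id: str   # hypothetical: whatever uniquely identifies an FLA alert
    message: str


def build_alert_email(mps_id: str, fetched: list[LicenseAlert],
                      seen: set[str]) -> tuple[str, str] | None:
    """Return (subject, body) for alerts not reported by an earlier hourly poll.

    Returns None when nothing is new. `seen` is updated in place so the same
    alert is not emailed twice.
    """
    fresh = [a for a in fetched if a.alert_id not in seen]
    if not fresh:
        return None
    seen.update(a.alert_id for a in fresh)
    body = "\n".join([f"MPS: {mps_id}"] + [f"- {a.message}" for a in fresh])
    return SUBJECT, body


alerts = [LicenseAlert("a1", "Consumable minutes below threshold"),
          LicenseAlert("a2", "License expires soon")]
seen = set()
print(build_alert_email("MPS-01", alerts, seen)[0])  # Nuance Transcription Engine Licensing Alert
print(build_alert_email("MPS-01", alerts, seen))  # None
```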

NTE Consumable License Usage Over Time Dashboard Widget

When consumable licenses are used, QMS also provides a Dashboard widget that shows license usage over time. Like the Storage Usage Over Time widget, it displays data at hourly resolution, and usage can be displayed by day, week, month, or year. If there is more than one consumable license type, the graph overlays each type.

Figure 20: NTE Consumable License Usage Over Time Widget
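The widget's behavior (hourly points, a day/week/month/year view, one overlaid series per license type) can be modeled by filtering hourly samples into per-type series. This is a hypothetical sketch, not the widget's implementation: the sample tuple layout is assumed, and the month and year windows are approximated as 30 and 365 days.

```python
from __future__ import annotations
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed window lengths for the Day/Week/Month/Year views (month ~ 30 days).
WINDOWS = {"day": timedelta(days=1), "week": timedelta(weeks=1),
           "month": timedelta(days=30), "year": timedelta(days=365)}


def usage_series(samples: list[tuple[datetime, str, float]], view: str,
                 now: datetime) -> dict[str, list[tuple[datetime, float]]]:
    """Hourly samples inside the view window, one series per license type to overlay."""
    start = now - WINDOWS[view]
    series: dict[str, list[tuple[datetime, float]]] = defaultdict(list)
    for ts, license_type, minutes in sorted(samples):
        if start < ts <= now:
            series[license_type].append((ts, minutes))
    return dict(series)


now = datetime(2024, 1, 8, 12)
samples = [(datetime(2024, 1, 8, 11), "English", 40.0),
           (datetime(2024, 1, 8, 10), "French", 5.0),
           (datetime(2024, 1, 1, 9), "English", 30.0)]
print(sorted(usage_series(samples, "day", now)))  # ['English', 'French']
```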