Some great news for eLearning may have been overlooked: the Internet has learned to speak text, and with quite reasonable quality.  Since the finalisation of HTML5 in 2014, popular browsers and devices have allowed client-side scripts to speak text using a built-in text-to-speech engine, or to link directly to online alternatives.  This works across platforms on most popular devices, including Android, iOS, Windows, Mac and Linux.  No longer do you have to immortalise text as audio recordings and then require bulky downloads for users to hear it.  This meets the needs of a range of user groups, from sight-impaired users to mobile listeners, and adds one more layer of fun and engagement to eLearning.

Because HTML5 speech involves some programming and voice expertise, ResponsiveVoice has created an easy-to-use library that links content to the browser’s built-in voice capabilities.

eLearning companies we have spoken with say they can save between 15% and 25% of production costs by using the new HTML5 speech solution. In some cases where there is simply no budget for voice-overs, courses can now include speech.

Today’s “talking browser” allows a range of new dynamic and exciting eLearning functionality that takes just minutes to enable in eLearning content or within any app or website.  This page is an active demonstration! Highlight any text to see how it works and read on to find out more about this small but groundbreaking advance in eLearning.


Prior to this breakthrough there were two alternatives.  The first was professionally recorded narration, which is of course better and more realistic; however, it costs much more, involves difficult editing, requires a one-off recording session and is more time consuming.  The second was to use a text-to-speech engine to create MP3 or WAV files and upload them to your server for learners to download as part of their courses; making changes meant going through the whole process again. There are also online services that turn text into audio on demand, but these are usually expensive, and voices can range from high quality to robotic.

Instantly speaking text in the web browser using HTML5 wins out on all counts.

Text to Speech

The browser can now speak with the aid of a simple script, and the result will be a new age of advancement for the delivery of eLearning.  It needs no server connection and is as easy to edit as text.  Browsers can now speak any post, page or selection, which means that courses can be enhanced to speak unlimited words with unlimited links.  Browsers support 51 languages with over 150 voices, and you can create more variations by varying the tone, speed and pitch.
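
To make the "simple script" concrete, here is a minimal sketch in TypeScript that uses the browser's standard Web Speech API (speechSynthesis and SpeechSynthesisUtterance). It is not the ResponsiveVoice library itself. The helper names buildUtterance and speakText, and the default language "en-US", are assumptions made for illustration.

```typescript
// Settings for one spoken phrase. Declared here so the sketch
// does not depend on browser (DOM) type definitions.
interface UtteranceSettings {
  text: string;
  lang: string;
  rate: number;
  pitch: number;
}

interface SpeechOptions {
  lang?: string;  // BCP 47 language tag, for example "en-GB"
  rate?: number;  // speaking speed, 1 is normal
  pitch?: number; // voice pitch, 1 is normal
}

// Keep a value inside the range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Build the settings for a phrase. The Web Speech API accepts
// a rate from 0.1 to 10 and a pitch from 0 to 2.
function buildUtterance(text: string, options: SpeechOptions = {}): UtteranceSettings {
  return {
    text,
    lang: options.lang ?? "en-US", // assumed default
    rate: clamp(options.rate ?? 1, 0.1, 10),
    pitch: clamp(options.pitch ?? 1, 0, 2),
  };
}

// Speak the text in the browser. Returns false when speech
// synthesis is not available (for example outside a browser).
function speakText(text: string, options: SpeechOptions = {}): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return false;
  }
  const settings = buildUtterance(text, options);
  const utterance = new g.SpeechSynthesisUtterance(settings.text);
  utterance.lang = settings.lang;
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  g.speechSynthesis.speak(utterance);
  return true;
}
```

A call such as speakText("Welcome to the course", { lang: "en-GB", rate: 0.9, pitch: 1.2 }) would then read the sentence slightly slower and higher than the default. Some browsers only start speaking after a user gesture such as a click, so it is safest to call it from an event handler.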

It removes the need for the system administrator’s nightmare: plugins or any other installations on the learner’s computer. There is no cost per word and there are no chunky audio files to download.


Scripts to turn text to speech can be used in a number of ways.  

The customized eLearning plugin can voice-enable Articulate Storyline, Adobe Captivate, Lectora Inspire and almost any program that can call dynamic content, automatically speaking slide contents or notes as the user progresses through the course.

If you are using WordPress, a free plugin is available that uses a set of WordPress shortcodes to provide a simple user interface within the WordPress authoring environment.

Site owners can simply use the ‘WebSite Agent’ to voice-enable their site in 3 minutes, then use the dashboard to edit it and apply a range of customization features.

Developers can work with the API using our library of advanced features.  

Ways to use text to speech in eLearning

There are a number of ways that text to speech can add value to eLearning, such as:

  1. Reading out slide notes
  2. Reading the visible text content of slides
  3. Speaking hidden text that does not appear on the slides
  4. Speaking in reaction to events or triggers on a slide
  5. As part of a simulated conversation
  6. Accessibility, speaking navigation buttons and menus

Reading out slide notes

Augmenting the content of a slide with notes is a common feature of eLearning authoring tools. With those notes read aloud by text to speech, the learner can relax and choose the mode of learning best suited to them. If you’ve ever been thankful for movie subtitles in a noisy environment, you’ll know what we mean: the more complementary modes of communication there are, the easier it is for the learner to absorb the material.

Reading the visible text content of slides

Sometimes what is written on the slide is exactly what you would like to speak to the learner. By reading the elements on the slide in the order of their position (left to right, then top to bottom), everything can be read to the learner without the author needing to make any changes or write extra slide notes.
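
One way to picture this ordering is the TypeScript sketch below. It is an illustration under stated assumptions, not how any particular authoring tool or plugin does it: the SlideElement shape, the function names and the 10-pixel row tolerance are all hypothetical. Elements whose top edges are close together are treated as one row, rows are read top to bottom, and each row is read left to right.

```typescript
// Hypothetical description of a text element on a slide.
interface SlideElement {
  text: string;
  left: number; // distance from the left edge of the slide, in pixels
  top: number;  // distance from the top edge of the slide, in pixels
}

// Return the elements in reading order: rows top to bottom,
// and left to right within each row. Elements belong to the same
// row when their top is within rowTolerance of the row's first element.
function inReadingOrder(elements: SlideElement[], rowTolerance = 10): SlideElement[] {
  const byTop = elements.slice().sort((a, b) => a.top - b.top);
  const ordered: SlideElement[] = [];
  let row: SlideElement[] = [];
  let rowTop = 0;

  for (const element of byTop) {
    if (row.length > 0 && element.top - rowTop > rowTolerance) {
      // The current row is complete: emit it left to right.
      ordered.push(...row.sort((a, b) => a.left - b.left));
      row = [];
    }
    if (row.length === 0) {
      rowTop = element.top;
    }
    row.push(element);
  }
  // Emit the last row.
  ordered.push(...row.sort((a, b) => a.left - b.left));
  return ordered;
}

// Join the ordered, non-empty texts into one string to be spoken.
function slideScript(elements: SlideElement[]): string {
  return inReadingOrder(elements)
    .map((element) => element.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}
```

The string returned by slideScript can be passed to whichever speech call the project uses. The tolerance matters because two boxes that look like they sit side by side rarely have exactly the same top coordinate.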

Speaking hidden text that does not appear on the slides

When creating engaging or interactive content, the author may want only a specific result, such as speaking some instructions when the learner clicks a button. The text doesn’t have to appear on the slide or in the notes; instead it can be invisible. You can see for an example of creating speaking buttons.

Speaking in reaction to events or triggers on a slide

If you are part developer, you can use API calls such as speak() to speak text on demand. This opens up unlimited possibilities for designing interactive and engaging eLearning projects.
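
As a rough sketch of speaking in reaction to triggers, the TypeScript below maps event names to phrases and speaks the matching phrase when an event fires. The event names, the phrases and the helper names are hypothetical. The browser part assumes the ResponsiveVoice script is already loaded on the page and exposes a global responsiveVoice.speak(text, voice); check the library's documentation for the exact voice names available to you.

```typescript
// Anything that can speak a piece of text.
type Speaker = (text: string) => void;

// Hypothetical trigger table: event name -> text to speak.
const triggerScript: Record<string, string> = {
  "slide-start": "Welcome to module one.",
  "answer-correct": "Well done, that is correct.",
  "answer-wrong": "Not quite. Please try again.",
};

// Create a handler that speaks the phrase registered for an event.
// Returns true when something was spoken, false for unknown events.
function makeTriggerHandler(script: Record<string, string>, speak: Speaker) {
  return (eventName: string): boolean => {
    if (!Object.prototype.hasOwnProperty.call(script, eventName)) {
      return false;
    }
    speak(script[eventName]);
    return true;
  };
}

// Speaker backed by the ResponsiveVoice global (assumed to be loaded).
// Does nothing when the library is not present.
function responsiveVoiceSpeaker(voice = "UK English Female"): Speaker {
  return (text: string) => {
    const rv = (globalThis as any).responsiveVoice;
    if (rv && typeof rv.speak === "function") {
      rv.speak(text, voice);
    }
  };
}
```

In a course, a handler created with makeTriggerHandler(triggerScript, responsiveVoiceSpeaker()) could be called from the authoring tool's own trigger mechanism, for example handler("answer-correct") when the learner picks the right option. Keeping the phrases in one table means they stay as easy to edit as any other text.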

As part of a simulated conversation

Most authors know how to embed YouTube videos into their eLearning projects. Using the same technique you can embed branching Chat Mapper scenarios complete with characters and text-to-speech.

Accessibility, speaking navigation buttons and menus

Web accessibility and compliance are now an essential part of delivering information and training online, from educational institutions to governments, NGOs and equal opportunity employers. Adding voice to menus and navigation not only improves the overall user experience, it can also be key to meeting your organization’s accessibility compliance guidelines.
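
A minimal TypeScript sketch of speaking navigation items when they receive keyboard focus is shown here. It uses the browser's standard Web Speech API rather than any specific library, and the function names, the NavItem shape and the default selector "nav a, nav button" are assumptions for illustration. The label is taken from aria-label first, then the visible text, then the title attribute.

```typescript
// The pieces of a navigation element that can supply a spoken label.
interface NavItem {
  ariaLabel?: string | null;
  text?: string | null;
  title?: string | null;
}

// Pick the first non-empty label and tidy up its whitespace.
function spokenLabel(item: NavItem): string {
  const candidates = [item.ariaLabel, item.text, item.title];
  for (const candidate of candidates) {
    const cleaned = (candidate ?? "").replace(/\s+/g, " ").trim();
    if (cleaned.length > 0) {
      return cleaned;
    }
  }
  return "";
}

// Attach focus listeners that speak each matching element's label.
// Returns the number of elements wired up (0 outside a browser).
function enableSpokenNavigation(selector = "nav a, nav button"): number {
  const g = globalThis as any;
  if (!g.document || !g.speechSynthesis || !g.SpeechSynthesisUtterance) {
    return 0;
  }
  const nodes = g.document.querySelectorAll(selector);
  for (let i = 0; i < nodes.length; i++) {
    const el = nodes[i];
    el.addEventListener("focus", () => {
      const label = spokenLabel({
        ariaLabel: el.getAttribute("aria-label"),
        text: el.textContent,
        title: el.getAttribute("title"),
      });
      if (label) {
        g.speechSynthesis.cancel(); // stop a label that is still being spoken
        g.speechSynthesis.speak(new g.SpeechSynthesisUtterance(label));
      }
    });
  }
  return nodes.length;
}
```

Spoken labels of this kind sit alongside proper semantic markup and ARIA attributes, which screen readers rely on; reusing aria-label as the first source keeps the two consistent.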

Speech for eLearning

This is great news for everyone who uses the internet for learning, because people simply absorb more of what they hear than what they read.  In a training situation, a conversation can be simulated that would take place in, for example, a call center, retail store, hospital ward or classroom.  The user can interact first in audio and, at later stages, with visuals, perhaps even avatars in a 3D environment; voice is the most significant first step.  It provides a key ingredient for building more fun and more engaging courses, enabling significant savings while accelerating workflow and increasing engagement in narrating, questioning, answering, playing games, giving lectures or telling stories.
