Building a Text to Speech App Using AVSpeechSynthesizer

    25 February 2015

    iOS is an operating system with many possibilities, allowing developers to create anything from really simple to super-advanced applications. There are times when applications have to be multi-featured, providing elegant solutions that go beyond the commonplace and lead to a superb user experience. There are also numerous technologies one could exploit, and in this tutorial we are going to focus on one of them: text to speech.

    Text-to-speech (TTS) is not something new in iOS 8. Since iOS 7, dealing with TTS has been really easy, as the code required to make an app speak is straightforward and easy to handle. To be more precise, iOS 7 introduced a new class named AVSpeechSynthesizer, and as its prefix suggests, it’s part of the powerful AVFoundation framework. The AVSpeechSynthesizer class, along with some other classes, can produce speech from a given text (or multiple pieces of text), and lets you configure various properties of that speech.


    AVSpeechSynthesizer is the class responsible for the heavy work of converting text to speech. It’s capable of starting, pausing, stopping and continuing a speech process. However, it doesn’t interact directly with the text. There’s an intermediate class that does that job, called AVSpeechUtterance. An object of this class represents a piece of text that should be spoken. To put it really simply, an utterance is the text that’s about to be spoken, enriched with some properties regarding the final output. The most important of the properties that the AVSpeechUtterance class handles (besides the text) are the speech rate, pitch and volume. There are a few more, but we’ll see them in a while. An utterance object also defines the voice that will be used for speaking. A voice is an object of the AVSpeechSynthesisVoice class. It always matches a specific language, and up to now Apple supports 37 different voices, meaning voices for 37 different locales.
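
    To make the relationship between the three classes concrete, here is a minimal sketch of how they might fit together. It assumes current Swift method names (for example `speak(_:)`, which older SDKs spelled `speakUtterance(_:)`), and the `Speaker` wrapper, its method names and the sample "en-US" language code are hypothetical, chosen only for illustration.

```swift
import AVFoundation

// Hypothetical helper that owns one synthesizer for the lifetime of the object.
// Keeping a strong reference matters: if the synthesizer is deallocated, speech stops.
final class Speaker {
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ text: String, languageCode: String = "en-US") {
        // The utterance carries the text plus the properties that shape the output.
        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate // stays within the min/max rate constants
        utterance.pitchMultiplier = 1.0                     // documented range: 0.5 to 2.0
        utterance.volume = 1.0                              // documented range: 0.0 to 1.0

        // The voice is looked up by language code; the initializer returns nil
        // when no voice matches, in which case the default voice is used.
        utterance.voice = AVSpeechSynthesisVoice(language: languageCode)

        // Utterances are queued and spoken in the order they are enqueued.
        synthesizer.speak(utterance)
    }

    @discardableResult
    func pause() -> Bool { synthesizer.pauseSpeaking(at: .word) }

    @discardableResult
    func resume() -> Bool { synthesizer.continueSpeaking() }

    @discardableResult
    func stop() -> Bool { synthesizer.stopSpeaking(at: .immediate) }

    // Language codes of every voice available on the current device.
    static func availableLanguages() -> [String] {
        AVSpeechSynthesisVoice.speechVoices().map { $0.language }
    }
}
```

    A caller would typically create one `Speaker`, keep it alive (for instance as a property of a view controller), and call `speak("Hello")` followed later by `pause()`, `resume()` or `stop()`. The number of languages returned by `availableLanguages()` depends on the iOS version and device, so it may not match the count quoted above.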

