Speech Processing for IP Networks

David Burke

E-Book (pdf)

CHF92.00

Available for immediate download

Description

Media Resource Control Protocol (MRCP) is a new IETF protocol,
providing a key enabling technology that eases the integration of
speech technologies into network equipment and accelerates their
adoption, allowing exciting and compelling interactive services
to be delivered over the telephone. MRCP leverages IP
telephony and Web technologies such as SIP, HTTP, and XML
(Extensible Markup Language) to deliver an open standard,
vendor-independent, and versatile interface to speech
engines.

Speech Processing for IP Networks brings these
technologies together into a single volume, giving the reader a
solid technical understanding of the principles of MRCP, how it
leverages other protocols and specifications for its operation, and
how it is applied in modern IP-based telecommunication
networks. Focusing on the MRCPv2 standard developed by the
IETF SpeechSC Working Group, this book will also provide an
overview of its precursor, MRCPv1.

Speech Processing for IP Networks:

Gives a complete background on the technologies required by
MRCP to function, including SIP (Session Initiation Protocol), RTP
(Real-time Transport Protocol), and HTTP (Hypertext Transfer
Protocol).
Covers relevant W3C data representation formats including
Speech Synthesis Markup Language (SSML), Speech Recognition Grammar
Specification (SRGS), Semantic Interpretation for Speech
Recognition (SISR), and Pronunciation Lexicon Specification
(PLS).
Describes VoiceXML, the leading approach for programming
cutting-edge speech applications and a key driver of the
development of many of MRCP's features.
Explains advanced topics such as VoiceXML and MRCP
interworking.

This text will be an invaluable resource for technical managers,
product managers, software developers, and technical marketing
professionals working for network equipment manufacturers, speech
engine vendors, and network operators. Advanced students on
computer science and engineering courses will also find this to be
a useful guide.

About the Author

David Burke is Chief Technology Officer and co-founder of
Voxpilot Ltd, UK. David led Voxpilot to its current position
as a leader in VoiceXML interactive services platform technology.
His management duties at Voxpilot include executive management and
counsel, product vision, direction and management, responsibility
for all R&D activities including budgeting, engineering team
selection and mentoring, and architecture and design.

He is also a member of the World Wide Web Consortium (W3C) Voice
Browser Working Group and of the Internet Engineering Task Force
(IETF) SpeechSC Working Group.

Contents
PART I. BACKGROUND.
1. Introduction. 1.1 Introduction to Speech Applications. 1.2 The MRCP Value Proposition. 1.3 History of MRCP Standardisation. 1.3.1 Internet Engineering Task Force. 1.3.2 World Wide Web Consortium. 1.3.3 MRCP: From Humble Beginnings Toward IETF Standard. 1.4 Summary.
2. Basic Principles of Speech Processing. 2.1 Human Speech Production. 2.1.1 Speech Sounds: Phonemics and Phonetics. 2.2 Speech Recognition. 2.2.1 Endpoint Detection. 2.2.2 Mel-Cepstrum. 2.2.3 Hidden Markov Models. 2.2.4 Language Modelling. 2.3 Speaker Verification and Identification. 2.3.1 Feature Extraction. 2.3.2 Statistical Modelling. 2.4 Speech Synthesis. 2.4.1 Front-end Processing. 2.4.2 Back-end Synthesis. 2.5 Summary.
3. Overview of MRCP. 3.1 Architecture. 3.2 Media Resource Types. 3.3 Network Scenarios. 3.3.1 VoiceXML IVR Service Node. 3.3.2 IP PBX with Voicemail. 3.3.3 Advanced Media Gateway. 3.4 Protocol Operation. 3.4.1 Establishing Communication Channels. 3.4.2 Controlling a Media Resource. 3.4.3 Walkthrough Examples. 3.5 Security. 3.6 Summary.
PART II. MEDIA AND CONTROL SESSIONS.
4. Session Initiation Protocol. 4.1 Introduction. 4.2 Walkthrough Example. 4.3 SIP URIs. 4.4 Transport. 4.5 Media Negotiation. 4.5.1 Session Description Protocol. 4.5.2 Offer/Answer Model. 4.6 SIP Servers. 4.6.1 Registrars. 4.6.2 Proxy Servers. 4.6.3 Redirect Servers. 4.7 SIP Extensions. 4.7.1 Capability Discovery. 4.8 Security. 4.8.1 Transport and Network Layer Security. 4.8.2 Authentication. 4.8.3 S/MIME. 4.9 Summary.
5. Session Initiation in MRCP. 5.1 Introduction. 5.2 Initiating the Media Session. 5.3 Initiating the Control Session. 5.4 Session Initiation Examples. 5.4.1 Single Media Resource. 5.4.2 Adding and Removing Media Resources. 5.4.3 Distributed Media Source/Sink. 5.5 Locating Media Resource Servers. 5.5.1 Requesting Server Capabilities. 5.5.2 Media Resource Brokers. 5.6 Security. 5.7 Summary.
6. The Media Session. 6.1 Media Encoding. 6.1.1 Pulse Code Modulation (PCM). 6.1.2 Linear Predictive Coding (LPC). 6.2 Media Transport. 6.2.1 Real-Time Protocol (RTP). 6.2.2 DTMF. 6.3 Security. 6.4 Summary.
7. The Control Session. 7.1 Message Structure. 7.1.1 Request Message. 7.1.2 Response Message. 7.1.3 Event Message. 7.1.4 Message Bodies. 7.2 Generic Methods. 7.3 Generic Headers. 7.4 Security. 7.5 Summary.
PART III. DATA REPRESENTATION FORMATS.
8. Speech Synthesis Markup Language (SSML). 8.1 Introduction. 8.2 Document Structure. 8.3 Recorded Audio. 8.4 Pronunciation. 8.4.1 Phonemic/Phonetic Content. 8.4.2 Substitution. 8.4.3 Interpreting Text. 8.5 Prosody. 8.5.1 Prosodic Boundaries. 8.5.2 Emphasis. 8.5.3 Speaking Voice. 8.5.4 Prosodic Control. 8.6 Markers. 8.7 Metadata. 8.8 Summary.
9. Speech Recognition Grammar Specification (SRGS). 9.1 Introduction. 9.2 Document Structure. 9.3 Rules, Tokens, and Sequences. 9.4 Alternatives. 9.5 Rule References. 9.5.1 Special Rules. 9.6 Repeats. 9.7 DTMF Grammars. 9.8 Semantic Interpretation. 9.8.1 Semantic Literals. 9.8.2 Semantic S…