Prosody Visualization Challenge

Posted by wagner.agnieszka on Fri, 27/04/2018 - 09:45

Speech visualizers – how well does your system capture speech style?


The central subject of our field is often a mystery to those outside it, and it is not always easy to describe prosodic variation – the phenomena variously called intonation, timing, voice quality, stress, prominence, juncture, vocal effort, and so on – to non-prosodists and to other researchers. We can play or perform examples. But there are good reasons, both practical and scientific, to describe prosody more abstractly, in ways that allow a visual summary of patterns. For one thing, it is a help in spotting differences across language varieties, usage contexts, and speech styles.

Prosody is generally not taught in school in either the first language or second languages, and in consequence, L2 learners often ignore useful information in prosody because they don’t even know it exists. Neurologists understand that some of their patients have prosodic symptoms, but their ability to characterize these symptoms – and even to recognize them in more subtle cases – is limited. Court cases sometimes hinge on the interpretation of how someone said something, but the ability of investigators, lawyers, judges, and juries to reason about such questions is more limited than it should be. And psycholinguists have speculated that infants’ acquisition of syntax may be fostered by “prosodic bootstrapping” – without clarity about exactly what “prosodic” means in this phrase.

Across the decades and centuries, there have been dozens of ideas about how to address these issues. In 1775, Joshua Steele published his method for “establishing the Melody and Measure of Speech to be expressed and perpetuated by Peculiar Symbols”. More recently, we have Dwight Bolinger’s typewritten pitch contours; Kingdon’s tonetic stress marks; graphs of f0 time functions; reduced-dimensional versions of f0 contours via local linear approximation or FPCA; notation systems such as ToBI, INTSINT, or RPT; and many other approaches.

This Babel of descriptive systems is hard for outsiders (and even insiders) to penetrate. What are the options? What are they like in practice? What are their similarities and differences, their successes and failures? The Prosody Visualization Challenge (PVC) invites all members of the foremost prosody community – Speech Prosody – to bring their systems of prosody visualization for a comparative display on a common set of recorded passages of natural speech. This will help everyone to know what systems are available, and how they compare.

Most existing systems for the presentation of prosody lend themselves best to contrasting alternative pronunciations of a given text, such as types and strengths of pitch accents and junctural phenomena, syntactic contrasts, or varying locations of prominence. Larger patterns of prosodic structure, such as those that listeners use in identifying speech style or variety, are often not directly addressed. Therefore the present PVC focuses on speech style.

Three sets of talkers are available: Poets, Preachers and Politicians. For each set, there are samples of two styles: Poets reading their poetry aloud, or talking about it; Preachers preaching, or talking to an interlocutor; and Politicians giving an address, or talking with an interviewer.
There will be sample sets in American English, and in Polish.

Analysis of the whole of the presented passages, easy for automatic systems, is optional. The challenge is to select a good pair of samples in which your visualization system best captures the style contrast, and explain in your poster how the contrast can be seen. An example poster is attached (though obviously you have room on a real poster for more figures and more text). The central point is: capture the speech style difference in your visualization.


There is no “gold standard” and no means of scoring entries. All systems will be on display during the entire conference. All SP2018 attendees will receive a response form on which they can rate and comment on the entries.

A set of recordings (for English and for Polish) will be available on the SP2018 website, and all entrants are expected to analyze (the designated phrases in) all of the samples (from one or both languages) using their chosen system. Other samples (also in other languages) may also be displayed, in which case the associated audio should also be uploaded to the website.

Each accepted entry will be presented via an illustrative poster, as well as a complete set of analyzed examples in digital form, uploaded to the relevant website.

There is no associated requirement for a paper in the proceedings, nor is there a novelty requirement for the presentation system. (On the contrary: existing descriptions of the system used – in a journal article, conference proceedings, technical report, or website – will increase the likelihood of a Challenge submission being accepted.)

There is not even a requirement that submissions come only from registered participants, although posters for which a registered participant can answer questions may, for understandable reasons, end up being preferred by respondents. If you cannot attend, you can still send a poster: you can make the poster at home and send it with a colleague who is attending; print it (preferably on fabric in this case) and send it by post; or mail us the PDF by the week before the conference and trust us to print it in Poznan. The last option will cost €20.

And now the vital information: the audio samples are at

Contact persons:

Agnieszka Wagner:
Anne Cutler:
Grażyna Demenko: