Crowdsourcing big data in English dialectology

Duration: 27 mins 14 secs

About this item

Description:	Linguistic patterns and trends identified as a result of the New York Times dialect quiz


Created:	2015-12-03 15:00
Collection:	Language Sciences Annual Symposium 2015 Cambridge Language Sciences
Publisher:	University of Cambridge
Copyright:	J.A. Walsh
Language:	eng (English)
Distribution:	World (downloadable)
Explicit content:	No
Aspect Ratio:	4:3
Screencast:	No
Bumper:	UCS Default
Trailer:	UCS Default


Abstract:	Dr Bert Vaux (Dept. of Theoretical and Applied Linguistics). The Harvard Dialect Survey of 2002-3 represented the first linguistic foray into large-scale crowdsourcing (60K respondents) incentivized by dynamic geospatial imaging. Working in tandem with statistics graduate student Josh Katz of North Carolina State University, I expanded this in 2013 into the New York Times dialect quiz, which deployed Josh's brilliant tweaks of existing clustering, visualization, and prediction algorithms to attract responses to my survey questions from more than 21 million humans. Since then I have been collaborating with forensic linguist Jack Grieve of Aston University to extract linguistically significant patterns and trends from our megacorpus. In this talk I report on the development of the New York Times quiz and some of the leading discoveries that have emerged from it, including isogloss conspiracies and stability, the role of political and commuting zones, and multivariate non-local cultural regions.

Available Formats

Format	Quality	Bitrate	Size
MPEG-4 Video	640x360	1.91 Mbits/sec	391.23 MB
WebM	640x360	910.34 kbits/sec	181.69 MB
iPod Video	480x360	494.61 kbits/sec	98.66 MB
MP3	44100 Hz	249.8 kbits/sec	49.89 MB
