BC5 Analysing Language Data: Thinking like a Quantitative Linguist


Generalized additive mixed models (GAMMs) are an extension of the generalized linear mixed model that provides the analyst with a wide range of tools to model nonlinear functional dependencies in two or more dimensions (wiggly regression curves, wiggly regression surfaces and hypersurfaces).

GAMMs, which are implemented by Simon Wood in the mgcv package for R, provide a substantial and non-trivial addition to the toolkit of experimental psychology and experimental linguistics. Smooth functions whose degree of wiggliness is optimized from the data make it possible to discover and model nonlinear trends in time series data. Such data range from successive reaction times in simple behavioral experiments and pitch contours to the amplitude of the EEG in response to stimuli and tongue movements measured by means of electromagnetic articulography.

Furthermore, GAMMs make it possible to properly model nonlinear interactions between numerical predictors, allowing researchers to gain insight into, for instance, how fixation duration varies as a function of fixation position and lexical frequency.

One important extension beyond the linear mixed model is the possibility of relaxing the linearity assumption for random effects. In the classic linear mixed-effects model, random intercepts combined with random slopes make it possible to calibrate regression lines to the levels of random-effect factors (e.g., subjects). The factor smooths in GAMMs provide a nonlinear extension, enabling the modeling of nonlinear "random" curves instead of "random" straight lines.

The course will consist of two lectures and two lab sessions in which participants will receive guided instruction in analysing linguistic and experimental data, including data from electromagnetic articulography, pupil dilation during reading, EEG, reaction times, and dialectometry.


Gaining experience with the statistical analysis of nonlinear data using GAMs as implemented in the mgcv package for R.


Baayen, R. H. (2013). Multivariate statistics. In R. Podesva and D. Sharma (Eds.), Research Methods in Linguistics (pp. 337-372). Cambridge: Cambridge University Press.


Baayen, R. H., Davidson, D. J. and Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390-412.


Course location

Lecture Room 4

Course requirements

You should bring a laptop to the lab sessions.

Instructor information

Instructor's name

Harald Baayen


cf. website


Harald Baayen studied general linguistics with Geert Booij in Amsterdam and obtained his PhD degree in 1989 with a quantitative study of morphological productivity. From 1990 to 1998 he was a researcher at the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands. In 1998, after receiving a career advancement award from the Dutch Science Foundation, he became associate professor at Radboud University in Nijmegen, a position made possible by a Muller chair supported by the Dutch Academy of Sciences. In 2007 he took up a full professorship at the University of Alberta, Edmonton, Canada. In 2011 he received an Alexander von Humboldt research award from Germany, which brought him to the University of Tübingen, where he now heads a large research group investigating the role of learning in lexical representation and processing. Harald Baayen has published widely in international journals, including Psychological Review, Language, Journal of Memory and Language, Cognition, PLoS ONE, and the Journal of the Acoustical Society of America. He has published a monograph on word frequency distributions with Kluwer and an introductory textbook on statistical analysis with R for the language sciences with Cambridge University Press.