BC5 Analysing Language Data: Thinking like a Quantitative Linguist


Generalized additive mixed models (GAMMs) are an extension of the generalized linear mixed model that provides the analyst with a wide range of tools to model nonlinear functional dependencies in two or more dimensions (wiggly regression curves, wiggly regression surfaces and hypersurfaces).

GAMMs, which are implemented in the mgcv package for R by Simon Wood, are a substantial addition to the toolkit of experimental psychology and experimental linguistics. Optimized smooths make it possible to discover and model nonlinear trends in time series data, ranging from successive reaction times in simple behavioral experiments and pitch contours to the amplitude of the EEG in response to stimuli and tongue movements as measured by electromagnetic articulography.

Furthermore, GAMMs make it possible to properly model nonlinear interactions between numerical predictors, allowing researchers to gain insight into, for instance, how fixation duration varies as a function of fixation position and lexical frequency.
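As a rough illustration of such a nonlinear interaction, the sketch below fits a tensor product smooth with mgcv to simulated data. The data frame, the variable names (Position, Frequency, Duration) and the simulated effect are all invented for this example and do not come from the course materials.

```r
library(mgcv)

set.seed(2)
n <- 400

# Hypothetical data: fixation position (0-1) and log-scaled lexical frequency
dat <- data.frame(Position = runif(n), Frequency = runif(n, 0, 10))

# Simulated fixation durations in which the effect of position
# depends on frequency (an invented nonlinear interaction)
dat$Duration <- 200 + 30 * sin(3 * dat$Position) * log1p(dat$Frequency) +
  rnorm(n, sd = 10)

# te() fits a tensor product smooth: a wiggly regression surface
# over two numerical predictors that may be on different scales
m_te <- gam(Duration ~ te(Position, Frequency), data = dat, method = "REML")

# Contour plot of the fitted surface
vis.gam(m_te, view = c("Position", "Frequency"), plot.type = "contour")
```

A tensor product smooth is one reasonable choice here because the two predictors are measured on different scales; other specifications are possible.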

One important extension with respect to the linear mixed model is the possibility to relax the linearity assumption for random effects. In the context of the classic linear mixed-effects model, random intercepts combined with random slopes make it possible to calibrate regression lines to the levels of random effect factors (e.g., subjects). The factor smooths in GAMMs provide a nonlinear extension, enabling the modeling of nonlinear "random" curves instead of "random" straight lines.
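A minimal sketch of what factor smooths look like in mgcv is given below, assuming simulated reaction times for ten hypothetical subjects. The variable names (Trial, Subject, RT) and the simulated trends are illustrative assumptions, not course data.

```r
library(mgcv)

set.seed(1)

# Hypothetical design: 10 subjects, 50 trials each
dat <- expand.grid(Trial = 1:50, Subject = factor(paste0("s", 1:10)))

# Each subject gets a differently shifted nonlinear trend over trials
phase <- runif(10, 0, pi)[as.integer(dat$Subject)]
dat$RT <- 6 + 0.3 * sin(dat$Trial / 8 + phase) + rnorm(nrow(dat), sd = 0.1)

# s(Trial) is the population-level smooth; s(Trial, Subject, bs = "fs")
# adds one penalized "random" curve per subject, the nonlinear
# counterpart of by-subject random intercepts and slopes
m_fs <- gam(RT ~ s(Trial) + s(Trial, Subject, bs = "fs", m = 1),
            data = dat, method = "REML")

summary(m_fs)
```

Setting m = 1 for the factor smooth is a common choice that penalizes the by-subject curves more strongly; for larger data sets, bam() from the same package accepts the same formula.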

The course will consist of two lectures and two lab sessions in which participants will receive guided instruction in analysing linguistic and experimental data including data from electromagnetic articulography, pupil dilation in response to reading, EEG data, reaction time data, and data from dialectometry.


Participants will gain experience with the statistical analysis of nonlinear data using GAMs as implemented in the mgcv package for R.


Baayen, R. H. (2013). Multivariate statistics. In R. Podesva and D. Sharma (Eds.), Research Methods in Linguistics. Cambridge: Cambridge University Press, 337-372.


Baayen, R. H., Davidson, D. J. and Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59, 390-412.


Course location

Lecture Room 4

Course requirements

You should bring a laptop to the lab sessions.

Instructor information