Each chapter of this book will involve the analysis of data from a single perceptual experiment carried out by a group of 15 listeners. In this chapter, we will discuss the data for our experiment, introduce concepts related to variables and probabilities, and provide a very basic introduction to R along the way. As noted in the preface, a working knowledge of R is assumed and a familiarity with basic statistics is probably helpful, though not strictly necessary. The preface also provides suggested readings for those wanting to do some background reading on R or statistical inference and provides information about the software you need installed to follow along with the examples in the book.
1.1 Chapter pre-cap
In this chapter, we discuss the use of experiments for scientific inference, in addition to some inherent problems associated with inferences based on a limited number of observations. After this, we describe the perceptual experiment that will be analyzed in this book, including a discussion of its structure, aims, procedures, and resulting data. Following the introduction of the experiment, we discuss variables and the way that we use these in experiments to make inferences. After that, we introduce some R data types and present the relationship between these and different kinds of variables. Finally, we present some ways to visualize different sorts of variables and the relationships between them.
1.2 Experiments and effects
An experiment is a procedure or process that can help answer some research questions. Obviously, when defined so broadly, almost anything can be an experiment. In fact, when a child touches a hot stove to see what ‘red’ feels like, they are conducting an experiment which provides essential information about their world. In an academic context, experiments are expected to be scientific. However, there is no definition of scientific that is not socially and historically contingent. What is considered ‘scientific’ is determined by what scientists in a specific time and place consider to be scientific, and this can change, and has changed, substantially over time.
At the moment, in most contexts, a research project is ‘scientific’ when it generally conforms to the scientific method. Of course, just as with science and scientific, there is no single scientific method, and no single ‘true’ definition that can be referred to. Instead, the scientific method consists generally of a process in which researchers: (1) Ask questions based on gaps in their knowledge about the world, (2) Collect data using codified procedures developed to avoid certain pitfalls and maximize the chance that the collected data can answer their questions, (3) Evaluate their questions in light of their data, and (4) Reach conclusions where possible, and synthesize their conclusions with their previous knowledge about the world.
Modern ‘scientific’ work usually involves the collection of empirical measurements, the quantification of patterns in these measurements, and the qualitative description of the quantitative patterns in the measurements. As a result, much modern scientific work yields large quantities of numeric values, observed under different conditions, which the researcher must then (statistically) analyze in order to understand. For example, imagine an experiment about whether caffeine makes people read faster. Subjects are asked to drink either a cup of coffee or a cup of decaf. After a 30-minute wait, they are asked to read a passage aloud and the duration of the reading is measured. Basically, we are measuring two different values, “the amount of time it takes people to read a passage of text after drinking decaf” and “the amount of time it takes people to read a passage of text after drinking regular coffee”.
The experiment outlined above allows us to ask: is “the amount of time it takes people to read a passage of text after drinking decaf” the same as “the amount of time it takes people to read a passage of text after drinking regular coffee”? Another way to look at this is that we are interested in the effect of caffeine on reading times. By ‘effect’ we mean the degree to which caffeine is associated with changes in the characteristics of our observation (reading times) in some way. For example, if the average reading times were the same in both groups, we would conclude that “caffeine has no effect on reading times”. In contrast, if reading times were 800 milliseconds shorter in the caffeine group, we might conclude “caffeine has the effect of reducing reading times by 800 milliseconds”.
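As a minimal sketch of what we mean by an effect, the code below computes the difference between two group averages. The reading times are made-up values chosen for illustration (they are not data from any real experiment), and the object names decaf, caffeine, and effect are our own.

```r
# Hypothetical reading times in milliseconds (made-up values for illustration)
decaf    <- c(30500, 29800, 31200, 30100, 30900)
caffeine <- c(29700, 29000, 30400, 29300, 30100)

# The 'effect' of caffeine is the difference between the group averages
effect <- mean(caffeine) - mean(decaf)
effect
## [1] -800
```

A negative value here means shorter (faster) reading times in the caffeine group, and a value of zero would correspond to “no effect”.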
The relationship between statistical effects and causality is tricky. In some cases, like with caffeine and reading times, it seems reasonable that the caffeine is actually causing faster reading times since caffeine is associated with increased energy. However, we want to be sure to not imply that effects are necessarily causal. For example, decaf might also speed up reading times (relative to cold water) due to a placebo effect. This would suggest that some of the decrease in reading times with regular coffee is due to people’s expectations regarding its effect, in addition to the actual effect of caffeine. So, shorter reading times after drinking regular coffee allow us to establish an association between these observations but do not in any way prove that one is causing the other. So, our use of the term effect should be interpreted as meaning ‘association’ rather than ‘cause’.
Our experiment on reading times is specifically constructed to investigate the effect of caffeine on reading times. If the speakers in our experiment were randomly assigned to conditions, there is no particular reason to expect that their reading times would be different in the absence of caffeine. So, if we find that people read faster in the caffeine group, we may establish an association between caffeine and an increase in speaking rate. Combined with our knowledge of the physiological effect of caffeine on humans, we may conclude that this association suggests that caffeine consumption causes a difference in reading times. However, it’s important to keep in mind that it is our world knowledge that allows us to make a causal inference and not the statistical association on its own. If we found a statistical association between wearing a green t-shirt and reading times, we should be more hesitant to make claims about a causal relationship.
This same logic applies in situations where we do not randomly assign subjects to groups, as long as we are careful in creating equivalent groups. Consider the same experiment about speaking rate carried out with groups based on speaker gender rather than drinking coffee. In this case, the question would be “is the amount of time it takes men to read this passage of text the same amount of time that it takes women to read this passage of text”. If the speakers are generally similar in important characteristics (e.g. dialect, cultural background, and education levels) apart from gender, then any group differences may be attributable to the effect of gender on speaking rate (although establishing causal relationships is a difficult thing, as noted above, and involves more than simply randomization).
What we are describing in the above paragraphs are controlled experiments, experiments where the researcher takes an active role in ensuring the ‘fairness’ of the experiment. The notions of control and fairness are somewhat hazy and are perhaps more gradient than discrete (i.e. ‘controlled’ vs. ‘uncontrolled’). However, some situations clearly do not lead to ‘fair’ outcomes. For example, what if the caffeine group of readers were all first-language English speakers, and the decaf group had a substantial number of second-language speakers with little experience reading in English? The caffeine group may very well read faster simply because they are more polished readers, independent of any effects of caffeine. Whenever possible, researchers avoid situations like this by exerting control over their experiments, both in the structure of their experiments and in the recruiting and assignment of their participants to experimental conditions.
Due to random between-speaker variation (among other things), there is no chance whatsoever that average reading times across both groups will be exactly identical, even if caffeine has no effect on reading times at all. In addition, if you re-ran the experiment with the same people and sentences, there is no chance that both group means would be the same across experiments. Basically, any two given group means are always expected to differ due to chance alone. And yet, there is the possibility that caffeinated reading times are systematically different – i.e. different in a way that the random variation of groups across replications is not. So, how can we ever establish that our measures are actually different and don’t just appear to be different because of randomness? It is precisely this problem that has motivated scientists to use statistical analyses to help answer their research questions.
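The point about chance differences can be illustrated with a small simulation. This is only a sketch under assumed values: the true average (30,000 ms), the amount of between-speaker variation (2,000 ms), the group size (20), and the use of normally distributed reading times are all our own assumptions, chosen for illustration. Both groups are simulated with exactly the same true average, so caffeine has no effect at all.

```r
set.seed(1)  # arbitrary seed so that the sketch is reproducible

# Assumed values for illustration: both groups share the same true average
true_mean   <- 30000  # milliseconds
true_sd     <- 2000   # between-speaker variation, in milliseconds
n_per_group <- 20     # speakers per group

# Simulate one experiment and return the difference between group means
one_experiment <- function() {
  decaf    <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
  caffeine <- rnorm(n_per_group, mean = true_mean, sd = true_sd)
  mean(caffeine) - mean(decaf)
}

# Repeat the experiment 1,000 times
differences <- replicate(1000, one_experiment())

# The differences are scattered around zero ...
summary(differences)

# ... but we can check whether any replication gave a difference of exactly zero
any(differences == 0)
```

Even though there is no true effect, each simulated experiment yields group means that differ somewhat, which is why an observed difference on its own cannot establish that an effect is systematic.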
1.2.1 Experiments and inference
This book is about statistical inference. We will talk about the ‘statistics’ part in more detail in the next chapter, but we can talk about the ‘inference’ part now. Inference is a form of reasoning that allows you to go from a set of observations or premises to a conclusion about some facts. For example, you may arrive at a newly discovered island and see white cats wandering around. If you are there for a while and continue to observe only white cats, you may conclude “all the cats on this island are white”. If you do this you have made what is called an inductive inference: You have gone from a set of observations (the cats you saw) to a general conclusion about all the cats on the island.
Often, experiments are not just about observing and measuring certain effects, but also about drawing inferences regarding those effects. For example, in the reading time experiment described above, the researchers are not specifically interested in the reading times of the people in the experiments (i.e. the cats they saw) but rather in the reading times of people more generally (i.e. all the cats on the island).
Since inductive inference seeks to go from limited observation to general rules or principles, it has a central weakness. For example, your inference that only white cats exist on the island is on solid ground until you see a cat that is not white. Can you be sure this won’t happen? You can’t, because you don’t know what you don’t know and you can’t be sure that what hasn’t happened yet will never happen. This is called the problem of induction and it is a fundamental weakness of inductive reasoning.
Another problem faced when using experiments to gather knowledge is the fallacy of affirming the consequent, also known as the fallacy of the converse. Affirming the consequent arises when a researcher works backward from their “then” statement to their “if” statement. For example, consider the statement: “If caffeine speeds up reading, then reading times will be faster for the caffeine group”. Even if it’s true that the reading times are faster for the caffeine group, it is not necessarily true that it is caused by the caffeine.
The problem is that the caffeine group reading faster is a necessary but not sufficient condition for the conclusion that ‘caffeine speeds up reading’. Affirming the consequent does not mean that if/then statements are not useful, but simply that they cannot be used to prove the truth of the premises in a logically necessary manner. For example, consider the statement: “If I am the king of England, this coin flip will be heads”. This silly example shows that the truth of the second part of the statement does not prove the first.
We actually do think it would be reasonable to conclude that caffeine is causing the decrease in reading times based on the experiment outlined above, given enough participants. However, it’s useful to be aware of the fundamental limitations of trying to understand general patterns given limited sets of observations. It’s also useful to think about how we can reason in a way that might minimize the odds of inferential mistakes, especially by including our general knowledge of the world (and the specific topic) in our reasoning. For example, rather than observing white cats and leaving it at that, we can ask: Why are the cats white? Do evolutionary pressures cause them to be white? How do their genetics ensure that all members of the species will be white? Is there any chance non-white cats could enter the population? Considering the answers to questions like this, in combination with our observations, can make inferences like “all cats on this island are white” more reliable.
The examples above involved the effect of caffeine on reading times. We are interested in generalizing to the human population based on what is a tiny sample of humans, relatively speaking. If we make the claim “caffeine speeds up reading times”, are we extending that to all humans, or at least to all English speakers? Past, present, and future? That is a bold claim based on a small number of data points, or it would be in the total absence of any world knowledge and prior expectations. Of course, we know that caffeine is a stimulant and seems reasonably likely to make people read faster. The finding fits within our larger worldview and, as a result, we may accept it as likely to be ‘true’. In contrast, suppose that the two groups had instead drunk plain water, one ‘regular’ and one dyed with blue food coloring. In this situation, we may be skeptical of any apparent effect. Since such a finding does not conform to any prior knowledge about the world, it is the sort of inference that may turn out to be less reliable, in the long run.
1.3 Our experiment
As noted above, each chapter in this book will feature the analysis of data from the same perceptual experiment. In this section, we provide an overview of the experiment, its design, the general research questions it can address, and an overview of the resulting data. A more thorough explanation of the issues at hand and the design of the experiment can be found in Chapter 13.
1.3.1 Our experiment: Introduction
Any two speakers will likely ‘sound’ different from each other even when they are saying the ‘same’ word. These between-speaker differences can, in some cases, be systematically associated with speaker characteristics such as age, height, and gender. So, tall speakers tend to sound one way, while shorter speakers tend to sound another way. As a result, although it may sound odd to talk about how tall someone sounds, listeners are able to use the acoustic information in a speaker’s voice to guess the speaker’s age, gender, size, and so on. This information is referred to as the speaker’s indexical characteristics: Social and physical information regarding the speaker that is understood from the way someone speaks.
We can ask two different questions with respect to assessments of indexical characteristics from speech: (1) Are they accurate, and (2) How do listeners arrive at their guesses? Generally speaking, listeners are often not very accurate in their judgments of indexical characteristics; however, they are consistent in the errors that they tend to make. For example, if one voice is incorrectly assumed to belong to a particular sort of speaker, it will often be the case that this mistake is a regular occurrence.
The ‘guessing’ of speaker characteristics is dominated by two acoustic cues: Voice pitch and voice resonance. Voice pitch can be thought of as the ‘note’ someone produces with their speech. When you sing you produce different notes by producing different pitches. The pitch of a sound is related to the vibration rate of the thing that produced the sound because repetitive vibration produces a repetitive sound wave that humans perceive as a pitch. Human voice pitch is regulated by changing the vibration rate of the vocal folds in your larynx. You can feel this vibration if you hum a song and press your fingers against the middle of the front of your neck. Pitch is an auditory sensation, a feeling you have in relation to a periodic acoustic event, a sound. When you hear two sounds, if you can order them based on which sound is ‘lower/higher’ than the other, then they differ in pitch. Since this quality cannot be directly measured, scientists measure the fundamental frequency (f0) of the sound to quantify its pitch. The f0 of a sound is measured in Hertz (Hz), which measures the number of times a sound wave repeats itself in a second.
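Since Hertz simply counts repetitions per second, f0 can be sketched as a number of cycles divided by a duration. The values below (50 cycles in half a second) are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical measurement: number of vocal fold cycles in a stretch of vowel
n_cycles <- 50     # repetitions of the sound wave
duration <- 0.5    # length of that stretch, in seconds

# f0 in Hertz is the number of repetitions per second
f0 <- n_cycles / duration
f0
## [1] 100

# The period is the time taken by a single repetition, in seconds
period <- 1 / f0
period
## [1] 0.01
```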
Smaller things tend to vibrate at higher rates than larger things. This holds for vocal folds as well; shorter vocal folds tend to vibrate at higher rates than longer vocal folds, thereby producing speech with a higher pitch. As a result, larger speakers tend to produce speech with a lower pitch than smaller speakers. Since the vocal folds grow as one grows into adulthood, voice pitch is a good indicator of age between young childhood and adulthood, but is less useful to distinguish adults of different ages. Basically, pitch may help you distinguish a 5-year-old from an 18-year-old, but not an 18-year-old from a 30-year-old.
In addition to general age-related changes, the vocal folds tend to increase in size during puberty for most males so that post-pubescent males tend to produce speech with a lower pitch than the rest of the human population. As a result of these relations, a voice with a lower voice pitch is more likely to be produced by someone who is older, taller, and more likely to be male than a voice with a higher pitch. The relationships between age, height, gender, and f0 are presented in Figure 1.1. The height information used throughout this book is available in the height_data data included in the bmmb package and is from Fryar et al. (2012).
Resonance can be thought of as the ‘size’ of a sound. For example, a violin and a cello can be playing the same note (with the same pitch), but the cello will ‘sound’ bigger. This is because lower frequencies resonate in its larger structure. In the same way, speakers with longer vocal tracts (the space from the vocal folds to the lips) tend to ‘sound’ bigger by producing speech with lower frequencies overall. We don’t really have good words to describe what resonance ‘sounds’ like, but a small resonance (short vocal tract) sounds ‘heliumy’. When a person breathes helium and speaks, their pitch does not go up, but their resonance frequencies increase. Acoustically and perceptually, this mimics the effects of having a very short vocal tract (for more information on this, see Chapter 13).
Long vocal tracts sound like slow-motion speech (think of someone saying “noooooooooooooooooo….” when something bad is happening in slow motion in a movie), and this is because slowing down the playback of a recording simulates a lowering of resonance frequencies in speech (along with the pitch). In fact, size simulation by resonance manipulation is how the recordings for ‘Alvin and the Chipmunks’ were originally created. A low-pitched male singer was recorded singing abnormally slowly, and the recording was sped up in order to simulate speech with a very high resonance (and an associated very short vocal tract). If you wonder what a lowered resonance sounds like, there is a gas called ‘sulfur hexafluoride’ that produces this effect (because it is very dense). Examples of people lowering their voice resonance by inhaling this gas can be found on YouTube.
There are many ways to measure the resonance of a voice. In our data, we will use speech acoustics to directly estimate the length of the vocal tract that produced it, in centimeters (in the manner described in Chapter 13). So, our measure of voice resonance will not be acoustic at all but will instead measure the physical correlate of the vocal tract expected to have produced the speech sound.
In general, a lower voice resonance suggests a longer vocal tract length in centimeters. There is a strong positive relationship between vocal tract length and body length (i.e. height) across the entire human population. This means that if a person is taller than another, their vocal tract is expected to be longer and their voice resonance is expected to be lower. Since height increases from birth into adulthood, this means that voice resonance can be used to predict both height and age. In addition, adult males tend to be somewhat taller than adult females in most populations, with an average difference of about 15 cm in the United States. As a result, voice resonance can be used to infer the gender of adult speakers, and possibly that of children as well. These relationships are shown in Figure 1.1.
In summary, voice pitch and voice resonance are independent ways that someone can acoustically ‘sound’ bigger/smaller, older/younger, and male/female. The experiment to be described below is a perceptual experiment involving behavioral measures, meaning we observed people’s behavior in reaction to the stimuli. In this experiment, human listeners listened to auditory stimuli (words) and were asked to answer questions regarding what they heard. The experiment was designed to investigate the way that speech acoustics are used by listeners to determine the age, gender, and height of speakers, and the way that these decisions affect each other.
1.3.2 Our experimental methods
Our listeners were 15 speakers of American English. Listeners were presented with the word “heed” produced by 139 different speakers of Michigan English. These speech samples were recorded by Hillenbrand et al. (1995) and are available on the GitHub page associated with this book. So, this experiment featured 139 unique stimulus sounds that the listeners in the experiment were asked to respond to. The stimuli used were produced by 48 adult females, 45 adult males, 19 girls (10–12 years of age), and 27 boys (10–12 years of age). These speakers showed substantial variation in their voice pitch and resonance as measured by their f0 and estimated vocal tract length (as will be discussed in Section 1.5).
In addition to the natural acoustic variation that exists between speakers, voice resonance was also manipulated experimentally. All stimuli were manipulated by shifting the spectral envelope down by 8%, simulating an increase in speaker size of approximately 8% (all other things being equal). The downward shift of the spectral envelope results in a perceptually lower resonance which should ‘sound bigger’ to listeners. This acoustic manipulation is similar to the one carried out to make voices such as those of ‘Alvin and the Chipmunks’ sound small, but in reverse (and the pitch was not affected). Each listener responded to all 278 stimuli (139 speakers × 2 resonance levels), for a total of 4,170 observations across all listeners (15 listeners × 278 stimuli). Stimuli were presented one at a time, randomized along all stimulus dimensions. This means that tokens were thrown in one big pile and selected at random in a way that the properties of an upcoming stimulus were never predictable based on the previous one. For each trial, listeners were presented with a single word at random and were asked to:
a) Indicate whether they thought the speaker was a “boy 10–12 years old”, a “girl 10–12 years old”, a “man 18+ years old”, or a “woman 18+ years old”. This is the apparent speaker category.
b) Estimate the height of the speaker in feet and inches (converted to centimeters for this book). This is the apparent speaker height.
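The number of stimuli and trials described above can be verified with a few lines of R. This is a sketch of the design arithmetic only; the object names are our own, and the grid built with expand.grid is a stand-in for the structure of the design rather than the actual experimental data.

```r
# Counts taken from the description of the stimuli above
n_speakers  <- 48 + 45 + 19 + 27  # women, men, girls, boys
n_resonance <- 2                  # unmodified and lowered resonance
n_listeners <- 15

n_stimuli <- n_speakers * n_resonance
n_stimuli
## [1] 278

n_trials <- n_listeners * n_stimuli
n_trials
## [1] 4170

# The same count, obtained by crossing every listener with every
# speaker and every resonance level
design <- expand.grid(L = 1:n_listeners, S = 1:n_speakers, R = c("a", "b"))
nrow(design)
## [1] 4170
```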
Our intention is to analyze the apparent speaker category and height judgments provided by listeners in order to address specific research questions (discussed in the following section). To do this, we will use acoustic descriptions of the different speakers’ voices, focusing on the fundamental frequency of their speech, and the vocal tract length implied by their speech (estimated as described in Chapter 13). In addition, we will investigate how listeners’ judgments about the speaker’s age, gender, and height can affect each other and the use of speech acoustics.
1.3.3 Our research questions
This experiment is meant to investigate how listeners use speech acoustics to estimate the height of unknown talkers. The results will also let us investigate the relationship between the perception of talker size and the perception of talker category. Specific research questions will be discussed in each chapter; however, a general overview will be provided here. The expectations to be outlined below are based on the empirical relationships between these measurements and characteristics outlined above, shown in Figure 1.1, and on previous research (discussed in Chapter 13).
The assumption is that listeners are familiar with the relationships between apparent speaker characteristics and speech acoustics, and ‘somehow’ use the information in speech to guess speaker characteristics. For example, if we know that a speaker with an f0 of 100 Hz is usually an adult male and is usually about 176 cm tall, we might expect listeners will identify speech stimuli with an f0 near 100 Hz as produced by an adult male speaker who is about 176 cm tall.
Listeners were asked to provide two responses, speaker height and speaker group. The four speaker groups can be split according to two characteristics: The age of the group and the gender of the group (boy = male child, girl = female child, man = male adult, woman = female adult). So, we can consider that listeners reported the height, age, and gender of the speaker, for each sound they listened to.
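The split of the four speaker groups into age and gender can be sketched as a simple recoding. The category responses below are hypothetical, and we use single-letter codes for illustration: b, g, m, and w for boy, girl, man, and woman, c and a for child and adult, and f and m for female and male.

```r
# Apparent speaker category for a few hypothetical trials
C <- c("g", "w", "w", "g", "b", "m")

# Age: boys and girls are children (c), men and women are adults (a)
A <- ifelse(C %in% c("b", "g"), "c", "a")

# Gender: girls and women are female (f), boys and men are male (m)
G <- ifelse(C %in% c("g", "w"), "f", "m")

data.frame(C, A, G)
##   C A G
## 1 g c f
## 2 w a f
## 3 w a f
## 4 g c f
## 5 b c m
## 6 m a m
```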
Lower frequencies, whether f0 or resonances, are expected to be associated with taller and older speakers. For post-pubescent speakers, low frequencies, particularly in f0, can also be an indicator of maleness. In general, we expect that the perception of maleness will be associated with the perception of taller speakers, in particular for older speakers. The perception of adultness should be associated with taller speakers for either gender; however, the difference in height between boys and men tends to be larger than that between girls and women.
Finally, it’s also possible that the acoustic information in voices is used differently based on the apparent category of the speaker. For example, maybe listeners use f0 one way when they think the speaker is an adult and another way when they think the speaker is a child. In addition, it’s possible that different listeners use acoustic information in idiosyncratic ways that are systematic within-listener, but which differ arbitrarily from each other between listeners.
1.3.4 Our experimental data
The data associated with this experiment is available in the bmmb package (discussed in the preface), and can be accessed using the code below:
library (bmmb)
data (exp_data_all)
The code above loads our data and places it into our workspace in an object called exp_data_all. Below we use the head function to see the first six lines of the data for the experiment. Our data is in long format, so each row is a different individual observation and each column is a different piece of information regarding that observation. Each individual trial (a single row) represents an individual listener’s response to a single stimulus word played to them. So, we know that this data frame has 4,170 rows to represent the 4,170 observations in our data.
# see first 6 rows
head (exp_data_all)
##   L C height R S C_v  vtl  f0 dur G A G_v A_v
## 1 1 g  165.6 a 1   b 12.2 277 237 f c   m   c
## 2 1 w  173.2 b 1   b 12.2 277 237 f a   m   c
## 3 1 w  165.6 a 2   b 12.4 287 317 f a   m   c
## 4 1 g  147.8 b 2   b 12.4 287 317 f c   m   c
## 5 1 g  165.6 a 3   b 11.6 219 277 f c   m   c
## 6 1 g  158.8 b 3   b 11.6 219 277 f c   m   c
If this were data that you collected and wanted to analyze, you would likely have it somewhere on your hard drive in a CSV file, or some equivalent data file. If you were to open this data in Excel (or a similar software), you would see your data arranged in rows and columns. Below we write our data out as a CSV file so that we can have a look at it outside of R.
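A minimal sketch of writing a data frame to a CSV file is given here. It uses a small stand-in data frame and a temporary file location, both of which are assumptions made for illustration; in practice you would pass exp_data_all and a file name of your choosing to write.csv.

```r
# A small stand-in data frame (a few columns from the first rows shown above)
toy_data <- data.frame(
  L = c(1L, 1L, 1L),
  C = c("g", "w", "w"),
  height = c(165.6, 173.2, 165.6),
  stringsAsFactors = FALSE
)

# Hypothetical file location: a temporary folder, so nothing is overwritten
csv_path <- file.path(tempdir(), "toy_data.csv")

# row.names = FALSE omits the extra column of row numbers
write.csv(toy_data, csv_path, row.names = FALSE)

# Read the file back in to confirm that it contains the same information
toy_back <- read.csv(csv_path)
toy_back
```

The resulting file can be opened in Excel (or similar software), where each row of the data frame appears as a row of the spreadsheet.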
We can get more information about our data using the str function, which tells us that our data is stored in a data frame. A data frame is a collection of vectors that can be of different types, but which must be of the same length. A vector is a collection of elements of the same kind. Below, we see that the str function tells us about the vectors that make up our data frame.
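To illustrate the kind of summary that str produces, here is a minimal sketch using a small made-up data frame. The values, and the choice of which column is stored as a factor, are assumptions made for illustration and do not reflect how the columns of exp_data_all are stored.

```r
# A toy data frame with one vector of each kind
toy_data <- data.frame(
  L = c(1L, 1L, 2L),                # integers
  height = c(165.6, 173.2, 158.8),  # floating-point numbers
  C = c("g", "w", "g"),             # characters
  R = factor(c("a", "b", "a")),     # a factor (assumed here for illustration)
  stringsAsFactors = FALSE          # keep C as characters in older versions of R
)

# Compact summary of the structure of the data frame
str(toy_data)

# Every column is a vector of the same length ...
sapply(toy_data, length)

# ... but the vectors can be of different types
sapply(toy_data, class)
```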
We see four kinds of vectors in our data: int indicating that the vector contains integers, num indicating that the vector contains floating-point numbers (i.e. numbers that can have decimal points), chr indicating that the vector contains elements made up of characters (i.e. letters or words), or numbers being treated as if they were letters (i.e. as symbols with no numeric value), and Factor which indicates that the vector contains categorical predictors (discussed below). The information represented in each column of our data frame is:
• L: An integer from 1 to 15 indicating which listener responded to the trial.
• C: A letter representing the apparent speaker category (b = boy, g = girl, m = man, w = woman) reported by the listener for each trial.
• height: A floating-point number representing the height (in centimeters) reported for the speaker on each trial.
• R: A letter representing the resonance scaling for the stimulus on each trial. The coding is a (actual) for the unmodified resonance and b (big) for the modified resonance (intended to sound bigger).
• S: An integer from 1 to 139 indicating which speaker produced the trial stimulus.
• C_v: A letter representing the veridical (actual) speaker category (b = boy, g = girl, m = man, w = woman) for each trial.
• vtl: An estimate of the speaker’s vocal tract length in centimeters.
• f0: The vowel fundamental frequency (f0) measured in Hertz.
• dur: The duration of the vowel sound, in milliseconds.
• G: The apparent gender of the speaker indicated by the listener, f (female) or m (male).
• A: The apparent age of the speaker indicated by the listener, a (adult) or c (child).
• G_v: The veridical (actual) gender of the speaker, f (female) or m (male).
• A_v: The veridical (actual) age of the speaker, a (adult) or c (child).
Since more than one person who read earlier versions of this book complained about the short variable names, we want to explain why we used names like A and not apparent_age, and L instead of Listener. When we get to more complicated models, A:G:f0:L is a manageable number of characters for a variable name, while apparent_age:apparent_gender:f0:Listener is not. The latter may be nicer in the short term but becomes impossible to deal with in a compact manner in plots and on the page. The only way to feature compact descriptions and printouts for the more complicated models in the second half of this book was to go with the short variable names from the start. By using the short names, we can have consistent naming in the text, in the formal mathematical descriptions of our models, and in the information printed in the R console. Since the same data is used in every chapter and the same handful of variables are used in every chapter, we hope that this decision will not be too vexing for the reader.
We can access the individual vectors that make up our data frame in many ways. One way is to add a $ after the name of our data frame, and then write the name of the vector after. This is shown below for our vector of heights.
exp_data_all$height
Running the command above will write out the entire vector on your screen, which includes all 4,170 observations of height responses that make up our data. Using the head function will show you the first six elements of an object, and you can get specific elements of the vector using brackets as shown below.
# show the first six
head(exp_data_all$height)
## [1] 165.6 173.2 165.6 147.8 165.6 158.8

# show the first element
exp_data_all$height[1]
## [1] 165.6

# show elements 2 to 6
exp_data_all$height[2:6]
## [1] 173.2 165.6 147.8 165.6 158.8
Below, we use two sets of brackets to retrieve the height vector using its position in the data frame (first example) or its name (second example).
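As a minimal sketch of double-bracket indexing, the example below uses a small stand-in data frame rather than the real exp_data_all. The six height values are the ones printed above; the name toy_data and the L and C values are assumptions made only for illustration.

```r
# toy stand-in for exp_data_all (hypothetical apart from the six heights)
toy_data = data.frame (L = rep (1, 6),
                       C = c('g','m','g','b','g','w'),
                       height = c(165.6, 173.2, 165.6, 147.8, 165.6, 158.8),
                       stringsAsFactors = FALSE)

# retrieve the height vector by its position (third column)
by_position = toy_data[[3]]
by_position
## [1] 165.6 173.2 165.6 147.8 165.6 158.8

# retrieve the same vector by its name
by_name = toy_data[['height']]

# both approaches return the same vector
identical (by_position, by_name)
## [1] TRUE
```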
We can also retrieve the height vector by using a single set of brackets as shown below. This method relies on treating the data frame as a matrix whose elements are arranged on a grid. Each element of the grid can then be accessed by providing row and column coordinates in single brackets as in [row, column]. Below we retrieve the entire third column by specifying a column number (or name) but leaving the row number unspecified.
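A minimal sketch of this grid-style indexing follows. It again relies on a hypothetical toy_data frame in place of exp_data_all; only the six height values come from the output shown earlier, and the other columns are made up.

```r
# toy stand-in for exp_data_all (hypothetical apart from the six heights)
toy_data = data.frame (L = rep (1, 6),
                       C = c('g','m','g','b','g','w'),
                       height = c(165.6, 173.2, 165.6, 147.8, 165.6, 158.8),
                       stringsAsFactors = FALSE)

# leave the row index empty to get every row of the third column
col_by_number = toy_data[, 3]

# the column can also be requested by name
col_by_name = toy_data[, 'height']

identical (col_by_number, col_by_name)
## [1] TRUE
```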
We use the same method to recover the entire first row of the data frame, and then the second element of the first row (or, from another perspective, the first element of the second column).
exp_data_all[1,]
##   L C height R S C_v  vtl  f0 dur G A G_v A_v
## 1 1 g  165.6 a 1   b 12.2 277 237 f c   m   c

exp_data_all[1,2]
## [1] "g"
1.4 Variables
Each of the columns in the exp_data_all data frame can be thought of as a different variable. Variables are placeholders for some value, whether we know it or not. For example, you can say “my weight is \(x\) pounds” or “this data represents a response provided by experimental subject \(x\)”. In our data, our variables take on different values from trial to trial, and the values of these variables tell us about the different outcomes and conditions associated with the trial. In this section, we’re going to discuss different aspects of variables, especially as they pertain to the analysis of experimental data.
1.4.1 Populations and samples
Anything that varies from observation to observation in an unpredictable manner can be thought of as a random variable. For example, your exact weight varies from day to day around your ‘average’ weight. In principle, you could probably explain exactly why your weight varies if you were so inclined. However, in practice, you are probably not exactly sure why your weight is a bit higher one day and a bit lower the next. So, your weight is a random variable not necessarily because it is impossible to know why it varies, but simply because you don’t currently have the means to predict its exact value for any given observation.
In order to answer questions about reasonable values for variables of interest, scientists often collect measurements of that variable. These measurements can help us understand the most probable values of this variable, and the expected range of the variable, even if its value for any given observation is unpredictable. For example, although you may not know your exact weight on any given day, if you weigh yourself with some regularity you may have enough observations to have a pretty good idea of what your weight might be tomorrow. In addition, your expectation may be so strong that a large deviation from it would be more likely to result in your buying a new scale than believing the measurement.
A sample is a finite set of observations – measurements of a variable – that you actually have. The population is the entire, possibly infinite, set of possible outcomes associated with the random variable. For example, the population of “f0 produced by adult women in the United States” contains all possible values of f0 produced by the entire set of women from the United States. Our sample is the specific set of observations we have from the speakers we observed.
Usually, a scientist will collect a sample to make inferences about the population. In other words, we are interested in the general behavior of the variable itself, not just in the small number of instances that we observed. For example, Hillenbrand et al. collected their data to make inferences about speakers of American (or Michigan) English in general, and not because they were particularly interested in the speakers in their sample. Similarly, we are not specifically interested in the opinions of the 15 listeners in our data, but in what their behavior might tell us about the population of human listeners in general.
1.4.2 Dependent and Independent Variables
We can make a very basic distinction between variables that we want to explain or understand, and variables that we use to explain and understand. The variables we want to explain are our dependent variables, and they are usually the variables we measure or observe in an experiment. The variables that we use to explain and understand our measurements are our independent variables (sometimes called explanatory variables).
Dependent variables can often be random, which means their values are not knowable a priori (before observation). For example, you may have some expectations about what your weight might be before you get on a scale, but in general, you can’t know exactly what it will say with certainty before collecting the observation. Although the exact values of our dependent variables can vary somewhat unpredictably from trial to trial, in the context of an experiment, there is the general expectation that these values will depend in some way on the other variables in the experiment. For example, in this book we will analyze experimental data. In this experiment, we modified the resonance frequencies of stimuli so that some are expected to ‘sound’ bigger than others. As a result, we expect an association between values of apparent height and the R (Resonance) variable in our data. In other words, the value of apparent height depends on the value of R.
Variables that help predict the response (dependent) variable are sometimes referred to as independent variables because their values are not considered to depend on those of the other predictors. More specifically, we can say the values of our independent variables are not assumed to depend on the values of the other variables within the context of our experiment, or in a manner that directly relates to the relevant research questions.
Our experiment has two response variables: the apparent height (height) reported for each trial, and the apparent speaker category (C) reported for each trial. Our experiment also involves several variables that could be used to understand our responses (i.e. every other variable in the data). Whether a variable is dependent or independent depends on the research question and on the structure of the model more than on some inherent property of variables and data. For example, the data in exp_data_all could be used to understand variation in voice pitch (f0) across speaker groups. In this case, f0 would be the dependent variable and the veridical speaker category (C_v) would be the independent variable. Another researcher may choose to model how perceived height varies as a function of f0 and speaker group. In this case, height would be the dependent variable, and f0 and C_v would be the independent variables.
1.4.3 Categorical variables and ‘factors’ {sec-c1-categorical}
Categorical variables, also sometimes called nominal, are variables that take on some set of non-numeric, usually character values. Often, categorical variables are the labels that we apply to objects or groups of objects. For example, gender is a categorical variable with possible values of ‘male’ and ‘female’, among others. In our experiment data, C, S, L, R, C_v, A, G, A_v, and G_v are categorical variables. Categorical predictors are often called factors. Factors can take on a limited number of values, called levels. For example, if your factor is ‘word category’ your factor levels may be ‘verb’ and ‘noun’. If your factor is ‘first language’, your levels may be ‘Mandarin’, ‘English’, ‘Italian’, and ‘Hindi’.
A factor is a data type in R. A vector of factors is very similar to a vector of words, but it has some additional properties that are useful. For example, consider our C_v predictor, which tells us which category each speaker falls into. When C_v is treated as a vector of factors, rather than a character vector, our nominal labels will have associated numerical values. Many R functions turn nominal (non-numeric) predictors into factors and doing this yourself gives you control over how this will be handled.
# see the first 6 observations
head (exp_data_all$C_v)
## [1] b b b b b b
## Levels: b g m w

# it has levels
levels(exp_data_all$C_v)
## [1] "b" "g" "m" "w"

# each level has numerical values
table (exp_data_all$C_v, as.numeric (exp_data_all$C_v))
##    
##        1    2    3    4
##   b  810    0    0    0
##   g    0  570    0    0
##   m    0    0 1350    0
##   w    0    0    0 1440
By default, factor levels are ordered alphabetically. You can control this behavior by re-ordering the factor levels as below:
# re order
exp_data_all$C_v_f = factor (exp_data_all$C_v, levels = c('w','m','g','b'))

# the new order is evident
levels (exp_data_all$C_v_f)
## [1] "w" "m" "g" "b"

# note that 'm' is now the second category
xtabs ( ~ exp_data_all$C_v + exp_data_all$C_v_f)
##                 exp_data_all$C_v_f
## exp_data_all$C_v    w    m    g    b
##                b    0    0    0  810
##                g    0    0  570    0
##                m    0 1350    0    0
##                w 1440    0    0    0
Although our factors seem to have an ‘order’, this is only because items can only be discussed and presented one at a time, and so there must be some order in our nominal variables at some level of organization. For example, when presenting effects and plotting figures, you literally do have to decide to show one effect first and another second. However, the ordering of factor levels is exchangeable, meaning it does not in any way affect our analysis. For example, the listeners and speakers in our experiment received unique numbers. However, listener 1 is not the listener who ‘most’ has the quality of ‘listener’, and speaker 8 is not twice the ‘speaker’ (or anything else) that speaker 4 is. In other words, although we must commit to some order in our factors to organize our data, this ordering is arbitrary and not meaningful.
There is a special kind of categorical variable, called an ordinal variable, where the ordering of the categories is meaningful. These variables are halfway between numbers and labels: They faithfully represent the order (rank) of categories but not the magnitude of the difference between values. For example, consider the first-, second-, and third-place runners in a race. These are ordinal labels. You know who finished before/after who but don’t know anything about how much of a difference there was between the runners. As a result, these variables seem to have some of the properties of numbers, while not being totally like ‘real’ numbers. We will discuss the prediction of ordinal dependent variables in more detail in Chapter 12.
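As a hedged illustration, R can represent an ordinal variable as an ordered factor by passing ordered = TRUE to factor. The runner data below are made up, and the variable name place is hypothetical; this is only a sketch of the idea, not the approach used for ordinal models later in the book.

```r
# made-up finishing places for three runners
place = factor (c('second','first','third'),
                levels = c('first','second','third'), ordered = TRUE)

# order comparisons are defined for ordered factors:
# did runner 2 ('first') finish before runner 1 ('second')?
place[2] < place[1]
## [1] TRUE

# the underlying ranks are available, but they only encode order,
# not how far apart the runners were
as.numeric (place)
## [1] 2 1 3
```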
1.4.4 Quantitative variables
Unlike nominal variables, quantitative variables let us represent the relative ordering of different observations and the relative differences between them. Some examples of quantitative variables are time, frequency, and weight. In our experiment data, height is a quantitative dependent variable, and f0, vtl, and dur are quantitative independent variables.
A distinction is made between continuous and discrete quantitative variables. Continuous variables have infinitely small spaces between adjacent elements (like real numbers), at least in principle. On the other hand, discrete variables have gaps between the possible values of the variable, like integers. For example, things like time are naturally continuous, while things like counts are naturally discrete.
When we use a quantitative variable as our dependent variable, there is usually the expectation that it is continuous rather than discrete. However, in practice all measures stored on computers are discrete, and many continuous values (e.g. reaction times) can only be measured with limited precision, resulting in discrete values. For example, a chronometer that measures reaction times to the millisecond can record only 1,000 possible values between zero and one second. Similarly, human height is difficult to measure with a precision much finer than a centimeter, making height measurements effectively discrete. Below are some questions that will help you decide if you should treat a variable as quantitative, even if it is discrete:
Is the variable on a ratio or interval scale? This is a prerequisite for a quantitative value to be used as a dependent variable. An interval scale means that differences between values are meaningful, and a ratio scale additionally means that 0 is meaningful.
Is the underlying value continuous? Many variables are discrete in practice due to limitations in measurement. However, if the underlying value is continuous (e.g. height, time), then this can motivate treating the measurement as a quantitative dependent variable since fractional values ‘make sense’. For example, even if you measure time only down to the nearest millisecond, a value of 0.5 milliseconds is possible and interpretable. In contrast, a value of 0.5 people is not.
Are there a large number (>50) of possible values the measured variable can take? For example, a die can only take on 6 quantitative values, which is not enough.
Are most/all of the observed values far from their bounds? Human height does not really get much smaller than about 50 cm or much larger than about 220 cm, so it is technically bounded. However, in most cases, our observations are expected to be far from these boundaries.
If you answered yes to all or most of these questions, it is probably ok to treat your discrete variable as if it were quantitative, though this determination really needs to be made on a case-by-case basis.
1.4.5 Logical variables
Before finishing with variables, we need to talk about one type that does not appear in our data, but that will come up often. These are referred to as Boolean variables in many other situations; however, they are referred to as logical variables in R. Logical variables can only take one of two values: TRUE and FALSE. Below we use two equal signs to test for the equality of two values, and != to check for inequality. Notice that we can check for the equality of numbers or characters.
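The comparisons described above can be sketched as follows. The specific values being compared are arbitrary choices made for illustration.

```r
# two equal signs test for equality
2 == 2
## [1] TRUE

# != tests for inequality
2 != 2
## [1] FALSE

# the same operators work for characters
'a' == 'a'
## [1] TRUE

'a' != 'b'
## [1] TRUE
```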
One useful fact is that the logical values of TRUE and FALSE have numeric values of 1 and 0, as seen below. In each case, TRUE is equal to 1 so the expression evaluates to 2.
TRUE + 1
## [1] 2

(2 == 2) + 1
## [1] 2
When logical operators are applied to vectors, the operation is evaluated for each element of the vector, as below, and a vector of logical values is returned. When combined with the numeric values of logical variables, this means that we can easily calculate the number of times a certain condition was met in the vector.
# are the values less than or equal to 3?
c(1,2,3,4,5,6,7,8,9,10) <= 3
##  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Below, we find whether each element of the vector is less than or equal to three. This results in a vector of logical values equivalent to a vector of ones and zeros. When we find the sum of the vector of logical values, we find the number of times the condition was met. Below, we see that three of the elements in this vector satisfy our condition.
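A minimal sketch of this counting step, using the same vector and the same `<= 3` comparison as the example above, and storing the result in logical_vector:

```r
# TRUE where the value is less than or equal to 3
logical_vector = c(1,2,3,4,5,6,7,8,9,10) <= 3

# TRUE counts as 1 and FALSE as 0, so the sum is the number of TRUE values
sum (logical_vector)
## [1] 3
```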
There is one other very important use for vectors of logical values, and this is to extract subsets of your data that meet certain conditions. Below we create a vector of logical values that indicate whether the f0 for a trial is below 175 Hz or not. We can see that this vector has 4,170 elements, one for every row in our data, and that 1,290 trials satisfied our condition. This is nothing more than a bigger version of the same process we just carried out above with our logical_vector.
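The same steps can be sketched on a small scale. The f0 values below are made up and toy_f0 is a hypothetical name; with the real data, the comparison would presumably be applied to exp_data_all$f0, giving the counts reported above.

```r
# made-up f0 values (Hz) standing in for exp_data_all$f0
toy_f0 = c(277, 120, 172, 175, 230, 110, 198, 160)

# TRUE where f0 is below 175 Hz (note that 175 itself does not qualify)
f0_idx = toy_f0 < 175

# one logical value per observation
length (f0_idx)
## [1] 8

# number of observations that satisfy the condition
sum (f0_idx)
## [1] 4
```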
Recall that we can access individual rows of our data frames, that is the individual observations of our data, by placing this information before a comma inside brackets following the name of the data frame (as seen below). When we use a logical vector in this way, the effect is to include every row that equals TRUE and to omit every row that equals FALSE in the vector. Below we use our f0_idx vector to create a new data frame called low_f0 containing only productions with f0 below 175 Hz.
# get only rows where f0 < 175, i.e. where f0_idx is TRUE
low_f0 = exp_data_all[f0_idx,]

nrow(low_f0)
## [1] 1290

max(low_f0$f0)
## [1] 172
We can use the ! operator, which basically means ‘not’, to flip each TRUE to FALSE (and vice versa). When f0_idx is flipped to select a subset of a data frame, the result is to select those rows where speaker f0 is greater than or equal to 175 Hz.
# get only rows where f0 >= 175, i.e. where f0_idx is FALSE
high_f0 = exp_data_all[!f0_idx,]

nrow(high_f0)
## [1] 2880

min(high_f0$f0)
## [1] 175
1.5 Inspecting our data
After running an experiment but before your statistical analysis, you should inspect the patterns in your data. This gives you an opportunity to make sure the data has the characteristics you expect, and that there weren’t any errors during the collection of your data or with the design of your experiment.
1.5.1 Inspecting categorical variables
One of the most useful functions for understanding the distribution of categorical variables is the table function. This function makes a cross-tabulation (or contingency table) of the variables passed to the function. If a single factor is passed, the function returns the number of times each level of the factor is found in the data. Since each of our listeners listened to 278 stimuli, we expect that each level of the factor L (representing listeners) will appear 278 times in our data, confirmed below.
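A minimal sketch of this check is given here using a simulated listener vector built from the design just described (15 listeners, 278 trials each). The name toy_L is hypothetical; with the real data, the same call would presumably be applied to exp_data_all$L.

```r
# stand-in for exp_data_all$L: 15 listeners with 278 trials each
toy_L = rep (1:15, each = 278)

# count how often each listener appears
listener_counts = table (toy_L)
listener_counts

# confirm that every listener appears exactly 278 times
all (listener_counts == 278)
## [1] TRUE
```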
We can use this approach to confirm basic expectations about our data and to rule out problems with the design of the experiment. This is always a good idea since mistakes happen, and sometimes only get noticed when attempting to process the data. For example, if any of the levels above appeared more than or fewer than 278 times, we would have a problem.
We can also provide two (or more) factors at a time and the table function will return counts for every combination of factor levels. The table below reflects the fact that each listener heard 54 boys, 38 girls, 90 men, and 96 women, for a total of 278 responses per listener. When you provide multiple factors to table, it will vary the first factor along the rows of the table and the second factor along the columns of the table. If a third factor is provided, it makes a different table for factors one and two, for each level of factor three. More and more factors can be provided to the function, but these tables become harder and harder to work with.
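The two-factor case can be sketched with simulated vectors that follow the counts given above (54 boys, 38 girls, 90 men, and 96 women per listener). The names toy_L and toy_C_v are hypothetical stand-ins for exp_data_all$L and exp_data_all$C_v.

```r
# simulated design: 15 listeners, 278 trials each
toy_L = rep (1:15, each = 278)
toy_C_v = rep (rep (c('b','g','m','w'), times = c(54, 38, 90, 96)), times = 15)

# first factor varies along the rows, second along the columns
counts = table (toy_C_v, toy_L)
dim (counts)
## [1]  4 15

# counts for the first listener
counts[, 1]

# each column (listener) sums to 278 trials
all (colSums (counts) == 278)
## [1] TRUE
```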
Below we see that unlike our veridical categories, the distribution of apparent speaker categories varies across listeners. This is because the equal distribution of speakers for each listener is an aspect of the experimental design. However, the manner in which listeners interpreted each voice, whether they thought it sounded like a boy or girl, for example, may vary across individual listeners.
We can visualize relationships between categorical variables using a mosaic plot. In Figure 1.2, we see mosaic plots representing the two tables shown immediately above. Mosaic plots use rectangles of different sizes to reflect the relative frequencies of different combinations of categorical variables. In the left mosaic plot, we see that the size of the rectangle for each category is identical across listeners. This tells us these variables do not affect each other: Changing the listener does not affect the distribution of veridical speaker category in any way. In contrast, the distribution of apparent speaker category is affected by the listener and this is shown in the right plot where columns differ randomly from each other.
par (mfrow = c(1,2), mar = c(2,2,1,.5))
plot (t(table (exp_data_all$C_v, exp_data_all$L)), main = '',
      xlab = 'Listener', col = bmmb::cols[2:5], ylab = 'Veridical Speaker Class')
plot (t(table (exp_data_all$C, exp_data_all$L)), main = '',
      xlab = 'Listener', col = bmmb::cols[2:5], ylab = 'Apparent Speaker Class')
Below we make a three-dimensional table and inspect the table and each dimension. Notice that to index the table along the third dimension, we need to add two commas inside the brackets.
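As a hedged sketch of three-dimensional indexing, the example below builds a table from simulated vectors rather than from exp_data_all. It assumes that each of the 139 speakers was heard once at each resonance level by every listener, which is consistent with 278 trials per listener but is our assumption; the names toy_C_v, toy_L, toy_R, and toy_tab are hypothetical, and veridical categories are used because the apparent categories cannot be simulated from the design.

```r
# simulated design: 139 speakers x 2 resonance levels x 15 listeners
toy_C_v = rep (rep (c('b','g','m','w'), times = c(27, 19, 45, 48)), times = 2 * 15)
toy_R = rep (rep (c('a','b'), each = 139), times = 15)
toy_L = rep (1:15, each = 278)

toy_tab = table (toy_C_v, toy_L, toy_R)

# rows, columns, and 'layers' of the table
dim (toy_tab)
## [1]  4 15  2

# two commas select along the third dimension: category by listener for R = 'a'
toy_tab[, , 1]

# category by resonance for the first listener
toy_tab[, 1, ]
```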
When we plot the relationship between apparent speaker class, listener, and resonance (i.e. actual vs. big), we see a three-way relationship between the variables (Figure 1.3). First, we see that the chances of observing different speaker categorizations depend on the listener. Second, we see that the chances of observing each category depend on resonance. And third, we see that resonance potentially affects each listener in a somewhat different way. The first chapters of this book will focus on understanding patterns in continuous variables. However, we will discuss the prediction and modeling of categorical dependent variables beginning in Chapter 10.
tmp_tab = table (exp_data_all$C, exp_data_all$L, exp_data_all$R)

par (mfrow = c(1,2), mar = c(2,2,2.5,0.5))
plot (t(tmp_tab[,,1]), main = 'Actual Resonance',
      xlab = 'Listener', col = bmmb::cols[2:5], ylab = 'Apparent Speaker Class')
plot (t(tmp_tab[,,2]), main = 'Big Resonance',
      xlab = 'Listener', col = bmmb::cols[2:5], ylab = 'Apparent Speaker Class')
1.5.2 Inspecting quantitative variables
Using R, we can easily find useful information about any quantitative variable. Below, we calculate the sample mean, the number of observations, the sample standard deviation, and some important quantiles for our speaker height judgments. The quantiles of a set of observations are values such that a given percentage of observations fall above and below the value. Quantiles are found by finding cutoff values that are greater than x% of the sampled values. For example, the 50% quantile, also called the median, is the value such that 50% of the distribution is below it, and the 25% quantile (the first quartile) is the value such that 25% of the distribution is below it.
# calculate the mean
mean (exp_data_all$height)
## [1] 162.8

# find the number of observations
length (exp_data_all$height)
## [1] 4170

# find quantiles
quantile (exp_data_all$height)
##    0%   25%   50%   75%  100% 
## 106.7 154.9 164.8 173.5 198.1
We can use this information to make some basic, and potentially useful, statements about our data. The mean and median are 162.8 and 164.8 cm, respectively, and height values range from 106.7 to 198.1 cm. However, there are not many observations at the extremes, and 50% of values are between 154.9 and 173.5 cm. We know this because these are the values of the first and third quartiles; the quartiles are the quantiles (25%, 50%, and 75%) that divide our distribution into four equal parts. Since 75−25=50, we know that 50% of the distribution of observations must fall inside of these boundaries.
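To see why the first and third quartiles bracket half of the observations, here is a minimal sketch using made-up values. The vector toy_heights is hypothetical and is not the experimental data.

```r
# 100 made-up 'height' values, 101 to 200 cm
toy_heights = 101:200

# first and third quartiles (25% = 125.75, 75% = 175.25)
quartiles = quantile (toy_heights, probs = c(0.25, 0.75))
quartiles

# proportion of values that fall between the two quartiles
inside = toy_heights >= quartiles[1] & toy_heights <= quartiles[2]
mean (inside)
## [1] 0.5
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is the same trick used for counting earlier in the chapter.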
We can look at the distribution of apparent height judgments in several ways, as seen in Figure 1.4. In the top row, each point indicates an individual production. Points are jittered (randomly shifted) along the y-axis to make them easier to distinguish so that dense and sparse locations can be compared.
# assumes the color variables yellow, coral, and teal are already defined
mens_height = exp_data_all$height[exp_data_all$C_v=='m']

set.seed(7)
par (mfcol = c(3,2), mar = c(0.5,4,0.5,1), oma = c(4,0,2,0))

# left column: jittered points, boxplot, and histogram for adult males
plot (mens_height, jitter (rep(1,length(mens_height))), xlim = c(100, 210),
      ylim = c(.95,1.05), yaxt='n', ylab='', pch = 16, col = yellow, xaxt='n')
mtext (side=3, outer=FALSE, text="Adult male apparent heights", line=1)
boxplot (mens_height, horizontal = TRUE, ylim = c(100, 210), col = coral, xaxt='n')
hist (mens_height, main="", col = teal, xlim = c(100, 210), breaks=40,
      cex.lab=1.3, cex.axis=1.3, xlab = "")
mtext (side = 1, outer = FALSE, text = "Apparent height (cm)", line = 3)
box()

# right column: the same three plots for all apparent heights
plot (exp_data_all$height, jitter (rep(1,length(exp_data_all$height))),
      xlim = c(100, 210), ylim = c(.95,1.05), yaxt='n', ylab='', pch = 16,
      col = yellow, xaxt='n')
mtext (side = 3, outer = FALSE, text = "All apparent heights", line = 1)
boxplot (exp_data_all$height, horizontal = TRUE, ylim = c(100, 210), col = coral,
         xaxt='n')
hist (exp_data_all$height, main="", col = teal, xlim = c(100, 210), breaks=40,
      cex.lab=1.3, cex.axis=1.3, xlab = "")
mtext (side = 1, outer = FALSE, text = "Apparent height (cm)", line = 3)
box()
In the middle row, we see a box plot of the same data. The edges of the box correspond to the 25% and 75% quantiles of the distribution, and the line in the middle of it corresponds to the median. So, the box spans the interquartile range of your observations and 50% of observations are contained in the box. The boxplot whiskers extend from the edges of the box. By default, each whisker extends to the most extreme observation that lies within 1.5 times the interquartile range of the box edge. These whiskers are simply intended to give you an estimate of the amount of ‘typical’ variation in your sample. Beyond the whiskers, we see individual outliers, points considered to be substantially different from the rest of the sample. We can see that the boxplot does a good job of summarizing the information in the top plots, and provides information related to both average apparent height values and to the expected variability in these values.
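The numbers behind a boxplot can be inspected with the boxplot.stats function from base R, as in the sketch below. The values in toy_heights are made up for illustration, with one deliberately extreme observation.

```r
# made-up heights (cm) with one extreme value
toy_heights = c(150, 155, 160, 162, 165, 168, 170, 175, 180, 230)

# coef = 1.5 is the default multiplier of the box width used for the whiskers
box_info = boxplot.stats (toy_heights, coef = 1.5)

# lower whisker, lower box edge, median, upper box edge, upper whisker
box_info$stats
## [1] 150.0 160.0 166.5 175.0 180.0

# observations beyond the whiskers, drawn as individual points
box_info$out
## [1] 230
```

Here the box is 15 cm wide, so the upper whisker may reach at most 175 + 22.5 = 197.5 cm. It stops at 180, the largest observation inside that limit, and 230 is flagged as an outlier. Note that boxplot.stats computes the box edges as 'hinges', which can differ slightly from the quartiles returned by quantile.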
The bottom row presents what is known as a histogram of the same data. The histogram divides the x-axis into a set of discrete sections (‘bins’) and gives you the count (or frequency) of observations in each bin. Bins with lots of observations are relatively taller (more dense) than bins with fewer observations in them. As a result, histograms can be used to summarize where observations tend to be. For example, we can see that the bins under the interquartile range have the most observations, and that values further from the median value become increasingly less frequent. In addition, histograms can provide us with information that boxplots can’t. For example, in the right column, we see that our distribution of height judgments actually has a little gap at around 150 cm. This information does not really come across in the boxplot representation of the same data.
Scatter plots are plots that represent two quantitative variables at a time using a set of points on a coordinate space. Each point represents a single observation, the x-axis location represents the value of one variable, and the y-axis location represents the value of the other variable. Scatter plots are useful to understand relationships between quantitative predictors. Below we consider the relationships between our quantitative predictors using a pairs plot (pairs). A pairs plot creates scatter plots for all pairs of quantitative variables provided, arranged in an \(n \times n\) grid of panels for \(n\) variables, with the variable names on the diagonal. Each plot in Figure 1.5 contains a single point for each different stimulus used in this experiment (height values represent averages across all listeners).
In the plot above, we can see several apparent relationships between our quantitative variables. For example, pitch (f0) and vocal tract length (vtl) are negatively related. This means that as the value of f0 increases (left to right), the value of vtl decreases (top to bottom). In other words, if the f0-vtl relationship were a hill it would have a negative, decreasing, slope. In contrast, we see that height and vtl enter into a positive relationship: As you increase vtl, height also increases. Finally, we see that duration (dur) and height do not seem to have much of a relationship. Unlike the other two scatterplots which looked a bit like ramps or lines, the scatter plot of dur and height resembles a Rorschach test inkblot. This suggests either that these two variables are not strongly related, or that the nature of the relationship is more complicated than what can be understood using these simple plots.
# assumes the color variable lavender is already defined
agg_data = aggregate (height ~ f0 + vtl + dur, data = exp_data_all, FUN = mean)
pairs (agg_data, col = lavender, pch = 16)
1.5.3 Exploring continuous and categorical variables together
We can also consider the relationships between our quantitative and categorical variables. We can use the boxplot function as below:
boxplot (y ~ factor)
to make a set of boxplots for the variable y. The function call above will create a plot with a separate box for each level of the factor in the function call. In Figure 1.6, we see different quantitative variables organized according to veridical speaker category. For example, the left plot shows the distribution of observations of f0 for boys, girls, men, and women, respectively. In this case, the differences between the boxplots for each level of the factor tell us about the values of f0 usually observed for speakers in that category.
par (mfrow = c(1,3), mar = c(3,4.2,1,1))
boxplot (f0 ~ C_v, data = exp_data_all, col = bmmb::cols[1:4],
         cex.lab=1.3, cex.axis=1.3, xlab="", ylab="f0 (Hz)")
boxplot (vtl ~ C_v, data = exp_data_all, col = bmmb::cols[5:8],
         cex.lab=1.3, cex.axis=1.3, xlab="", ylab="Vocal-tract Length (cm)")
boxplot (dur ~ C_v, data = exp_data_all, col = bmmb::cols[9:12],
         cex.lab=1.3, cex.axis=1.3, xlab="", ylab="Duration (ms)")
Another way to think of the relationships between our categorical and quantitative variables is using scatter plots with labeled points, as in Figure 1.7. In the scatter plots below, each point indicates a single speaker from our experiment, and the position of each point is determined by the f0, vocal tract length, and average apparent height for the speaker. However, rather than using meaningless symbols, each point is labeled using a letter which indicates the veridical category that the speaker falls into. Using plots like the one below helps us understand the relationship between our acoustic predictors and our speaker categories. For example, it’s clear that adult males are fairly distinct acoustically compared to the other speaker categories. In addition, it seems that boys, girls, and women are easier to separate along the vocal tract length dimension than the f0 dimension.
agg_data = aggregate (cbind(f0,vtl,dur,height) ~ C_v + S, data = exp_data_all, FUN = mean)

par (mfrow = c(1,3), mar = c(4,4,1,1))

plot (agg_data$vtl, agg_data$height, type = 'n', ylab="Apparent height (cm)",
      xlab="Vocal-tract length (cm)", cex.lab=1.3, cex.axis=1.3)
text (agg_data$vtl, agg_data$height, labels = agg_data$C_v,
      col = bmmb::cols[2:5][factor(agg_data$C_v)], cex = 1.5)

plot (agg_data$f0, agg_data$height, type = 'n', xlab="f0 (Hz)",
      ylab="Apparent height (cm)", cex.lab=1.3, cex.axis=1.3)
text (agg_data$f0, agg_data$height, labels = agg_data$C_v,
      col = bmmb::cols[2:5][factor(agg_data$C_v)], cex = 1.5)

plot (agg_data$vtl, agg_data$f0, type = 'n', ylab="f0 (Hz)",
      xlab="Vocal-tract length (cm)", cex.lab=1.3, cex.axis=1.3)
text (agg_data$vtl, agg_data$f0, labels = agg_data$C_v,
      col = bmmb::cols[2:5][factor(agg_data$C_v)], cex = 1.5)
1.6 Exercises
The analyses in the main body of the text all involve only the unmodified ‘actual’ resonance level. Responses for the stimuli with the simulated ‘big’ resonance will be reserved for exercises throughout. You can get the ‘big’ resonance data in the exp_ex data frame, or all the data in the exp_data_all data frame. For now, it would be useful to compare the results in exp_data and exp_ex using the techniques covered in this chapter. The data is quite similar but not exactly the same, so familiarizing yourself with the apparent height variable in exp_ex, and any differences between exp_data and exp_ex, will help your analyses later on.
1.7 References
Fryar, C. D., Gu, Q., & Ogden, C. L. (2012). Anthropometric reference data for children and adults: United States, 2007–2010. Vital and Health Statistics. Series 11, Data from the National Health Survey, 252, 1–48.