Friday, August 14, 2009

IPhOD v1.4 (correcting an error in v1.3)

I recently found that IPhOD version 1.3 (Feb 2005) contained an error in the positional probability measures: the average positional probability calculations (columns 39-44) were 3-4 times higher than they should have been. As of August 14, 2009, all of the positional probability calculations have been corrected in Version 1.4 and updated throughout the IPhOD website, so whether you download a copy or search online, you will now get the corrected estimates of positional probability.

The good news is that the calculation error did not affect any of the other measures, and positional probability is not typically used independently of other measures.

Seems like this might be a good time to talk about positional probability a little more. Positional probability is a measure that is often used, along with biphoneme probability, to help control or manipulate sublexical processing. The measure is calculated by counting the number of times that a phoneme occurs in a specific position, then dividing by the total count of all phonemes in that position. So in the word "cat", P(K,1), P(AE,2), and P(T,3) are the positional probabilities that must be determined. Once those are known, they are averaged to give a relative estimate of how typical a word's sounds are in their respective positions.
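To make the counting-and-dividing step concrete, here is a minimal Python sketch. It is not the code used to build IPhOD: the function names and the four-word toy lexicon are made up for illustration, every word counts once (no frequency weighting is assumed), and the resulting numbers have nothing to do with the real database values.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each word is a list of phoneme labels.
TOY_LEXICON = [
    ["K", "AE", "T"],   # cat
    ["HH", "AE", "T"],  # hat
    ["K", "AA", "T"],   # cot
    ["K", "AE", "P"],   # cap
]

def positional_counts(lexicon):
    """Count how often each phoneme occurs in each (1-based) position."""
    counts = defaultdict(Counter)
    for phonemes in lexicon:
        for position, phoneme in enumerate(phonemes, start=1):
            counts[position][phoneme] += 1
    return counts

def positional_probability(counts, phoneme, position):
    """P(phoneme, position): its count divided by all counts in that position."""
    total = sum(counts[position].values())
    return counts[position][phoneme] / total if total else 0.0

def average_positional_probability(counts, phonemes):
    """Average the positional probabilities over a word's phonemes."""
    probs = [
        positional_probability(counts, phoneme, position)
        for position, phoneme in enumerate(phonemes, start=1)
    ]
    return sum(probs) / len(probs)

counts = positional_counts(TOY_LEXICON)
cat_avg = average_positional_probability(counts, ["K", "AE", "T"])
hat_avg = average_positional_probability(counts, ["HH", "AE", "T"])
```

In this toy lexicon three of the four words start with K and only one with HH, so "cat" ends up with a higher average than "hat", for the same reason as in the real data.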

In the example of "cat", P(K,1) = 0.094, P(AE,2) = 0.066, and P(T,3) = 0.062; so the average positional probability for the phonemes of "K.AE.T" is equal to 0.0739. Compare that to "hat", which has an average of 0.0551. Since the other phonemes are identical and in the same positions, the difference comes entirely from the first position: fewer words begin with H than K.
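As a quick check of the arithmetic, the short Python sketch below averages the three rounded values quoted for "cat" and then backs out the first-position probability that the reported "hat" average implies. The variable names are mine, and the implied H value is only an estimate derived from rounded figures, not a number taken from the database.

```python
# Rounded positional probabilities quoted in the post for "cat".
cat_probs = {"K": 0.094, "AE": 0.066, "T": 0.062}

# Average of the rounded values: roughly 0.074, in line with the
# reported 0.0739 (the small gap is presumably rounding of the inputs).
cat_avg = sum(cat_probs.values()) / len(cat_probs)

# "hat" shares AE and T with "cat", so its reported average implies
# a first-position probability for H (an estimate, not a database value).
reported_hat_avg = 0.0551
implied_h = reported_hat_avg * 3 - cat_probs["AE"] - cat_probs["T"]
```

The implied value for H comes out to roughly 0.037, well under the 0.094 for K, which is the whole difference between the two averages.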

Another interesting note is how the positional probabilities vary over phoneme positions. The figures below compare vowels with consonants, and you can see that English words have a huge tendency to form CVC patterns at the onset of words. Looking at the red arrow, you can see the V-shaped consonant probabilities in positions 1-3 and the mountain-shaped distribution of vowel probabilities.


Positional probability also tends to become highly variable in later positions, something I noted in the last blog entry. You can see this pattern clearly in the consonants figure, where the average consonant probability spikes in the later positions. English words contain a lot of word-final consonants, which explains why the blue vowel line gradually decreases and the yellow consonant line rises until the final spike.

Finally, the last bit of news: Version 2 of IPhOD is coming soon. This will represent a major overhaul of the database, including a new word frequency measure to take the place of Kucera-Francis written word frequency. It will be fascinating to see how the phonotactic and density measures change, or how much of a difference there will be when KF is no longer the basis for our estimates. Importantly, we expect the new measures to be more powerful predictors of behavior and brain activity. Shouldn't be too long now!
