Tuesday, August 4, 2009

Making Changes?

Recently, I have heard several suggestions for changes to the database, and I wondered whether they would have traction with readers. If you're interested in phonotactic research but this database is missing some piece you consider critical, let me know. (I'll see if I can add a nifty polling device to this blog for the question too.)

Q. What changes should we make to the IPhOD? (Why?)

Here are a couple of ideas that have really stuck with me so far:

1. Change: Kucera & Francis (1967) to some other frequency metric, such as SUBTLEXus frequency (Brysbaert & New, in press) or a Google-based frequency (e.g., Blair, Urland, & Ma, 2002)?
Reason: KF is losing popularity in psycholinguistics as a measure of word frequency and carries a lot of baggage, which makes KF-weighting more questionable. A switch could bring our frequency-weighted calculations up to date.
(RE: Mark Seidenberg's comment on the Talking Brains blog)

2. Change: positional probability metric to be length-constrained, as Vitevitch and Luce (2004) did for the measure computed by their Online Phonotactic Calculator.
Reason: As words get longer, their average positional probability values start to vary a lot. Because relatively few words are 7-17 phonemes long, probabilities in the later positions of long words are estimated from less data and are more variable. I would predict that this mainly affects longer words or pseudowords, but it could result in interesting changes for shorter items too.
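
To make the second idea concrete, here is a minimal sketch of a frequency-weighted positional probability with an optional length constraint. It is an illustration only, not the IPhOD code: the toy lexicon, its counts, the function name positional_probs, and the log10(count + 1) weighting are all assumptions made for the example. Swapping in a different frequency norm (idea 1) would only change the counts stored in the lexicon.

```python
import math
from collections import defaultdict

# Toy lexicon (hypothetical): phoneme tuples mapped to made-up frequency counts.
TOY_LEXICON = {
    ("k", "ae", "t"): 100,
    ("k", "ae", "p"): 10,
    ("b", "ae", "t"): 1000,
    ("k", "ae", "t", "s"): 10,
    ("b", "ae", "t", "s"): 100,
}


def positional_probs(word, lexicon, length_constrained=False):
    """Return the frequency-weighted probability of each phoneme in its position."""
    # Optionally keep only lexicon entries with the same number of phonemes.
    if length_constrained:
        lexicon = {w: f for w, f in lexicon.items() if len(w) == len(word)}
    hits = defaultdict(float)    # weight of words with phoneme seg at position i
    totals = defaultdict(float)  # weight of all words that have a position i
    for entry, freq in lexicon.items():
        weight = math.log10(freq + 1)  # assumed weighting: log10(count + 1)
        for i, seg in enumerate(entry):
            hits[(i, seg)] += weight
            totals[i] += weight
    # Positions that no lexicon entry reaches get a probability of 0.0.
    return [hits[(i, seg)] / totals[i] if totals[i] else 0.0
            for i, seg in enumerate(word)]


cat = ("k", "ae", "t")
unconstrained = positional_probs(cat, TOY_LEXICON)
constrained = positional_probs(cat, TOY_LEXICON, length_constrained=True)
print("unconstrained:", unconstrained, "sum:", sum(unconstrained))
print("constrained:  ", constrained, "sum:", sum(constrained))
```

In this sketch the constrained version for a three-phoneme item ignores the four-phoneme entries entirely, so the two calls return different values for the first and last positions even with a tiny lexicon.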


Brysbaert, M., New, B. (in press). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. BRM.

Blair, I. V., Urland, G. R., Ma, J. E. (2002). Using Internet search engines to estimate word frequency. BRM 34, 286–290.

Vitevitch, M. S., Luce, P. A. (2004). A Web-based interface to calculate phonotactic probability for words and nonwords in English. BRM 36(3), 481–487.


  1. As someone who is in the midst of her dissertation and relatively new to these measures, I am glad to hear that there were errors in the positional probability output. It was hard to understand how I was getting such large differences from other phonotactic measures I was exploring.

    I look forward to the new IPhOD version 2. In my study I am using novel spoken words and need measures of both phonotactic probability and neighborhood density. I have not found another online tool that offers both of these measurements for American English, which are crucial for my study and, I am sure, for many studies in speech perception and production.

    You asked for suggestions. Currently, in the output of the Transcription-Based Phonotactic and Density Values, I believe the output reflects sums of the individual positional phonotactic probabilities and of the biphoneme probabilities, whereas I'd find it helpful to also have the individual values.

    It will be interesting to see how the density measures and neighborhood lists change with the new version.

    Thank you for providing a very useful tool for researchers.

  2. Thanks, Peggy. When the next version is ready to put online, I'll see how much code it would take to implement that. Seeing the individual values that are being averaged would let researchers double-check their results, so I can see how that would be helpful, provided the information can be arranged and displayed clearly.

    Good luck with your dissertation!