Posted by: Ambosc on May. 12, 2015, 03:24
Samoan/Maori, inaccurate score
Completely inaccurate score of 67.0 for Samoan and Maori. They are actually very closely related. The "r" and "l" in Polynesian languages are the same sound; which letter turns up in the script was down to the whims of the first explorers to write them down. And the Samoan "g" is identical to the Maori "ng", so "taliga" and "taringa" are pronounced exactly the same. Samoan "f" and Maori "wh" are identical too (fa/wha). In fact, several identically pronounced pairs on the list are given scores of 0 or 50 (lua/rua, la/ra, igoa/ingoa, tolu/toru).
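To make the spelling point concrete, here is a minimal Python sketch that rewrites Maori spellings into Samoan conventions (ng to g, wh to f, r to l) before comparing the pairs listed above. The `normalize_maori` helper and its mapping table are hypothetical illustrations of the correspondences described in this post. They do not reflect how the scoring system under discussion actually works.

```python
# Hypothetical mapping from Maori spelling to the Samoan letter for the same
# sound, following the correspondences described above. The digraphs come
# first so "ng" and "wh" are rewritten before any single letters.
MAORI_TO_SAMOAN = [("ng", "g"), ("wh", "f"), ("r", "l")]


def normalize_maori(word: str) -> str:
    """Rewrite a Maori word using Samoan spelling conventions (sketch only)."""
    word = word.lower()
    for maori, samoan in MAORI_TO_SAMOAN:
        word = word.replace(maori, samoan)
    return word


# Samoan/Maori pairs mentioned in the post.
pairs = [("taliga", "taringa"), ("lua", "rua"), ("la", "ra"),
         ("igoa", "ingoa"), ("tolu", "toru"), ("fa", "wha")]

for samoan, maori in pairs:
    # After normalization, every pair is spelled identically.
    print(samoan, maori, samoan == normalize_maori(maori))
```

Every line prints `True`. A real system would need a per-language-pair table, which is exactly the kind of special-casing a fully automated comparison tries to avoid.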
Just comparing consonants doesn't work well in these languages either: they have far fewer of them than most Indo-European languages do. Look at isu/ihu: two of the three letters are the same, yet the score is 0.
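The isu/ihu case can be reproduced with a naive position-by-position comparison. This is a sketch under simplifying assumptions: the `consonant_skeleton` and `positional_match` helpers are hypothetical, there is no alignment step, and only equal-length strings are compared. It is not the study's scoring method. It simply shows how dropping the vowels throws away the two matching segments.

```python
VOWELS = set("aeiou")


def consonant_skeleton(word: str) -> str:
    """Keep only the consonant letters of a word."""
    return "".join(c for c in word.lower() if c not in VOWELS)


def positional_match(a: str, b: str) -> float:
    """Share of aligned positions holding identical letters.

    Naive simplification: no alignment, so only equal-length strings are
    accepted; two empty strings score 0.0.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length strings")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


samoan, maori = "isu", "ihu"
# All letters: i=i, s!=h, u=u, so 2 of 3 positions match.
print(positional_match(samoan, maori))
# Consonants only: "s" vs "h", so nothing matches.
print(positional_match(consonant_skeleton(samoan), consonant_skeleton(maori)))
```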
Posted by: Vincent on May. 12, 2015, 19:10
Thank you very much for this very interesting feedback.
Yes, you are right: 67 is a completely inappropriate score from a historical point of view. There is no reason why the Maori-Samoan score should be higher than, say, English to Frisian (27) or French to Italian (20). Even if these figures cannot strictly be taken as glottochronological values, they should point in the right direction (within +/- 5 to 10 points for language pairs of similar age).
If I correct the Maori-Samoan results for the "L-R" and "N*" issues, the score drops by approx. 20 points, and correcting for the vowel issue removes approx. another 25, so we would end up at approx. 20. I will look into it in more detail this weekend.
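Spelled out with the approximate figures from this post (rough estimates, not recomputed values), the arithmetic is:

```latex
67 - \underbrace{20}_{\text{L/R and N* fixes}} - \underbrace{25}_{\text{vowel fix}} = 22 \approx 20
```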
The problem with the vowels is striking. I have a similar, even bigger problem with tonal languages like Chinese, which I simply can't include in the study, as consonants don't play as predominant a role there as in, e.g., Indo-European or Uralo-Altaic languages. I must confess that the impact of vowels in the Polynesian languages is new to me: I wasn't aware they could be so stable and so powerful for cognate scoring in some languages, although this is certainly limited to languages with relatively recent common ancestors.
Although the system delivers consistent results overall, it has many limits, and many languages still contain mistakes. I hope to correct them step by step, and feedback like yours is of great help.
The L-R issue has been a great dilemma from the beginning. It is an obvious sound correspondence in many language families, and a well-documented one, as in Cecil H. Brown, Eric W. Holman and Soren Wichmann's study, but it is also a big source of statistical noise, because L and R together account for 16% of the consonants appearing in the 18 words of the study across all 186 languages.
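A back-of-the-envelope sketch shows why such a frequent class adds noise. It rests on a strong simplifying assumption: the two compared consonant slots are treated as independent draws from the pooled consonant distribution. Under that assumption, if L and R together make up 16% of all consonants, two unrelated slots both land in the merged L/R class about 0.16² of the time. The `chance_pair_match` helper is hypothetical.

```python
def chance_pair_match(share: float) -> float:
    """Probability that two independently drawn consonants both fall into a
    class making up `share` of all consonants (simplifying assumption)."""
    return share * share


lr_share = 0.16  # L and R together, as stated in the post
p = chance_pair_match(lr_share)
print(f"{p:.4f}")  # 0.0256, i.e. about 2.6% per compared consonant pair
```

Real word lists are not independent draws, so this only indicates the order of magnitude of chance L/R matches, not the noise actually measured in the study.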
When I exclude certain words or sound correspondences, I always lose some chances to identify more cognates, but I also reduce the impact of chance... It is a trade-off, and in the end a very small sample, both lexically and in terms of sound changes, is the basis of the study, with all the limitations and sources of error this approach brings.
Thanks again for the inspiring feedback. I will "dig" further and may get back to you in the next few days, if you have time.
Best regards
Posted by: Ambosc on May. 13, 2015, 02:01
Thank you for your reply. It is very interesting to see the score recalculated and to hear about the problems l/r causes from the calculation point of view. You are correct that vowels are very important from a historical point of view in the Polynesian languages: they all share the same vowel system (5 vowels, a/e/i/o/u), and these vowels tend to be very consistent and stable, unlike the consonants. A clear example is the word for "woman": Samoan "fafine", Maori "wahine". They are cognate, with the same vowels but completely different consonants.
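Splitting the "woman" pair into vowel and consonant skeletons shows the pattern described here. The two helpers below are hypothetical illustrations that work on spelling only. They assume the five-vowel a/e/i/o/u system mentioned above.

```python
VOWELS = "aeiou"  # the shared Polynesian five-vowel system


def vowel_skeleton(word: str) -> str:
    """Keep only the vowels of a word, in order."""
    return "".join(c for c in word.lower() if c in VOWELS)


def consonant_skeleton(word: str) -> str:
    """Keep only the consonants of a word, in order."""
    return "".join(c for c in word.lower() if c not in VOWELS)


samoan, maori = "fafine", "wahine"
print(vowel_skeleton(samoan), vowel_skeleton(maori))          # aie aie
print(consonant_skeleton(samoan), consonant_skeleton(maori))  # ffn whn
```

The vowel skeletons are identical, while only the final consonant ("n") agrees. A consonant-only comparison therefore sees little in common between the two words.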
If Hawaiian were ever included in the list of languages, it would get even more inaccurate scores with the current methodology. It is another very close relative (probably only 1,000 to 1,500 years of separation from Maori), yet it changes practically every consonant, often in unusual directions, e.g. /t/ > /k/. However, the vowels stay practically the same.
Posted by: Vincent on May. 17, 2015, 18:15
I have made the changes to the word list and updated the data online too. I didn't include the vowel issue, as I have to stick to my principle of applying the same rules to all language comparisons to keep the system fully automated. In fact, it is a partial system based only on consonant analysis. But even so, we now have a Samoan to Maori proximity of 20 instead of 67...
I have also found a few other mistakes. First, I have replaced "Laulaufaiva" with "Alelo" for "tongue", as it seems "Laulaufaiva" means "language" rather than "tongue" as a body part. I have also changed the word for "wind" from "Hau" to "Ta". Please let me know if these changes are wrong.
But the biggest change is that I now keep the "-L-/-R-" correspondence in the system. I have analyzed the "L-R cluster" statistical noise I wrote about a few days ago, and it seems that the problem is not the "L-R" correspondence itself but the other L- and R-related correspondences, like L to ZH and so on. For now, I have added only the L/R correspondence, with point values converted from Cecil H. Brown, Eric W. Holman and Soren Wichmann's study. It does have a positive impact on the Polynesian classification, without any significant negative impact elsewhere.
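A minimal sketch of the idea of keeping the L/R correspondence while leaving other L-related ones (such as L to ZH) out. The thread does not give the point values converted from Brown, Holman and Wichmann's study, so the 0.5 weight below is a hypothetical placeholder. The helper names and the naive left-to-right alignment are also assumptions, not the system's actual scoring.

```python
EXACT_MATCH = 1.0
# Only the L/R pair is listed; 0.5 is a placeholder, not the study's value.
CORRESPONDENCES = {frozenset({"l", "r"}): 0.5}


def consonant_pair_score(a: str, b: str) -> float:
    """Score two aligned consonants: exact match, listed correspondence, or 0."""
    if a == b:
        return EXACT_MATCH
    return CORRESPONDENCES.get(frozenset({a, b}), 0.0)


def skeleton_score(a: list[str], b: list[str]) -> float:
    """Average pair score over consonant slots, aligned naively left to right."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(consonant_pair_score(x, y) for x, y in zip(a, b)) / longest


print(skeleton_score(["t", "l"], ["t", "r"]))  # tolu/toru: (1.0 + 0.5) / 2 = 0.75
print(consonant_pair_score("l", "zh"))         # not a listed correspondence: 0.0
```

Listing correspondences explicitly keeps the rule the same for every language pair, which matches the goal of a fully automated system. Each correspondence added to the table also raises the chance of accidental matches.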
Thanks again for your remarks
