Structuring online text database for Mongolic languages
This 2020 IRC project investigates how the syntactic structure of a sentence can affect its pronunciation. As an outcome, we launched the “Effects of Syntactic Constituency on Phonology and Phonetics of Tone” Digital Archive in March 2021. This archive is a collection of text, audio, and data plots from four typologically diverse languages: Basque, Xitsonga, Luganda, and Irish.
Basque is spoken in the Basque Country of Spain and France by about 530,000 speakers. Xitsonga is a Bantu language spoken in South Africa by about 2.28 million L1 speakers. Luganda, also a Bantu language, is spoken in Uganda by about 5.56 million L1 speakers. Irish is spoken on a daily basis by about 140,000 speakers in Ireland, mainly in the counties of Galway, Kerry, Cork and Donegal.
In these languages, the distribution of tone is known to be particularly revealing of the general role of syntactic constituent structure in determining the pronunciation of sentences. For example, the location of the left and/or right edges of phrase-sized constituents is indicated by tonal and other phonological phenomena (insertion of tones at phrase edges, lengthening of the penultimate vowel, etc.). Moreover, these languages are known to display phonological effects associated with a distinct, larger, clause-level type of constituent. Data from the pronunciation of sentences in these languages can thus reveal what types of syntactic phrases or syntactic clauses count as “domains” for phonology.
(Written by Daisuke SHINAGAWA and Mayumi ADACHI)
Digital archiving of text and sound data of morphosyntactic microvariation of southern Bantu languages