Spec Tech: Conlanging 2 – Sounds and Words

This is the second in a series of posts “live-blogging” the creation of a fictional language from scratch, with the help of our readers.  We plan to construct a functional language one piece at a time, incorporating suggestions and preferences from our audience along the way.  You can read the first post here.

From our discussions last time, it seems that we’re all in agreement that the language we’re constructing needs two things: first, sounds that are common to as many languages as possible; and second, words that are made from those sounds using relatively straightforward patterns.

For our consonants (Cs), then, I’m going to propose the following:

p f m
t s n l
k h ng

As was mentioned in our previous discussions, not every language has all of these sounds, and this isn’t a perfect system—Arabic, for example, lacks a distinction between /p/ and /f/, Hawai’ian doesn’t have /s/, etc.—but we’ve avoided some common problems.  We only have /l/, for example, and not both /l/ and /r/, which is a problem for speakers from a variety of language backgrounds.

All of these sounds are relatively common throughout the world, and since we obviously can’t accommodate every possible linguistic background in designing this language, I feel that this is a good compromise.

We also need to consider how these consonants are to be pronounced.  I’m going to propose that these consonants be pronounced essentially as in English, but with room for variation, since we’ll have individuals of different language backgrounds learning the language.  For example, Spanish /p/ and English /p/ aren’t pronounced in exactly the same way, but for our purposes, I think we should say that either is fine in this case.  After a few generations, I’m sure our descendants will settle on a preferred way of pronouncing these sounds, but until then, some variability should be fine.

As for the vowels (Vs), it’s a bit easier to come up with sounds that are found in a large number of languages.  The most common vowel system in the world, by far, is one which has the following five vowel sounds:

/i/ as in “bead”
/e/ as in “bed”
/a/ as in “bod”
/o/ as in “boat” (in most languages, this vowel lacks the off-glide of the English sound)
/u/ as in “boot”

However, there is always a trade-off between the number of sounds in a language and the length of the words in said language, with lower numbers of sounds leading to longer words.  Because of that, I’m going to propose that we also have four diphthongs, which are vowels that change from one vowel sound to another during the course of their pronunciation:

/ai/ as in “bite”
/ei/ as in “bait”
/oi/ as in “boy”
/au/ as in “bout”

Adding in these extra vowels will help us to keep our words relatively short, while also keeping the structure of those words simple.
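
As a rough illustration of that trade-off, here is a minimal sketch that assumes nothing more than pairing one consonant with one vowel. The function and variable names are made up for this example and are not part of the proposal.

```python
CONSONANTS = ["p", "f", "m", "t", "s", "n", "l", "k", "h", "ng"]
PLAIN_VOWELS = ["i", "e", "a", "o", "u"]
DIPHTHONGS = ["ai", "ei", "oi", "au"]

def distinct_sequences(n_consonants, n_vowels, n_pairs):
    """Number of distinct strings built from n_pairs consonant+vowel pairs."""
    return (n_consonants * n_vowels) ** n_pairs

def pairs_summary(vowels):
    """Counts for strings of one pair and of two pairs."""
    return [distinct_sequences(len(CONSONANTS), len(vowels), n) for n in (1, 2)]

without_diphthongs = pairs_summary(PLAIN_VOWELS)
with_diphthongs = pairs_summary(PLAIN_VOWELS + DIPHTHONGS)

print(without_diphthongs)  # [50, 2500]
print(with_diphthongs)     # [90, 8100]
```

With the diphthongs included, strings of the same length can carry more than three times as many distinct two-pair forms, which is the sense in which the extra vowels keep words short.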

Speaking of, we need to decide how these sounds can combine with each other to form words.  The most common type of syllable, cross-linguistically, is a consonant followed by a vowel — CV — so I propose that we use that as the basis of our words.  However, with such a small number of consonants and vowels, we’ll have to have words that are more than just CV.  Therefore, I think our words should have the structure CV(CV)(N), which is to say, a CV syllable which is followed optionally by either a nasal (m, n, or ng; represented as “N”) or another CV syllable, which can then also be followed by a nasal.  This gives us four forms for words: CV, CVN, CVCV, and CVCVN.  Note that, although nasals are a distinct category in that they can occur at the ends of words, they still count as consonants, and can also appear anywhere you see a “C” in the forms above.

Examples might help clarify a bit; all of the following are possible words:

ku kum ngaupo hangan
pa hain tali kufem
li pen sefa fisen
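
For anyone who wants to test candidate words, the template can be sketched as a regular expression. This is only one possible encoding, assuming the spellings used in this post ("ng" as a single consonant, and the four diphthongs written as two letters); the name `is_possible_word` is invented for the example.

```python
import re

# Longer spellings come first so that "ng" and the diphthongs are tried
# before their single-letter prefixes.
CONSONANT = r"(?:ng|[pfmtsnlkh])"
VOWEL = r"(?:ai|ei|oi|au|[ieaou])"
NASAL = r"(?:ng|[mn])"

# CV(CV)(N): a CV syllable, an optional second CV syllable, an optional final nasal.
WORD = re.compile(rf"{CONSONANT}{VOWEL}(?:{CONSONANT}{VOWEL})?(?:{NASAL})?")

def is_possible_word(word):
    """True if the whole word fits one of CV, CVN, CVCV, or CVCVN."""
    return WORD.fullmatch(word) is not None

for candidate in ["ku", "kum", "ngaupo", "hangan", "hain", "kufem"]:
    print(candidate, is_possible_word(candidate))  # each line ends in True

for candidate in ["tal", "stu", "apa", "kumpa"]:
    print(candidate, is_possible_word(candidate))  # each line ends in False
```

The rejected forms fail for different reasons: a final consonant that is not a nasal, a consonant cluster, a vowel-initial word, and a nasal in the middle of a word.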

And that’s it!  Four basic word forms, without groups of consonants that can cause problems for speakers from a large number of linguistic backgrounds.  Using these templates for our words, with the ten consonants and nine vowels (five plain vowels plus four diphthongs) presented above, we get 90 CV words, 270 CVN words, 8,100 CVCV words, and 24,300 CVCVN words, for a total of 32,760 possible words, which is more than enough for any language.
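
If you would like to check that total yourself, here is a small sketch that enumerates every way of filling each template. It assumes only the inventory listed above; the names `TEMPLATES` and `count_forms` are just labels for this example.

```python
from itertools import product

CONSONANTS = ["p", "f", "m", "t", "s", "n", "l", "k", "h", "ng"]
VOWELS = ["i", "e", "a", "o", "u", "ai", "ei", "oi", "au"]
NASALS = ["m", "n", "ng"]

# Each template is a sequence of slots; every slot is filled independently.
TEMPLATES = {
    "CV": [CONSONANTS, VOWELS],
    "CVN": [CONSONANTS, VOWELS, NASALS],
    "CVCV": [CONSONANTS, VOWELS, CONSONANTS, VOWELS],
    "CVCVN": [CONSONANTS, VOWELS, CONSONANTS, VOWELS, NASALS],
}

def count_forms(slots):
    """Count the fillings of a template by brute-force enumeration."""
    return sum(1 for _ in product(*slots))

counts = {name: count_forms(slots) for name, slots in TEMPLATES.items()}
total = sum(counts.values())

print(counts)  # {'CV': 90, 'CVN': 270, 'CVCV': 8100, 'CVCVN': 24300}
print(total)   # 32760
```

Nearly three quarters of the possible words are of the longest form, CVCVN, so reserving the 90 CV words for special purposes costs very little of the total.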

However, I’m going to propose that we might return to these word templates later and modify them slightly by considering the possibility that the sounds /f, s, l/ might also be able to occur at the ends of words in some circumstances (namely as case markers—we’ll talk about what that means when we get to talking about verbs).

One other thing: in human languages, the grammatical machinery of a sentence that holds everything together, words like the, and, he, what, etc., tend to be shorter than words with more meaning, like computer or derby.  Therefore, I think we should keep the CV words in our language for grammatical machinery-type meanings, drawing from them as needed when we start developing our vocabulary.

None of this is set in stone just yet, so if you have comments about any of this, please do chime in!

Now that we have our sounds and our templates for how to put those sounds into words, we can start thinking about assigning some meanings to certain strings of sounds—which is to say, start making our vocabulary.  I’m going to hold off on this for the moment, though, in case any of your feedback changes these sounds or our templates.

We’ll have to start coming up with words soon, though, as the task at our next meeting will be deciding how nouns work, and that will certainly be easier if we have some nouns to work with.  In thinking about nouns, there are a lot of dimensions to consider:

Number: Does our language have singular and plural forms?  Plurals are pretty common in European languages, but unheard of in much of the world.  There are also more fine-grained distinctions that can be made, such as dual for two things and trial for three things.

Articles: English the and a, Spanish la and el, un and una, etc.  Again, we see these all over Europe, but they’re pretty rare elsewhere.  The exact circumstances of when to use which article are pretty complicated, although we could simplify them somewhat.  Simplify too much, though, and there isn’t much of a reason to have articles in the first place.

Gender: Spanish, French, German, etc. have different classes of nouns, called genders, which affect various aspects of the grammar.  Is this something we want in our language?  Note that the distinction doesn’t have to be one based on biological gender, as human languages have a number of other distinctions along the same lines.  We might have, for example, human, non-human animal, and inanimate (which is a fancy way of saying “not human or animal”).

Possession: How do we show that a house belongs to someone, as in the English Mike’s house?

Adjectives: Do adjectives go before or after nouns?  Or somewhere else entirely?  Does our language have adjectives at all?  I’m personally rather partial to a system where adjectives are a special subset of verbs; instead of a word red, then, you get a word that means “is.red”.  This might lead to some interesting things when we have adjectives modifying nouns directly, as in phrases like the red car.

Putting nouns together: Is there any special marking if we have one noun modifying another, as in computer paper or house cat?  If so, what does it look like?

Alright, I think that’s enough to think about until next time.  If you have comments, either about the sound system presented here, or about what to do with nouns, speak up!  We’re well on track to give the Qu’ssh!rrians what they’re looking for, but we still need as much input as possible from all of you.


10 thoughts on “Spec Tech: Conlanging 2 – Sounds and Words”

  1. It seems a little odd to me to allow nasals at the end of words but not at the end of syllables within words (i.e., CVN as well as CVNCV).

    While it is true that the main European languages are particularly fond of definite articles, they are not actually rare elsewhere. Fully one third of human languages have some sort of definite article. (For people who aren’t used to thinking about what on earth “the” actually means — the definite article marks a discourse topic that has already been brought into the conversation. It helps us keep track of new vs. old information. Minor variations in use are also possible, but that’s the core meaning of the definite article.)

  2. I’ve been lurking in Clarion for some time already, but this is the post I’d really like to comment on.

    Well, first of all, I like the choice of consonants. The only thing that may be problematic is that, say, French and Italian do not have an “h” sound, but it seems that we can cope with this problem. And Arabic speakers may use “b” for “p” and still be understood by others.

    Here, however, we hit the cornerstone of all constructed languages. It is clear that the Qu’ssh!rrians are totally indifferent to humanity’s success or failure in developing a single language, as well as to how many humans will be able to learn the new tongue. The question is: do we care? Since only the most common sounds are proposed for the upcoming Lingua Terrana, it seems that we do. No clicks and no melodic stress will find their place under the Qu’ssh!rrian sun.

    If indeed we’d like to save as many human beings as possible, then the grammar should also be as simple as possible. This means (ok, forgive me for speaking somewhat technically) a separating analytic language. If we want as many people as possible to be able to learn the language, it should be very flexible in the first place.

    For instance, the articles. For people whose languages lack articles, it may be really hard to learn where to put them, what an article actually is, and how to use it. On the other hand, for those who speak languages with articles, it may be really hard to abandon them. A Solomonic decision may be to leave the use of articles optional: we merge the definite article with the word for “this” and the indefinite with the word for “one”, and do not make their usage obligatory.

    Well, so far it may look nice. But different languages may omit words whose necessity is obvious in other languages.

    “He is a writer” – says an English speaking person. “He is writer” – and it is ungrammatical to put an indefinite article here for a German speaking one, since we refer to the profession. “Is writer” makes a perfect sense for a Spaniard. “He writer” – a Russian and an Arab are satisfied. “John he-writer” – yep, why should we put a verb when there should be a pronoun in Hebrew? “Writer is” – ave Caesar!

    Even if we fix a single word for “is”, the system already looks too flexible. Maybe we should set some rules? But won’t these rules close the emergency path for exotic constructions that may be breathtakingly elegant and useful, but are just not lucky enough to be “common”? Or, again, how intelligible will these rules actually be? Don’t they look obvious to us only because we happen to understand them? And is the creation of an easy-to-learn language what the Qu’ssh!rrians really want from us?

    Or maybe we should think about something different. Maybe we want to have some really interesting and weird stuff in our future language. Say, leave “yes” behind, like in Chinese? Invent number and articles for the verbs? Get rid of the verb “to be” totally? Or maybe make it so flexible that it will incorporate “to have”, “to go”, “to do” and a bunch of others in itself? Such proposals may look inhuman, but maybe they can make the Qu’ssh!rrians happy?

    I don’t know. But I’m still willing to save the heritage of Mother Earth, at least in the form of a language.

  3. My preferences, as a non-linguist just applying basic logic. I am fine with the decisions made thus far. As to the things currently under discussion… Anything that isn’t needed for clarity is an unnecessary complication, and if millions of people have made themselves understood without it, I say we should do without it as well, to make the language easier to learn.

    Number – If many languages manage to get along without plurals, I say ours should as well.

    Articles – If many languages get by without them, ours should as well.

    Gender – I have never seen the functional point of noun gender, other than to give you one more thing to get wrong when learning a language.

    Possession – this should mimic whatever we choose to use for adjectives, as a possessive noun is pretty much another sort of adjective. Red car, fast car, Bob’s car, my car. Those words all do basically the same thing.

    Adjectives – Though my native language does not do this, logic tells me adjectives should come after nouns. The noun is the primary idea, the adjective is additional information. Why wait to find out that it’s my new, fast, red… (what) when we can hear first that it is a car, and then that it’s red, fast, and mine.

    Putting nouns together – same form as possession and adjectives. computer paper or house cat is really computer’s paper or house’s cat, if you think about it conceptually. So we should do the same thing to those nouns that we do to a possessive.

    Those are my ideas! Toss them in the pot.

  4. Those “unnecessary complications” are surprisingly robust in the world’s languages. Gender systems, for example, reimpose themselves on languages all the time. I think it’s important to remember that a little redundancy is good in a human language — we have to communicate over a noisy channel most of the time. A little extra ensures we get the intended message.

    That said, about half the world’s languages get by without gender (of any sort — not even different personal pronouns, just one word for “she, he, it”).

  5. As for word order, there are some common patterns here that might help our earthlings acquire the language. The two most common patterns in Earth languages are: Subject Object Verb (ex: Japanese) and Subject Verb Object (ex: Chinese, English). Other possibilities are of course out there. The next most common is Verb Subject Object (ex: Hawaiian). It could be fun to choose this last one just to be different. There are languages in which the Object routinely comes before the Subject, but those seem quite rare and could possibly be more difficult to learn (though one might debate that).

    More importantly, once you decide on OV or VO word order, lots of other things typically (not always) follow.

    adjective (red house vs house red)
    determiner (this house vs house this)
    numeral (two houses vs houses two)
    possessor (my house vs house my)
    adpositional phrase (tunnel through vs through tunnel)
    relative clause (the by me built house vs the house built by me)

    Each pair above shows the two basic patterns. Languages often fall consistently on the left or on the right: OV languages tend towards the pattern on the left, while VO languages veer to the right. English is mixed, so this is less obvious, but some languages follow the pattern perfectly.

    Another thing to think about is how important word order will be. Often, languages that have lots of cases for their nouns have pretty loose word order. If the word itself says that it’s a subject by means of a case ending, you don’t need to place it in some particular location. Languages that have few case endings (such as English or Chinese) have pretty strict word order.

  6. My most sincere apologies for this late reply, but life gave me a few challenges in the last few weeks (in-laws in town, father-in-law ends up in hospital – ok for now). But I did have some time to think about what might be some good ideas. So here’s my input:
    Consonants: p, t, k and m, n, ng are good choices (with the possible exception of “ng” in some positions), but I’m not so sure about the fricatives f, s, h. I agree with Nikolay that “h” may be a problem for some or even many speakers. I also wish to suggest that the distinction between “f” and “s” is sometimes difficult to hear (let alone the difficulty some have with “p” and “f”). I don’t know how many times I’ve had to ask “Is that ‘f’ as in ‘Frank’ or ‘s’ as in ‘Sam’?”. So I suggest keeping only one unvoiced fricative – “s”. Now, before you object that I am suggesting elimination of too many sounds/phonemes, I recommend adding the semivowels/glides “w” and “y” which would allow for greater comprehensibility and keep the number of consonant phonemes the same. In addition, we could add the glottal stop (the throat-catching sound in “uh-oh”). This would allow words that seem to start with vowels.
    I agree with the five-vowel basis for the language, but the diphthongs are all suspiciously English, and the off-glides of the diphthongs, if they aren’t actually the semivowels “w” and “y”, are very close to them and will tend to move toward those semivowels anyway (another reason I would like to add these sounds). I suggest adding a length component to the vowels like Estonian, Finnish, and Hawai’ian have (which we could spell for our purposes here as “ii”, “ee”, “aa”, “oo”, “uu” or “í”, “é”, “á”, “ó”, “ú”). This length component could be combined with a tense/lax or stressed/unstressed or high tone/low tone contrast – whatever method best suits the learner (a little redundancy is a good thing in a language), and it would still be possible to understand the difference between “paapa” and “papaa” however they are pronounced. So we could have ten vowels (5 long/high tone/tense/stressed and 5 short/low tone/lax/unstressed):

    short/low tone/lax/unstressed    long/high tone/tense/stressed
    i       u                        ii/í        uu/ú
    e       o                        ee/é        oo/ó
        a                                  aa/á

    And if we implement the above suggested modifications to the consonants, we get eleven consonants:

    p m w
    t s n y l
    k ng
    ‘ (this is the symbol used in Hawai’ian and many other languages for the glottal stop)

    For the structure of the language, I agree that the CV(CV) pattern is probably as close to ideal as we are going to get for ease in learning to pronounce the language. I also like the nasal option at the end of words, but I agree with Wm Annis that it _does_ seem a bit odd that nasals are not allowed at the end of syllables within words. I propose allowing them, but only when they occur at the same place of articulation. In other words, "CVmpV", "CVntV", "CVnsV", and "CVngkV" would be allowed, but not *CVmkV, *CVnpV, *CVngtV, etc.
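
A minimal sketch of that same-place rule, for anyone who wants to test clusters. The place labels and the decision to cover only "p", "t", "s", and "k" as second members are assumptions made for this example; the names `PLACE` and `cluster_allowed` are invented.

```python
NASALS = ("m", "n", "ng")

# Place of articulation for the nasals and for the consonants that appear
# as second members of the clusters listed above (an assumed, partial table).
PLACE = {
    "m": "labial", "p": "labial",
    "n": "alveolar", "t": "alveolar", "s": "alveolar",
    "ng": "velar", "k": "velar",
}

def cluster_allowed(nasal, consonant):
    """True if a syllable-final nasal may precede the given consonant."""
    if nasal not in NASALS:
        raise ValueError(f"{nasal!r} is not a nasal")
    if consonant in NASALS:
        return False  # nasal + nasal is left out here, as an assumption
    return PLACE.get(consonant) == PLACE[nasal]

print(cluster_allowed("m", "p"))   # True
print(cluster_allowed("ng", "k"))  # True
print(cluster_allowed("n", "p"))   # False
```

Consonants missing from the table, such as "l", "w", "y", and the glottal stop, are simply rejected, so extending the rule to something like "ny" or "mw" would mean adding entries rather than changing the function.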
    I would also suggest that the semivowels "y" and "w" be allowed to appear after "a" and "aa" at the end of syllables (and perhaps that we allow a few off-glide + nasal combinations like "-aym", "-ayn", "-awm" and "-awn"). The other eight vowel/off-glide combinations are problematic because there are definite questions about either the ease of learning to pronounce them (e.g. "iw", "ew", and "uy", which don't occur in English) or the difficulty of hearing the difference between words ending in "pure" vowels and those ending in the off-glide (e.g. "ow" vs. "o", "uw" vs. "u", "iy" vs. "i", and "ey" vs. "e").

    So we could have the following structure:

    C=consonant, V=vowel, N=nasal, D=off-glide diphthong, ~=or,
    […] brackets are treated as a unit, (…) patterns within parentheses are optional


    C = p, t, k, ', m, n, s, w, y, l
    V = i, e, a, o, u
    K = p, t, k, ', m, n, ng, s, w, y, l, [mp], [nt], [ns], [ngk], ([ny], [mw])?
    W = ii, ee, aa, oo, uu
    D = ay, aw (oy)?
    P = aay, aaw (ooy)? Ever had rooibos (pronounced ROY-boss) tea?
    N = m, n, ng

    So we could have:

    'a say yom pentu kanton sempay talii 'utaang mayaaw
    tu naw wen lungki sa'on waway pa'uu satuum kampaay
    pe way ping tampa nilim yayaw kumaa mensiin tangkaaw

    You may have noticed that with this pattern W or P do not occur in single-syllable words or in the first syllable of two syllable words (so words like "*'aa", "*saay", "*ween", "*taampa", "*saa'on", "*waaway", "*paa'uu" etc. do not occur). After experimenting with allowing bi-syllabic words with both vowels being "short" or both vowels being "long", my conclusion is that if we implement the length distinction in vowels we should limit two syllable words to either the "long" vowel/"short" vowel or the "short" vowel/"long" vowel pattern. This would help to increase comprehensibility for the hearer, and aid the speaker in learning. For ease in writing, I suggest the first syllable in a two syllable word be by default "long" unless the second syllable is marked as "long" – something like how Spanish marks irregular stress. So every two syllable word will either be "long" on the first syllable and "short" on the second (by default), or "short" on the first and "long" on the second with the second syllable marked by a double vowel (or accent). I also feel we should not ask the learner or listener to distinguish monosyllabic words with only a length difference (this could be reserved for emphasis – not changing the core meaning of the word).
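
The writing convention described above can be sketched as a small function. This is only an illustration, assuming the doubled-vowel spelling ("ii", "ee", "aa", "oo", "uu") and treating each run of vowel letters as one syllable nucleus; the name `length_pattern` is invented for the example.

```python
import re

NUCLEUS = re.compile(r"[ieaou]+")  # a run of vowel letters is one nucleus

def is_marked_long(nucleus):
    """A long vowel is written as a doubled letter, e.g. 'aa'."""
    return len(nucleus) == 2 and nucleus[0] == nucleus[1]

def length_pattern(word):
    """Return the vowel lengths of a two-syllable word, or None otherwise."""
    nuclei = NUCLEUS.findall(word)
    if len(nuclei) != 2:
        return None  # no length contrast outside two-syllable words
    first, second = nuclei
    if is_marked_long(first):
        raise ValueError("length is only written on the second syllable")
    if is_marked_long(second):
        return ("short", "long")
    return ("long", "short")  # the default when nothing is marked

print(length_pattern("pentu"))   # ('long', 'short')
print(length_pattern("talii"))   # ('short', 'long')
print(length_pattern("mayaaw"))  # ('short', 'long')
print(length_pattern("tu"))      # None
```

Because "y" and "w" are written as consonants, off-glides such as "ay" do not add to the vowel count, so "sempay" still comes out as a two-syllable word with the default pattern.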

    1. Replace "h" and "f" with "y" and "w", and add the glottal stop.
    2. Add "long"/"short" distinction to vowels in two syllable words.
    3. Limit diphthongs to "ay", "aw", and perhaps "oy".
    4. Allow nasals to appear at the end of syllables within words, BUT only when followed by a consonant with the same place of articulation.

    That's all for now. Hopefully the Qu’ssh!rrians with their “more-grown understanding-state” will be uhh… understanding?

