The theory and the method of constructing a true Universal Language

Copyright © July 2009 by Tienzen (Jeh-Tween) Gong
Presented as the keynote speech at 2009 Linguistic Conference
held from July 10 to 12, 2009 at City of Industry, California


Longing for a universal language has been a dream of mankind since antiquity, as in the Biblical story of Babel. In human history, many languages (such as Greek, Latin, Arabic or English) have claimed to be a universal language on the strength of political or economic supremacy for a short period of time, especially in the area that their political power could reach. Nonetheless, a few languages have acted as trans-national and trans-racial literary languages, such as the Chinese written language in China, Vietnam, Korea and Japan for centuries. However, there are at least two difficulties for any natural language to become a true universal language.

  1. No natural language is easy. Fewer than 10% of people can truly master their mother language to a scholastic level. In general, learning another natural language as a second language is about 10 times harder than learning the mother language. Thus, even if we all accepted politically that one particular natural language (such as English) is the lingua franca, the illiteracy rate for this language would still be higher than 85% worldwide.
  2. Just as all the de facto world languages owe their status to historical political supremacy, the suggestion of a given natural language as a universal language has strong political implications, and the major world powers will never agree to such a choice. Thus, the best hope for a universal language, if ever possible, is to choose an insignificant language or a constructed one, such as Esperanto.

With these realities, a universal language, if any, must be:


I. Criteria for Constructing a Universal Language


Thus, there should be some design criteria. I will list only two below:
  1. Criterion one (C1): Its scope and capacity must be on a par, at least, with one natural language.
  2. Criterion two (C2): It must be mastered by an average person, to a literacy level similar to the language skill of a 12th grader in his or her mother language, in 100 days with 3 hours of study a day, that is, a total of 300 hours of study.

The verification of C2 is quite simple in principle. As soon as the construction of the language is completed, a few volunteers can either confirm or disprove it. The major issue is how to construct it.

The biggest difficulty of a language is its vocabulary, as the foundation of the vocabulary of most natural languages is practically arbitrary. Why do the four letters (L, O, V, E) mean love? There is no way to find the meaning of "love" by dissecting or decoding those four letters. They mean "love" because of "You told me so!". Otherwise, the string "love" is just a blob. Thus, learning a language means learning thousands or even half a million of those blobs together with their "You told me so!", which is especially hard for someone who learns them as a second language without the benefit of already being able to speak those blobs. Of course, a pure (100%) root-word system, in which all vocabulary is composed only of those root words (no exceptions) and every word reveals its own meaning, can eliminate the above-stated vocabulary difficulty. Yet, this root-word system idea still faces at least two difficulties:

  1. Can such a root-word vocabulary system be constructed? How should those root words be selected? How many roots must the system have? If the number of roots goes over one thousand, the benefit of a root-word system will be significantly reduced.
  2. A language is much more than just vocabulary. A language becomes more difficult to learn when the learner must learn to make distinctions that he is not used to making. A non-English speaker could find it quite difficult if the grammar of this universal language (u-language) contains accusatives, mandatory tenses, tones, noun/adjective agreement, etc. In this sense, the grammar of this u-language must encompass (or not differ significantly from) the grammars of all the different natural languages. Yet, can this be achieved?

If we cannot resolve these two difficulties, we probably can never pass Criterion two (C2) with any constructed system. Yet, what is the guiding light for resolving these issues? Fortunately, we do know one fact. Norwegian is easy for a Swede because it is practically a mere dialect of his own language, while Norwegian is not easy in itself, as it would be very difficult for an East Asian speaker. The dialects of Chinese are mutually unintelligible, and it would take even a gifted European at least three years to learn to speak one of them, while it takes a Chinese person only about six months to learn another dialect. Thus, with this fact, if all natural languages are dialects of this u-language, then it can be learned in 300 hours of study by all the different people who speak different mother languages. Of course, this is a big "IF." However, we can re-state Criterion 2 as below:

Thus, a particular natural language (such as English) will never be a u-language in terms of this design criterion, even if it becomes a practical world language because of its political and economic supremacy. With this RC2, such a u-language, if ever possible, will change the foundation of linguistics completely, regardless of whether it is used as a lingua franca or not. Thus, the effort of researching such a u-language will not be in vain under any circumstances. The problem is what our starting point for this research could be.

II. In Search of the Universal Mother Language

Guessing a postulate might be a good starting point.

If all natural languages must be dialects of this u-language, it must be the mother language of all those natural languages, that is, they have all grown out of the mother. Thus, every baby language must consist of two parts: the part that is inherited from the mother and the part that is new growth (the bells and whistles). Then, the task of constructing a u-language becomes a task of searching for the mother language of all natural languages.

Seemingly, comparative linguistics could be of great help in this task. However, the major interest of comparative linguistics is the genetic relationship between languages that are members of the same language family, with the emphasis on phonology and the lexicon. There is not much to compare between Arabic and Chinese in their lexicon and their phonology. Thus, the current study of comparative linguistics is of no use for our task of finding a mother language for Arabic, Chinese and English, if such a mother, indeed, exists. That is, we must invent a new methodology for this seemingly impossible task, and the best way of tackling this issue is reverse-engineering.

If such a u-language (as the mother of all natural languages) does exist, it should be present genetically in every one of its baby languages, and we should be able to find its genetic codes in any one of them, without doing any comparison between languages. If such a technique can be developed, I will call it "Begetting the mother from her baby" (or BMFB in short), and I am making the following proposal:

  1. The attributes of a natural language (such as, English) are listed as Ar(1), Ar(2), ..., Ar(n).
  2. If Ar(m) can be substituted with a different mechanism U(m) without any change to the system, U(m) will be put into a bag called "Mother bag" and Ar(m) will be placed into a bag called "Baby bag."
  3. If an Ar(x) cannot be substituted in any way, it will be placed into both bags.
  4. After every Ar(m) that can be substituted has been replaced with its U(m), the two bags, the mother bag and the baby bag, are filled.
With this process, the originally selected natural language is never changed a bit, as its entirety is now in the baby bag. Yet, we have created a new bag, the mother bag, and it is a reasonable guess that the mother bag contains a u-language according to my assumption. In fact, with a mother bag in hand, it is not too hard to examine the genetic relationship of all other natural languages with the mother. Now, our task of finding the u-language becomes listing all necessary attributes of a selected natural language, which is English in my case.
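The four steps of the BMFB proposal can be sketched in code. This is only a minimal illustration under stated assumptions, not the author's implementation: the function name `bmfb`, the `SUBSTITUTES` table and the attribute labels are hypothetical, with the labels borrowed from the substitutions discussed in section IV.

```python
from typing import Callable, Optional


def bmfb(
    attributes: list[str],
    substitute: Callable[[str], Optional[str]],
) -> tuple[list[str], list[str]]:
    """Split the attributes Ar(1)..Ar(n) into (mother_bag, baby_bag)."""
    mother_bag: list[str] = []
    baby_bag: list[str] = []
    for attr in attributes:
        replacement = substitute(attr)  # U(m), or None if no substitute exists
        # The baby bag always receives the original attribute,
        # so the natural language itself is never changed.
        baby_bag.append(attr)
        # The mother bag receives U(m) when a substitute exists (step 2),
        # otherwise the unsubstitutable attribute itself (step 3).
        mother_bag.append(replacement if replacement is not None else attr)
    return mother_bag, baby_bag


# Hypothetical substitution table; the labels are illustrative only.
SUBSTITUTES = {
    "verb class": "action nouns with (do, be, not)",
    "tensed structure": "paired sentence (S-body, S-tag)",
    "inflected vocabulary": "paired words (w-body, w-tail)",
    "word order": "word-phrase (hyphen, parenthesis)",
}

attributes = [
    "verb class",
    "tensed structure",
    "inflected vocabulary",
    "word order",
    "rules of inference",
    "F-machine",
]

mother, baby = bmfb(attributes, SUBSTITUTES.get)
print(mother)
print(baby)
```

In this sketch the baby bag is always an exact copy of the input list, which mirrors the claim that the selected natural language is left untouched, while the two bags keep the same length and order.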

Listing some major attributes of the English language might not be a terribly difficult job. Yet, listing all necessary attributes of English exhaustively might not be an easy thing to do. After all, what are the necessary attributes of a language? Without knowing the answer to this question, we are like a blind man riding a blind horse. Fortunately, there are a few toy languages (the formalized languages) which do qualify as languages while their scopes are small enough for us to investigate their structure and all their necessary attributes in their entirety.

III. The Formalized Languages

The smallest toy language (formal system I) has only four symbols (an identity symbol =, and three individual constants, a1, a2, and a3). Although this System I is a genuine language system, it is too small a system to convince the general public that it is, indeed, a language system.

a: A Syntactical System

Thus, I will select a toy language (language T, or simply named as T) which has an infinite number of symbols (vocabulary, etc.), and those symbols are divided into the following groups:
  1. An identity symbol, =
  2. Five connective symbols (logical constants), {not (negation), or (disjunction), and (conjunction), if...then (conditional), if and only if (biconditional)}
  3. Two parenthesis symbols, ( , )
  4. Two quantifier symbols, { for some, for all}
  5. An infinite number of individual symbols, which again are subdivided into two groups:
    • v1, v2, v3,..., as individual variables,
    • c1, c2, c3, ..., as individual constants.
Among those symbols, three relations arise. Those relations (linguistic units) are described with the following terminology:
  1. "term" of T (language T) is either a variable or an individual constant.
  2. "formula" of T:
    • a predicate of T followed by a term is a formula of T.
    • any logical constant or quantifier together with a formula is also a formula of T.
  3. "sentence" of T is a formula of T in which no variable is free (undefined).
  4. "expression" of T is a linear string of symbols.
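The distinction between a formula and a sentence can be made concrete with a small sketch. This is a minimal illustration, assuming that "free" means "not bound by a quantifier"; the class names, the predicate name `P` and the reduced set of connectives (only negation, conjunction and "for all") are choices made for this example and are not part of the definition of T given here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:          # individual variable, e.g. v1
    name: str


@dataclass(frozen=True)
class Const:        # individual constant, e.g. c1
    name: str


Term = Union[Var, Const]


@dataclass(frozen=True)
class Atom:         # a predicate followed by a term
    predicate: str
    term: Term


@dataclass(frozen=True)
class Not:          # a logical constant together with a formula
    body: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class ForAll:       # a quantifier together with a formula
    var: Var
    body: "Formula"


Formula = Union[Atom, Not, And, ForAll]


def free_vars(f: Formula) -> frozenset:
    """Return the names of the variables that no quantifier binds."""
    if isinstance(f, Atom):
        return frozenset({f.term.name}) if isinstance(f.term, Var) else frozenset()
    if isinstance(f, Not):
        return free_vars(f.body)
    if isinstance(f, And):
        return free_vars(f.left) | free_vars(f.right)
    if isinstance(f, ForAll):
        return free_vars(f.body) - {f.var.name}
    raise TypeError(f"not a formula: {f!r}")


def is_sentence(f: Formula) -> bool:
    """A sentence is a formula in which no variable is free."""
    return not free_vars(f)


open_formula = Atom("P", Var("v1"))                # v1 is free
closed_formula = ForAll(Var("v1"), open_formula)   # v1 is bound
print(is_sentence(open_formula), is_sentence(closed_formula))
```

The design choice here is to represent each linguistic unit as an immutable value, so that the formation rules become constructors and the sentence test becomes a simple recursive function.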
Furthermore, this language T is governed by two sets of rules:
  1. The formation rules -- how a linguistic unit is formed:
    • expression (a string): operation of concatenation.
    • subject - predicate structure.
    • propositions
    • indexical signs: personal pronouns, tensed verbs, etc.
  2. Rules of inference -- how a linguistic unit is read or how it can move around in T:
    • rule of symmetry
    • rule of transitivity
    • rule of detachment
    • rule of generalization
With these two sets of rules in place, every linguistic unit of T can be evaluated in terms of its truth value. At this point, the language T is called a formalized language, which is specified simply in terms of the formal relations among symbols, without any reference to meanings that might be attached to those symbols. In fact, this kind of language is called a syntactical system. Terms, formulas and sentences are syntaxes (or tokens) of a syntactical system.

b: A Semantic System

Although this toy language T above is a genuine language, its scope is quite small in comparison to a natural language, as the main interest of any natural language is about the meaning of sentences. In a syntactical system, syntax, as only a symbol or a token, does have an innate meaning for itself while it has no extensional application in a sentence. How a syntax is used or applied in a sentence and how the meaning arises from an application belong to the field of semantics. In short, syntax concerns the truth-value of the formula while semantics concerns the meaning of the sentence. The linguistic definition of semantics is as below:

Well, if the readers are not able to understand this definition, it is not a big deal. Simply, semantics is the study of the concepts of meaning and truth about sentences. In linguistics, semantics is divided into two types:
  1. Descriptive semantics of natural language
  2. Pure semantics of the analytical study of formal language.
However, both types contain two theories. Here, we have no need to go into the details of those theories. Simply, every linguistic sentence has the following:
  1. The sentence itself (the sentence token) -- being uttered or written as inked marks on a paper, it is composed of some symbols.
  2. The mental idea (the intention or the proposition) of the speaker -- which is supposed to be carried by this sentence token.
  3. The understanding of the speaker's proposition by a reader -- this requires a shared understanding of those symbols' denotation (its reference) and connotation (a meaning beyond its direct reference).
The easiest way of sharing a common understanding is by obeying the same set of rules, and the fewer the rules the better. Then, what is the minimum number of rules that we need for this communication purpose? This question is beyond the scope of this article. Yet, its central point is about the proposition. What, then, is a proposition?

Proposition is a position that a person holds on an issue or an object after his judgement (or an intentional act) on them. Yet, the linguistic proposition consists of two parts:

Linguistically, a proposition is expressed with three types of linguistic symbols:
  1. Subject -- the one who made this proposition
  2. Predicate -- a linguistic symbol that expresses the proposition act (judgement or intention)
  3. Object -- a linguistic symbol that points out the object which is the target of the proposition act
Then, the predicate is further divided into some sub-groups.

The mental idea (the propositional act) of a person is always private. Yet, the proposition itself is always public. A sentence itself is just a token (inked marks on paper), while it acts as a vehicle or a bridge between the two, from private to public. Thus, with propositions (subjects, predicates and objects), a syntactic system acquires meanings for its sentences, and it now becomes a semantic system. A syntactic system concerns only itself, its soundness and its completeness. A semantic system concerns the communication between two parties (the speaker and the reader) about some propositions, which always denote some objects (or events) and connote some meanings.

c: A Pragmatic System

By concerning only forms and their relations, a syntactic system is always timeless. A semantic system as defined above (with meaning as the central issue) does not truly concern itself with spatiotemporal issues, as most propositions are also timeless. Thus, the space-time position of a sentence must be dealt with by a new mechanism, pragmatics. Pragmatics is the study of formal languages containing indexical terms, such as tensed verbs, pronouns, demonstratives, etc. In fact, pragmatics is simply the extension of the semantical truth-definition to formal languages containing indexical terms, since the truth-value of a sentence relates to both the person asserting the sentence and his space-time position.

d: All Necessary Attributes of a Language

Now, this toy language T can be clearly and definitely described as consisting of the following:

  1. A syntactic system:
    • a list of symbols:
      • logic symbols:
        • one identity symbol, =
        • five connective symbols
        • two quantifier symbols
        • two parenthesis symbols
      • infinite number of individual symbols:
        • individual variables
        • individual constants
    • Formation rules (terms, formulas, sentence, ...)
    • Rules of inference (for truth-value of sentences)
  2. A semantic system (propositions, subjects, predicates, objects, etc.)
  3. A pragmatic system (indexical signs -- tensed verbs, pronouns, demonstratives, etc.)
In fact, these are all the necessary attributes for a language. Linguistically, the above structure can be re-arranged as follows:
  1. Grammar
  2. Rules of inference
That is, grammar encompasses the entire language system (a list of symbols, formation rules, semantics and pragmatics) except the rules of inference.

However, there is a significant difference between a natural language and this toy language T. The following sentences are nonsensical and meaningless in T while they could be very meaningful in a natural language.

  1. Type one -- tautological
    • Now is now. (nonsense in T)
    • When is the best time to do it? Now, now is now. (meaningful in natural language)
  2. Type two -- illogical
    • Red is green. (false and nonsense in T)
    • When red is green, the Sun will rise in the West. (meaningful in natural language)
  3. There are many more such examples.
In conclusion, although language T is a full-fledged language system, its scope is much, much smaller than that of a natural language. Yet, many linguists view the fact that natural language tolerates those illogical and false propositions as a defect in comparison to the language T, which is viewed as an ideal language. Here, I am not interested in arguing this issue with them. Defect or not, it is an addition to and above the language T. I call this addition (or defect) the "fictitious machine" (F-machine). Then, we can describe the structure of a natural language as a composite of language T and the F-machine. Rewritten, a natural language consists of:
  1. Grammar
  2. Rules of inference
  3. F-machine

IV. Begetting the Mother

With a clear understanding of the structure of a natural language, we are now able to apply the BMFB procedure for constructing a universal language (u-language).

First, I am guessing that the rules of inference and the F-machine are universal, and they will be placed into both bags, the mother bag and the baby bag.

The issue then becomes investigating the grammar of a selected natural language.

a: English Grammatical Structure


In my case, English is my choice of candidate for finding the Universal Mother Language with the BMFB procedure, and the English grammar can be outlined as below:
  1. List of symbols:
    • inflected vocabulary
    • a set of punctuation marks
  2. Formation rules:
    • word order -- a word string from concatenation
    • Subject - predicate
      • Descriptive
        • active
        • passive
      • Subjunctive
      • Exclamatory
  3. Semantics -- Propositions (subjects, predicates, objects, accusatives, etc.)
  4. Pragmatics -- indexical terms (tensed verbs, pronouns, demonstratives)
In fact, the English grammar is almost identical to the grammar of language T. The book The Divine Constitution (Library of Congress Catalog Card number 91-90780) states, "... Not surprisingly, there are two types of human language, which indeed are evolved from these two distinguishable aspects of God's language. The one is perceptual language, the other conceptual language.

"English is a good example of a perceptual language. In English, there are many grammatical rules: such as tense, subject-predicate structure, parts of speech, numbers, etc.. The purpose of tense is to record and to express the real time. The subject-predicate structure is for relating the relationship between time and space of events or things and to distinguish the knower from the known or the doer from the act. The parts of speech are trying to clarify the real time sequences and the relationship of real space or the relationships of their derivatives. In other words, English is a real time language, a perceptual language.

"On the contrary, Chinese is a conceptual language. There is no tense in Chinese. All events can be discussed in the conceptual level. The time sequence can be marked by time marks. Therefore, there is no reason to change the word form for identifying the time sequence. Thus, there is no subject-predicate structure in Chinese, because there are no real verbs. All actions can be expressed in noun form when they are transcended from time and space. There is no need to have parts of speech in Chinese." (page 71)

b: the Action Nouns

With the hint of this quote, my first choice will be substituting the entire verb class. In English, the pronoun, the proper noun and the common noun not only are different grammatically but also differ on the metaphysical and the ontological level. Yet, they are all nouns. Why can we not have action nouns? As the BMFB procedure is for substituting, with neither subtraction nor addition, I would like to try to substitute the entire English verb class with the following procedure. The substituted sentence is a bit awkward, while it is still grammatically correct in English. Thus, these three new parts (three new verbs, all English verb-nouns and a special sentence pattern) are put into the mother bag, while the entire English verb class (without any subtraction or addition) is placed into the baby bag.

c: Paired Sentence Structure

In English grammar, do, be and not are not true verbs. We might be losing the tense structure with the above substitution. That is, we need one additional mechanism to preserve the tensed structure. In fact, we can use a pair-mechanism as below to preserve it.
Sentence A = (Part 1, Part 2)
Part 1 is the body of the sentence, called the S-body. Part 2 is the grammar tag, called the S-tag.

Seemingly, this substitution is even more awkward than the first one, at least on a human level. However, the substitution is exact, without any subtraction or addition, and it can be reversed with a simple algorithm. Again, I will put this paired sentence structure (S-body, S-tag) into the mother bag, and the original tensed structure into the baby bag.

However, an English sentence can be much more complicated than the above example, such as:
If I had had time, I would have owned four dogs.
This sentence can be substituted as (If I have time, I own four dog; S-tag). Of course, this S-tag will contain more information. The S-tag can have many fields, S-tag = (a, b, c, d, ...). A table of S-tags can be mapped out to cover the entire English grammar. Now, this S-tag becomes quite complicated, and it becomes a multi-dimensional vector. Fortunately, the S-tag can be systemized. Superficially, this kind of substitution is not only awkward but also kind of dumb. However, anything that can be systemized should become a job for a computer. We should concentrate on the part that cannot be handled by the computer, and that part could be the essence of the grammar of a u-language. Again, I put the paired-sentence structure together with a table of S-tags into the mother bag, and the entire English grammar into the baby bag.
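The paired structure Sentence = (S-body, S-tag) can be pictured as a small data structure. The sketch below is only an assumption about what such a pair might look like: the field names `mood`, `tense` and `plural_nouns`, the table key `T1`, and the values stored in them are all hypothetical, since the actual fields of the S-tag are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STag:
    """Grammar tag of a sentence. All field names are hypothetical."""
    mood: str
    tense: str
    plural_nouns: tuple = ()


@dataclass(frozen=True)
class PairedSentence:
    """Sentence = (S-body, S-tag)."""
    s_body: str   # uninflected body of the sentence
    s_tag: STag   # grammar information removed from the body


# A hypothetical one-row S-tag table; a full table would cover the grammar.
S_TAG_TABLE = {
    "T1": STag(
        mood="counterfactual conditional",
        tense="past perfect",
        plural_nouns=("dog",),
    ),
}

# The example from the text: the body keeps only uninflected words,
# and the learner picks a tag from the table instead of inflecting.
sentence = PairedSentence(
    s_body="If I have time, I own four dog",
    s_tag=S_TAG_TABLE["T1"],
)
print(sentence.s_body, ";", sentence.s_tag)
```

Turning such a pair back into a natural English sentence would need a real grammar generator, which this sketch does not attempt.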

d: b-words and i-words

Fortunately, we are seemingly able to reduce the complexity of the S-tag table by replacing the inflected vocabulary with non-inflected ones. I am choosing a paired structure again on this task. Every English word is divided into two parts, the body of the word and the tail of the word.
English word = (w-body, w-tail)
The w-tail is the inflection of the word, such as -ive, -ly, -ion, -ed, -s, -ness, etc. All irregular inflections will be eliminated; for example, (good, better, best) will become (good, gooder, goodest). With this substitution, English words are divided into two groups. Again, I place the paired words (both i-words and b-words) into the mother bag and all English vocabulary into the baby bag.
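The split English word = (w-body, w-tail) can be sketched as a suffix lookup. This is a deliberately naive illustration: the suffix list is the one quoted above, the `-er` and `-est` tails come from the (good, gooder, goodest) example, and the names `W_TAILS`, `IRREGULAR` and `split_word` are hypothetical.

```python
# Suffixes listed in the text, checked longest first so that
# "-ness" wins over "-s".
W_TAILS = ("ness", "ive", "ion", "ly", "ed", "s")

# Irregular forms are regularized first, following the example
# (good, better, best) -> (good, gooder, goodest).
IRREGULAR = {
    "better": ("good", "-er"),
    "best": ("good", "-est"),
}


def split_word(word: str) -> tuple:
    """Return (w_body, w_tail); w_tail is '' when no tail is found."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for tail in W_TAILS:
        # Require at least two letters of body so that short words
        # such as "is" are not split.
        if word.endswith(tail) and len(word) > len(tail) + 1:
            return word[: -len(tail)], "-" + tail
    return word, ""


for w in ("kindness", "walked", "dogs", "quickly", "best", "dog"):
    print(w, "->", split_word(w))
```

A plain suffix match mis-splits words such as "bus", so a working system would need a lexicon of word bodies rather than string matching alone.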

If we do not have any more substitutions to make, we put the remaining parts into both bags. In this way, the baby bag is the entire English system (the list of symbols, grammar, semantics, etc.) without one bit of subtraction or addition. The mother bag has, in fact, the same parts as the baby bag, while some of those parts have been substituted. Yet, these two bags are still structurally identical.

e: Word-phrase

In the future, someone might come up with more substitutions. Here, I would like to make one last attempt: replacing the rule of word order. For three simple words, the following sentences are significantly different in their meanings. However, the power of this word order can be removed or greatly reduced with a technique of word-binding or word-phrasing. When we make "love I" into a word-phrase love-I, these three words can no longer create any ambiguity. The following sentences must have the same meaning. Of course, this issue becomes more complicated when the number of words in a sentence increases. When the number is five, this five-word sentence could have three kinds of meaning:
  1. a unique meaning
  2. an array of 5! (five factorial = 120) combinations
  3. a Google outcome. With a Google database, these five words can produce a large Google outcome.
However, linguistically, we are only interested in its unique meaning. Traditionally, this is accomplished with grammar: word order, the subject-predicate structure, the inflected vocabulary, etc. However, by using the word-phrase technique, we can easily reduce the number of free radicals of this five-word sentence to three or fewer, and we can zero in on its unique meaning by the repeated use of the same method. In fact, this word-phrase method can very neatly narrow a word string down to a unique meaning with only two phrasing tools (the hyphen and the parenthesis). For example:
I am going to school tomorrow while you are not.
can be identically expressed with the following word-phrases.
(I, go-school), you-not, tomorrow.
Those six words become three free word-phrase radicals with two phrasing methods. Regardless of the sequential order, these three phrase radicals above cannot produce any meaning other than "(I, go-school), you-not, tomorrow", although some other sequences can be quite awkward initially.
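The effect of word-phrasing on the number of possible orderings can be checked with a short calculation. The sketch below models the meaning of a sentence as the unordered set of its phrase radicals, which is an assumption made only for this illustration; the variable names are hypothetical.

```python
from itertools import permutations
from math import factorial

# The three free word-phrase radicals from the example above.
radicals = ("(I, go-school)", "you-not", "tomorrow")

# Six unbound words can be ordered in 6! ways ...
orderings_of_six_words = factorial(6)

# ... while three radicals can be ordered in only 3! ways.
orderings = list(permutations(radicals))

# If meaning depends only on which radicals are present, and not on
# their sequence, every ordering collapses to the same meaning.
meanings = {frozenset(order) for order in orderings}

print(orderings_of_six_words, len(orderings), len(meanings))  # 720 6 1
```

Binding words into phrases therefore shrinks the space that grammar would otherwise have to disambiguate, from 720 word orders down to 6 radical orders that all carry one meaning under this model.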

Now, I am putting the word-phrase method into the mother bag and the unchanged English grammar into the baby bag. That is, we will use this new word-phrase method in any sentence as much as we can before calling for help from the English grammar. Nonetheless, we will fall back on English grammar if we have to.

V. Universal (Mother Proper)

As nothing has changed in the baby bag, there is nothing in it to review. However, it is time to see what kind of harvest we have in the mother bag.
  1. For vocabulary:
    • i-words and b-words, paired word structure
    • transformed all verbs into action-nouns with three new verbs (do, be, not)
  2. For sentence:
    • paired-sentence structure (S-body, S-tag)
    • word-phrase method to reduce the power of word order

Now, if we choose the mother bag English as the u-language, criterion one (C1) is met automatically, as the mother bag is structurally identical to natural English (the baby bag). The only difference is that some parts of English grammar are mechanized; that is, their jobs are done by a formalized grammar table and a machine. For example, a sentence of the mother bag below,

will be printed out as a natural English sentence as below,
If I had had money, I would have had 10 houses.

However, can this u-language meet criterion two (C2)? Seemingly, it can be learned by an English-speaking person in days, as it is a true dialect of English. Yet, can a Chinese speaker who knows not a single English word learn it in three months, as required by C2? This new language is obviously much easier than the original English, at least in the following areas:
  1. Most of English grammar is formalized as a table which can be learned in one or two days. The learner does not need to apply those English grammar rules word by word in a sentence but chooses an S-tag from the table and places it at the end of the sentence. Then, a computer can print out a proper English sentence if he chooses to do so.
  2. For inflected words, only the noun form is required in this u-language. All the verbs are treated as action-nouns. That is, the required vocabulary for this u-language is about 10% of the original English, a 90% reduction. However, is this reduction enough for this u-language to meet C2 for all non-English-speaking people?

In my personal experience, if the reduced vocabulary is over one thousand words, the average person, in general, cannot digest it in 300 hours of study. And I think that one thousand words might not be enough for any language to meet the C1 requirement. Then, this mother bag English might still not be the u-language that we are searching for. Fortunately, we have two more chances to find the true u-language. Can method 2 be possible? The "mother bag English" is, of course, a dialect of natural English, since they are identical to each other by definition. In fact, we can use the same BMFB procedure to find the "mother bag Russian", "mother bag German", "mother bag Chinese", etc. Then, we are hoping to find a universal mother for all those mother bags. Again, if the universal mother should be in all mother bags, it should be in the "mother bag English." Then, there is no reason to look for it in any other place.

a: Finding the U (mother proper)

The mother bag English has the following parts:
  1. For vocabulary:
    • i-words and b-words, paired word structure
    • transformed all verbs into action-nouns with three new verbs (do, be, not)
  2. For sentence:
    • paired-sentence structure (S-body, S-tag)
    • word-phrase method to reduce the power of word order
As I can simply try again if I guess wrong, guessing is much easier than searching. So, I will construct the Universal (mother proper) as follows, by guessing first:
  1. For vocabulary:
    • There are only b-words, with no i-words and no verbs. All verbs are b-words in the mother proper.
    • All (100%) b-words of English will be replaced with words which are composed only of 240 root words, as root-word strings. These 240 root words are not English but are specially designed for the universal language.
      Note: The words of many natural languages are patterns of temporally ordered sound types, and meaning of a word does not attach to particular activities, sound, marks on paper, or anything else with a definite spatiotemporal locus. The meaning of those words is agreed by a linguistic community. That is, it will take a great effort to learn those words. On the contrary, the meaning of all b-words of this Universal (Mother Proper) can be read out from the string of the root-words.
  2. For sentence:
    • All (100%) formation rules of language T or English (word order, subject-predicate, etc.) will not be used. The only formation rule is word-phrasing of b-words with hyphen and parenthesis.
And, this is it, the Universal (Mother Proper). With this mother proper and mother bag English, we can now construct a U (English), which is a dialect of the U (mother proper), with the following procedure. And, this is the U (English). Now, we have four languages for English.
  1. Beginning with the natural language of English
  2. From the natural language of English, we get mother bag English.
    Natural English = mother bag English (structurally identical)
  3. From the mother bag English, we get the Universal (Mother Proper), a presumed universal language.
    U (mother proper) has its own vocabulary, which is composed of 240 root words in my design.
  4. From U (mother proper), we get U (English). The b-word (English) is replaced with the b-word U (mother proper).
Thus, if the postulate I is correct, English-speaking people should be able to learn U (English) very easily, and U (English) should meet criterion 1, as the only difference between U (English) and mother bag English is the substitution of b-word (English) with b-word (U (mother proper)).

With the same BMFB procedure, we can construct U (Russian), U (German), U (Arabic), U (Chinese), etc. Is it then reasonable to propose another postulate?

Of course, if someone can demonstrate that postulate 2 is wrong, then we will modify it. With postulate 2, a true u-language can be constructed. That is, this u-language is not just U (Mother Proper) itself but encompasses all its dialects, U (natural languages). Since U (a natural language) is, by definition, a dialect both of this Universal Language and of its mother bag, that natural language should itself be a dialect of this Universal Language (u-language).

b: Meeting the Design Criteria

Does this newly designed universal language meet the design criteria (C1 and C2)? As U (Mother Proper) and U (English) are now published, this question becomes a testable issue. However, I would like to answer it theoretically.

U (English) should meet C1 (scope and capability on par with at least one natural language), as the only difference between it and natural English is that the b-words (English) are replaced with b-words (U (mother proper)). However awkward this substitution may be, it will not alter the scope and capability of U (English). Yet, can U (mother proper) itself meet the C1 requirement?

Can U (English) meet the C2 design requirement? This is, in fact, the same question as how easily the vocabulary of b-words (mother proper) can be learned. Can the vocabulary of b-words (mother proper) be learned in 300 hours of study?

The central question now becomes: "Can U (mother proper) itself meet both C1 and C2?" As U (mother proper) is a constructed language, we know its components exactly, and it consists of the following:
  1. list of symbols:
    • conceptual words only -- b-words (mother proper) composed from only 240 root words, with no i-words and no inflection of any kind.
    • punctuation marks -- the same as English
  2. Formation rules:
    • two types of word-phrasing
      • with hyphens -- having word order
      • with parentheses -- having no word order
    • all other rules of English grammar are excluded
  3. rules of inference -- the same as English
  4. fictitious machine -- the same as English
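
One way to picture the two types of word-phrasing is as two kinds of containers: a hyphen phrase keeps its members in order, while a parenthesis phrase does not. The Python sketch below is an illustration under assumptions, not the published design; the class names `HyphenPhrase` and `ParenPhrase`, the helper functions, and the sample b-words are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class HyphenPhrase:
    """Hyphen phrasing: the order of the members matters."""
    members: Tuple[str, ...]

@dataclass(frozen=True)
class ParenPhrase:
    """Parenthesis phrasing: the order of the members does not matter."""
    members: FrozenSet[str]

def hyphen(*words: str) -> HyphenPhrase:
    # A tuple preserves the order in which the b-words were given.
    return HyphenPhrase(tuple(words))

def paren(*words: str) -> ParenPhrase:
    # A frozenset discards order (and, as a simplification, duplicates).
    return ParenPhrase(frozenset(words))

# Placeholder b-words, for illustration only.
ordered_same = hyphen("dog", "bite", "man") == hyphen("man", "bite", "dog")
unordered_same = paren("red", "big") == paren("big", "red")
```

Here `ordered_same` is false because reordering a hyphen phrase gives a different phrase, while `unordered_same` is true because a parenthesis phrase is the same under any arrangement of its members.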

Can such a language have the same scope as natural English? To answer this question completely, we must describe language on the metaphysical and ontological level, which is a big job. I will present it in another article. Here, I will discuss it intuitively.

First, we are able to find a one-to-one correspondence between the entire English vocabulary and the vocabulary of U (mother proper) with the following equation:
English (i-words, b-words) <====> U-mother proper (b-words)

Second, the whole design of English grammar is meant to ensure that a word string (a string of words) can be read without any ambiguity by a linguistic community. It is mathematically provable that the word-phrasing method can likewise ensure a unique reading of any given word string.

With these two points answered, it is fair to say that U (mother proper) does have the same scope as natural English. Yet, can this U (mother proper) be learned by an average person in the world with 300 hours of study?

How difficult a language is for its native speakers depends on its vocabulary. In the early 20th century, written Chinese was viewed as the most difficult language in the world to learn, and most Chinese people (85% of them) remained illiterate because of its difficulty. The slogan at the time was, "Without abandoning the Chinese written word system, China as a nation will vanish for sure." The result was the introduction of the simplified Chinese written word system.

In fact, the vocabulary of every natural language is difficult to learn, even for its native speakers. Only a very small portion of the vocabulary of natural languages is based on some kind of root word system. The majority of words arose as tokens of "you told me so." There is no way to decode the four letters "book" into bound paper with printing on it. Trying to memorize thousands or hundreds of thousands of those "you told me so" tokens is, indeed, a youth-killing chore. Because a word token has no innate meaning of its own, several theories of the "meaning" of words arose. There are at least three such theories.
  1. Referential theory -- every word (a linguistic token) always has one non-linguistic object in the real world as its reference; for example, the word token "s-t-a-r" corresponds to the star in the sky. Even for the unicorn (a fabled creature), there is still a picture of this animal on paper.
  2. Ideational theory -- every word token marks a representation of an idea. Communication is successful when my utterance arouses in you the same idea that led, in me, to its issuance.
  3. Linguistic community theory -- a word token, the bearer of meaning, is a relatively abstract entity. Thus, the word token that one uses loses its meaning if one misuses it. A word is a common possession of a linguistic community, and it has the meaning it has by virtue of some general facts about what goes on in that community.

These three theories clearly demonstrate the difficulty of learning the word tokens (the vocabulary) of any natural language. By contrast, every word token (the entire vocabulary) of U (mother proper) is composed from 240 root words, and every word in U (mother proper) has two types of meaning.

  1. the innate meaning (the syntax meaning) -- it arises from the word's component root words, and everyone who knows those 240 root words can read its innate meaning from the face of the word token.
  2. the meaning from its usage (the semantic meaning) -- this needs to be learned during the usage of the language, similar to the linguistic community theory.
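
Reading the innate meaning from the face of a word token can be sketched as a lookup over its root words. This is a minimal illustration only: the three root identifiers and their glosses below are invented (the actual 240 roots are defined in the published list), and `innate_meaning` is a hypothetical helper name.

```python
# Hypothetical glosses for three invented root words (not the real roots).
ROOT_GLOSS = {
    "r1": "leaf",
    "r2": "mark",
    "r3": "bind",
}

def innate_meaning(word):
    """Return the glosses of a word's root words, in order.

    A word is modeled as a sequence of root identifiers. A KeyError is
    raised if the word uses a root that is not in the known root set.
    """
    return [ROOT_GLOSS[root] for root in word]

word = ("r1", "r2", "r3")
print(innate_meaning(word))  # ['leaf', 'mark', 'bind']
```

The sketch covers only the first type of meaning. The meaning from usage is not computed from the roots and would still have to be learned separately.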

Thus, the entire vocabulary of U (mother proper) can be learned by learning only those 240 root words, which takes fewer than 50 hours of study. The other 250 hours allowed by C2 can be used for learning the usage of the language.

Can such a 100% root word system be constructed? What kind of root words must we have in order to encompass the scope of a natural language? What is the minimum number of roots for U (mother proper)? As U (mother proper) and U (English) are now published with the following parts:
  1. 240 root words for the U (mother proper);
  2. 300 first generation words (b-words) for the U (mother proper) and for the U (English);
  3. a 2,000-word U (mother proper)/natural English dictionary (coming soon),
everyone is able to examine them and answer the above questions for him- or herself.

VI. Conclusion

Most previously claimed universal languages, such as Esperanto, are spoken languages with lesser emphasis on the written part. Learning a new spoken language is not easy, especially without a speaking environment, which is what a constructed language faces; learning a new written language under such circumstances is much harder still. Even native English speakers often cannot spell difficult words, since they know English basically as a spoken language. By contrast, U (mother proper) is a silent language. All its root words are ideographs and are silent. Any b-word of U (English) will be pronounced the same as the corresponding b-word of English. Likewise, the b-word of U (Arabic), identical in word form to the b-word of U (English), will be pronounced the same as the corresponding b-word of Arabic. That is, learning U (mother proper) and U (English) requires no effort to learn a new spoken language. This unique feature of U (mother proper) further assures that it meets criterion 2.

However, U (mother proper) is also a spoken language. I designed 300 sound modules, which are the generation 1 words; that is, they are the grandfathers of many descendant words and can be used as sound roots for those descendants. However, I did not provide any sounds for those sound modules, as they can be assigned by the users. That is, the spoken part of U (mother proper) is yet to be finished by its community of users.

Given the above analysis, U (mother proper) does meet both C1 and C2. Anyone who has doubts about it can test it, especially for C2.

Furthermore, this U (mother proper) can be the base of a true auto-translation machine. While a b-word of Arabic and the equivalent b-word of English have different word forms, their corresponding b-word of U (mother proper) could be the same word. Thus, an auto-translation machine can be constructed as follows:

  1. Word of English ----> b-word of mother bag English + w-tail
  2. b-word of mother bag English ----> b-word of U (English) + w-tail
  3. b-word of U (English) = b-word of U (Arabic)
    w-tail (English) ----> w-tail (Arabic)
  4. b-word of U (Arabic) ----> b-word of mother bag Arabic
  5. b-word of mother bag Arabic + w-tail (Arabic) ----> Word of Arabic
In fact, the above process can have some parallel paths. With a successful auto-translation machine, this U (mother proper) will be a true Universal Language regardless of how many speakers it has.
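
The five translation steps listed above can be sketched as a chain of table lookups. This is a toy illustration under heavy assumptions, not a working translator: every table below is a hypothetical placeholder, the Arabic side uses dummy strings rather than real Arabic words, and the w-tail is modeled as a plain tag.

```python
# All tables are hypothetical placeholders; a real system would need full
# dictionaries for each language. The w-tail is modeled as a plain string tag.
EN_SPLIT = {"books": ("book", "plural")}       # step 1: word -> b-word + w-tail
EN_TO_U = {"book": ("r1", "r2", "r3")}         # step 2: b-word -> U word
TAIL_EN_TO_AR = {"plural": "AR_PLURAL"}        # step 3: w-tail mapping
U_TO_AR = {("r1", "r2", "r3"): "AR_BOOK"}      # step 4: U word -> Arabic b-word

def join_arabic(b_word, tail):
    """Step 5: combine the Arabic b-word with its w-tail (toy join)."""
    return f"{b_word}+{tail}"

def translate(word_en):
    b_en, tail_en = EN_SPLIT[word_en]          # step 1
    u_word = EN_TO_U[b_en]                     # step 2
    tail_ar = TAIL_EN_TO_AR[tail_en]           # step 3: the U word is shared as-is
    b_ar = U_TO_AR[u_word]                     # step 4
    return join_arabic(b_ar, tail_ar)          # step 5

result = translate("books")
```

The point of the sketch is that the U word is never transformed in step 3: the same tuple of roots serves as the key on both the English and the Arabic side, and only the w-tail needs its own mapping.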

The name of this U (mother proper) language is PreBabel.