Artificial Intelligence 02 – French or English Language ?

Written By: Jean-Paul Cipria – June 20, 2017

William Shakespeare – Sonnet 18 – 1609


Ultra Wide Band (UWB) Recognition by Correlation Pattern.

How can we simply compare two texts, one in English and one in French?
This is one of the fundamentals of Artificial Intelligence.
First: « Is the text in English or in French? » « Can a computer tell? »

Created: 2017-06-20 12:24:49. – Modified: 2017-07-31 18:28:54.


Easy Level


Done … A coffee?

Old Reference Texts for Language

Are English Writing Habits a Form of Artificial Intelligence?

As a first step, we try to determine which language a text is written in. This is the lowest, but not so difficult, level at which to understand artificial intelligence. It is not so artificial! We call it that only because we try to do it with a computer. In this article we do not work with intelligence the way our brain does. We use mathematical results from statistical discoveries made between 1950 and 1990. Are those theories still in use today? Yes, they are. The fundamental concepts begin here, and other theories were added from 1990 to today. Since 1990 we calculate as our brain « seems » to do.

Let us then observe two styles of text. One is in French, from Molière: « Les Fourberies de Scapin ». Such texts were forbidden, or were written to make people laugh at the government. The other is W. Shakespeare's « Sonnet 18 ». We do not have as much explanation from English history. Shame? But we have heard that this text was also somewhat forbidden, or that the object of his love was not of the accepted kind for that time (1609). A text also aims at transmission: sending and receiving the correct meaning. What kind of meaning? Some information, sometimes with a double meaning, is embedded in the text.

  • « Shall I compare thee to a summer’s day? »
  • « Puis-je te comparer à une journée d’été ? »

And this one :

  • « Rough winds do shake the darling buds of May, »
  • « Les vents violents secouent les beaux bourgeons de Mai. »

This sentence, even translated into French, makes us laugh a lot. Thank you, Shakespeare! Laughing, smiling and enjoyment are consequences of double meaning in « natural » intelligence.

Are Methods from Other Sciences Artificial Intelligence?

Shall we look at another example? « How to recognize a pulse picture surrounded by noise ? » [CIPRIA-14/03/2012].

Figure: Ultra Wide Band (UWB) Gauss 2 pulse recognition by correlation pattern.

In this example the methods and technologies seem to be different. We use a « pattern » in 2D or 3D and, through a mathematical correlation, we determine a … probability of recognition. The model is different BUT … the theory is the same. See the research questions: « What is information? » and « What is entropy? »
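As a rough illustration of the correlation idea, here is a minimal Python sketch. It assumes a pulse shaped like a sign-flipped second derivative of a Gaussian (one possible reading of « Gauss 2 »), hides it in weak Gaussian noise, and slides the pattern along the signal. The function names, pulse width, noise level and positions are illustrative choices, not values from the original study, and the result is a raw correlation score rather than a probability.

```python
import math
import random


def gauss2_pattern(half_width, sigma):
    """Sign-flipped second derivative of a Gaussian (assumed pulse shape)."""
    return [
        (1 - (k / sigma) ** 2) * math.exp(-0.5 * (k / sigma) ** 2)
        for k in range(-half_width, half_width + 1)
    ]


def correlate(signal, pattern):
    """Sliding dot product of the pattern with every window of the signal."""
    n = len(pattern)
    return [
        sum(signal[i + k] * pattern[k] for k in range(n))
        for i in range(len(signal) - n + 1)
    ]


rng = random.Random(0)            # fixed seed so the run is repeatable
pattern = gauss2_pattern(12, 3.0)
true_start = 80                   # hypothetical position of the pulse

# Weak Gaussian noise everywhere, then the pulse added at true_start.
signal = [rng.gauss(0.0, 0.05) for _ in range(200)]
for k, value in enumerate(pattern):
    signal[true_start + k] += value

scores = correlate(signal, pattern)
detected_start = max(range(len(scores)), key=scores.__getitem__)
print("pulse detected at index", detected_start)
```

The highest score marks the window that looks most like the pattern. Turning that score into a recognition probability needs a noise model, which this sketch leaves out.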

But … let us begin simply: once upon a time, Molière and Shakespeare …

French : Molière – Les Fourberies de Scapin

English : Shakespeare – Shall I compare thee

Presence Probabilities

First, read « Artificial Intelligence 01 – What did C. Shannon Discovered ? » [CIPRIA-04/10/2012].

For each character we count occurrences. We also include spaces. Then we calculate p(i) for the character ‘i’ as :

  • p(i)=\frac{NumberOfCharacter(i)}{TotalNumberOfAllCharacters}
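The counting step can be sketched in a few lines of Python. This is only an illustration of the formula: the function name is hypothetical, and folding the text to lower case is an assumption, since the article does not say how capital letters are handled.

```python
from collections import Counter


def char_probabilities(text):
    """Return p(i) = count(i) / total for every character i, spaces included."""
    text = text.lower()           # assumption: 'E' and 'e' count as one character
    counts = Counter(text)
    total = len(text)
    return {char: n / total for char, n in counts.items()}


sample = "Shall I compare thee to a summer's day?"
p = char_probabilities(sample)
for char in ("e", "a", " "):
    print(repr(char), round(p[char], 4))
```

Because every character is counted, spaces and punctuation included, the probabilities sum to 1.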

After this we simply compare the probability of ‘e’ divided by the probability of ‘a’. More generally, we analyse the data and ask ourselves: « What is the right variable to discriminate between two signals? » In this simple case we observe that we can choose the ratio p(e)/p(a):

  1. \frac{p(e)}{p(a)}=2.20 for Molière's sentences.
  2. \frac{p(e)}{p(a)}=1.75 for Shakespeare's sentences.
  3. \frac{p(e)}{p(a)}=2.00 for J. P. Cipria's French sentences.

Molière French Dispersion

Molière Rapport_e_a = 2.2176 ==> French Text with a 0.1 threshold.

Shakespeare English Dispersion

Shakespeare Rapport_e_a = 1.7009 ==> English Text with a 0.1 threshold.
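One way to read the 0.1 threshold is as a maximum distance between a measured ratio and a reference ratio. The Python sketch below follows that reading, which is an assumption: the article does not show how the Matlab script applies the threshold. The reference values are the rounded ratios quoted above, and the function names are hypothetical. Since p(e) and p(a) share the same denominator, the ratio reduces to a ratio of counts.

```python
from collections import Counter

# Rounded reference ratios quoted in the article.
REFERENCE_RATIOS = {"French": 2.20, "English": 1.75}
THRESHOLD = 0.1  # assumption: largest accepted distance to a reference ratio


def e_over_a(text):
    """p(e) / p(a), computed as count('e') / count('a')."""
    counts = Counter(text.lower())
    if counts["a"] == 0:
        raise ValueError("the text contains no 'a'")
    return counts["e"] / counts["a"]


def classify_ratio(ratio):
    """Pick the closest reference and say whether it lies within THRESHOLD."""
    label, reference = min(
        REFERENCE_RATIOS.items(), key=lambda item: abs(item[1] - ratio)
    )
    if abs(reference - ratio) <= THRESHOLD:
        return label
    return "closest to " + label + ", outside threshold"


for name, ratio in (("Molière", 2.2176), ("Shakespeare", 1.7009)):
    print(name, "->", classify_ratio(ratio))
```

With this reading, a ratio of 2.00 is closest to the French reference but falls outside the threshold.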

Conclusion

First Order Structures

For first-order structures we consider only the occurrence probability of each character and the ratios between them. We have no notion of words, word length or even a simple meaning. We just consider:

  • p(i) for the character ‘i’.

But we also keep the probability of the space character ‘ ‘, because it carries information about the number of words; from it we can estimate the mean word length.
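The link between the space probability and word length can be made explicit. If a long text of N characters contains S single spaces, it holds about S words and N − S non-space characters, so the mean word length is roughly (N − S)/S = (1 − p(space))/p(space). The Python sketch below applies that approximation. The function name is hypothetical, and punctuation is counted as part of the words.

```python
def mean_word_length(p_space):
    """Estimate the mean word length from the space probability alone.

    Assumes a long text with single spaces between words, so that the
    number of words is close to the number of spaces.
    """
    if not 0.0 < p_space < 1.0:
        raise ValueError("p_space must lie strictly between 0 and 1")
    return (1.0 - p_space) / p_space


text = "rough winds do shake the darling buds of may"
p_space = text.count(" ") / len(text)
estimate = mean_word_length(p_space)

# Exact value for comparison, using the word boundaries directly.
words = text.split()
exact = sum(len(word) for word in words) / len(words)
print(estimate, exact)
```

On this single line the estimate is 4.5 against an exact 4.0, because nine words are separated by only eight spaces. The gap shrinks as the text gets longer.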

For punctuation we can also use ‘,’ and ‘.’, which lets us discriminate, for example, the writing habits of Jean-Paul Cipria. The ratio of character ‘e’ to character ‘a’ also changes in modern style: for the author this ratio is 2, and for Molière it is 2.2. So J.P. Cipria is French, sure, but he is not Molière and even less Shakespeare! 😉 The computer said so!

Shakespeare Unigram Density (first order structure dispersion)

Probability to : 0.9036

We draw letters at random according to p(i), and we can see that no English word is recognizable. That information is missing from these first-order statistics.
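The experiment can be reproduced with a short Python sketch: draw each character independently, with a probability proportional to its count in a source text. The function name and the seed are hypothetical, and lower-casing is an assumption.

```python
import random
from collections import Counter


def first_order_sample(source, length, seed=0):
    """Draw characters independently, weighted by their counts in source."""
    counts = Counter(source.lower())
    chars = list(counts)
    weights = [counts[char] for char in chars]
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    return "".join(rng.choices(chars, weights=weights, k=length))


source = "Rough winds do shake the darling buds of May,"
generated = first_order_sample(source, 60)
print(generated)
```

The output has the right proportion of letters and spaces, but since each character ignores its neighbours it is very unlikely to form real words.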

Digram Structure or Second Order Approximation

We can also try:

  • p_i(j), the probability that character j occurs right after character i, which gives the second-order approximation described by Claude Shannon, pages 7 to 13 [C. Shannon-1948]. From it we can construct a Markov diagram.
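A minimal Python sketch of the digram statistics follows. It counts each pair of adjacent characters and normalises per first character, which gives one row of transition probabilities per character. The function name is hypothetical, and lower-casing is an assumption.

```python
from collections import Counter, defaultdict


def digram_probabilities(text):
    """Return p_i(j): probability that character j follows character i."""
    text = text.lower()
    followers = defaultdict(Counter)
    for first, second in zip(text, text[1:]):
        followers[first][second] += 1
    return {
        first: {
            second: n / sum(counter.values())
            for second, n in counter.items()
        }
        for first, counter in followers.items()
    }


transitions = digram_probabilities("abab ac")
print(transitions["a"])   # followers of 'a' and their probabilities
```

Each inner dictionary sums to 1, so the table can be read directly as the edges of a Markov diagram.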

French digram second order

English digram second order

Trigram Structure or Third Order Approximation

French

‘qui’ or ‘qu’ preceded by a space are good discriminators for French. Such pieces of information are like « traces »: they occur rarely, but their presence is a signature.

English

‘sh’ or ‘th’ or ‘to’ preceded by ‘ ‘ are good discriminators for English. Such pieces of information are like « traces »: they occur rarely, but their presence is a signature.
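These trigram signatures can be counted directly. The Python sketch below uses the markers named above (a space followed by ‘qu’ for French; a space followed by ‘sh’, ‘th’ or ‘to’ for English) and compares the two totals. The majority rule and the function names are assumptions made for illustration. A real classifier would weight each marker, since a French word such as « tout » also matches the ‘to’ marker.

```python
FRENCH_MARKERS = (" qu",)
ENGLISH_MARKERS = (" sh", " th", " to")


def marker_counts(text):
    """Count French and English marker trigrams in the text."""
    padded = " " + text.lower()   # leading space so the first word can match
    french = sum(padded.count(marker) for marker in FRENCH_MARKERS)
    english = sum(padded.count(marker) for marker in ENGLISH_MARKERS)
    return french, english


def guess_language(text):
    """Assumed rule: the language with more marker hits wins."""
    french, english = marker_counts(text)
    if french > english:
        return "French"
    if english > french:
        return "English"
    return "undecided"


print(guess_language("Shall I compare thee to a summer's day?"))
print(guess_language("Qui est là ? Que dit-il ?"))
```

A short line may contain no marker at all, in which case the sketch answers "undecided" rather than guessing.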

 

References

Science

Technology

  • [CIPRIA-2017] Jean-Paul Cipria – « Matlab File » – 2017
    – « Ascii_File_2017_06_19.m » : Find probabilities in text files.
    – « comp_fr_en_2017_06_20.m » : Calculate ratios and find information from probabilities.

Jean-Paul Cipria
20/06/2017
