Recognizing speech and handwriting in natural languages raises problems that are close to the problems of translation.
That is to say, some words are close in sound or in their set of phonemes, for example “ship” and “sheep”. Humans distinguish such words easily enough, simply by understanding the context in which they were spoken. For computer systems, however, telling apart such close sets of sounds is almost impossible. Hence, to recognize speech it is necessary not only to hear it, but also to understand what is being talked about. A human unambiguously perceives the context and restores (“fills in”) the phonemes he has not heard, while existing algorithms simply do not take this context into account.
Analogous problems appear in handwriting recognition. Variations in how different people write the same symbols make it impossible to establish an unambiguous correspondence between a handwritten letter and a letter of the alphabet. Consequently, it is also impossible to reconstruct a word exactly from its handwritten counterpart.
This problem stems from a mechanistic approach to recognizing phonemes and symbols: they are treated as discrete elements, without regard to the semantic relationships between them.
The problem could be solved by mapping into the semantic space the notions behind those hypotheses that are most probable given the spelling or sound of the words being recognized. All variants of the words that could be obtained from the recognized information are considered from the start. The mapping is carried out together with the preceding context, which makes it possible to choose immediately the notion that is semantically closest to the given context, and so to restore the words being recognized.
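The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional coordinates, the `NOTION_SPACE` table, and the use of cosine similarity as the measure of semantic closeness are all invented for the example and are not taken from the text.

```python
import math

# Hypothetical toy "semantic space": the coordinates are invented for
# illustration only and do not come from the original article.
NOTION_SPACE = {
    "ship": (0.9, 0.1, 0.0),
    "sheep": (0.0, 0.2, 0.9),
    "sea": (1.0, 0.0, 0.0),
    "harbour": (0.8, 0.3, 0.0),
    "farm": (0.0, 0.1, 1.0),
}


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def context_vector(words):
    """Average the coordinates of the context words that are known."""
    vectors = [NOTION_SPACE[w] for w in words if w in NOTION_SPACE]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def choose_notion(hypotheses, context_words):
    """Pick the hypothesis whose notion lies closest to the context."""
    ctx = context_vector(context_words)
    return max(hypotheses, key=lambda w: cosine(NOTION_SPACE[w], ctx))
```

With the same two acoustic hypotheses, `choose_notion(["ship", "sheep"], ["sea", "harbour"])` selects "ship", while a context of `["farm"]` selects "sheep". The recognizer never has to decide between the two vowels on acoustic evidence alone.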
One of the problems of present recognition systems is restoring the original word from the recognized phonemes. The difficulty is that the transcription (pronunciation) of a word does not always coincide with its spelling, as in “debt”, “whether”, etc.
Let us consider the architecture of a speech recognition system based on the CSNT. The CSNT-based approach solves this problem easily. It is more reasonable not to look for ways of spelling a given set of phonemes, but to search for this set of phonemes directly in the notion space, just as words are searched for in order to determine their coordinates. The coordinates of a notion are determined by its phoneme set (the notion space need not store written words only). Resolving the notion into written, textual, or graphical form is then carried out in combination with determining the allowed/restricted area.
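A minimal sketch of such a direct lookup follows. It assumes that the notion space can be indexed by an ordered phoneme sequence. The ARPAbet-style phoneme symbols, the two-dimensional coordinates, and the names `Notion`, `PHONEME_INDEX`, and `find_notion` are hypothetical. The sketch does not model the allowed/restricted area.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Notion(NamedTuple):
    """A point in the notion space plus one possible written form."""
    coordinates: Tuple[float, ...]
    written: str


# Hypothetical index keyed by phoneme sequences rather than by spellings.
# Phoneme symbols and coordinates are illustrative assumptions.
PHONEME_INDEX = {
    ("D", "EH", "T"): Notion((0.2, 0.7), "debt"),
    ("W", "EH", "DH", "ER"): Notion((0.5, 0.1), "whether"),
}


def find_notion(phonemes: Sequence[str]) -> Optional[Notion]:
    """Look the phoneme sequence up directly; no spelling is guessed."""
    return PHONEME_INDEX.get(tuple(phonemes))
```

Here `find_notion(["D", "EH", "T"])` returns the notion whose written form is "debt", so the silent "b" never has to be derived from the sounds. An unknown sequence returns `None`.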
The advantage of this approach is that it allows a client-server architecture for processing requests. That is to say, phoneme recognition is carried out on a sufficiently “thin” client, while the search for the notion corresponding to a phoneme set is carried out on a server, that is, a remote computer. Moreover, if the phoneme set is not recognized on the nearest server, it can be sent over telecommunication networks to other servers (for example, if the request is made in a language not used before). It is therefore possible to build a system that recognizes speech in several (potentially any) languages.
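The fallback from the nearest server to other servers might look roughly like the sketch below. The servers are simulated by in-process lookup functions rather than real network calls. The names `make_server` and `resolve`, and the sample phoneme tables, are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Phonemes = Tuple[str, ...]
Lookup = Callable[[Phonemes], Optional[str]]


def make_server(table: Dict[Phonemes, str]) -> Lookup:
    """Simulate a remote server as a lookup over its own phoneme table."""
    def lookup(phonemes: Phonemes) -> Optional[str]:
        return table.get(phonemes)
    return lookup


def resolve(phonemes: Sequence[str], servers: Iterable[Lookup]) -> Optional[str]:
    """Ask servers in order (nearest first) until one knows the phoneme set."""
    key = tuple(phonemes)
    for lookup in servers:
        notion = lookup(key)
        if notion is not None:
            return notion
    return None  # no server recognized the phoneme set


# Hypothetical per-language servers with invented sample entries.
english_server = make_server({("SH", "IH", "P"): "ship"})
german_server = make_server({("SH", "IH", "F"): "Schiff"})
```

A call such as `resolve(["SH", "IH", "F"], [english_server, german_server])` misses on the first server and is answered by the second, which mirrors the case of a request made in a language the nearest server does not handle.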
The disadvantages of the traditional approach to speech recognition, which tries from the outset to find the spoken word from a set of sounds, are now obvious: the impossibility of exact, 100% understanding of entered commands by machine intelligence; a vocabulary restricted to a limited number of commands; a rigid algorithm governing how such an interface works; and strong dependence on the speaker's pronunciation.