Compression has two main areas of application: storing and transmitting information. Its use is generally motivated by economics, that is, by the rational use of resources. But the aim of compression is to store and transmit not the concrete symbols or pixels, but the semantic information they describe. The concrete representation of that semantic information (symbols or pixels) is dictated by current technology.
Consider the following analogy. Money was once closely tied to gold and jewelry, and travelers carried purses full of coins, which was heavy and dangerous. Then people invented checks and banknotes: the valuables stayed in bank safes, and people carried only a reference to that store in the form of a bond. Today we carry not even a reference to our wealth, but a reference to money, in the form of information on a credit card.
Figuratively speaking, when it comes to information we are still carrying all our cash with us. Why can't we transmit only the checks? After all, the words “the sunset was burning” contain the same quantity of information as a large TIFF file with a picture of the sunset. If all we need is the information about the sunset, why carry the whole file?
On this basis, a new way of representing information is proposed: the structure of information is stored separately in a data bank, and the information itself is stored as a combination of notions, in the form of references to the descriptions of those notions in that data bank.
The notion space is common to the whole surrounding world, so to transmit information it is enough to transmit the identifiers of objects, that is, of notions in the notion space, rather than their descriptions in traditional languages (compare the raster and vector ways of storing information). This makes it possible to manage the surplus of the information flow.
At present, the authors of archiving algorithms select areas of the notion space, and code information in terms of that space, intuitively. Hence there is a correspondence between compression algorithms and the mechanism of the notion space, one that has not yet been described mathematically.
Let us introduce the notion of “solidity of information” as the value inverse to the surplus (redundancy) of information.
Sol_inf = 1/Cs, where Cs is the coefficient of surplus (redundancy) of information according to Shannon and Kolmogorov.
Solidity of information characterizes the ratio between the quantity of information (as defined by Kolmogorov complexity) and the volume used to store it. Since the coefficient of surplus cannot be smaller than 1, solidity never exceeds 1: the real quantity of information is not larger than the total volume of stored information.
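The definition can be illustrated with a rough numerical sketch. Kolmogorov complexity is not computable, so the code below assumes that the size of a zlib-compressed copy may stand in for the "real quantity of information" (it is only an upper bound, up to a constant). The function name `solidity_estimate` and the sample data are hypothetical and are not part of the original text.

```python
import zlib


def solidity_estimate(data: bytes) -> float:
    """Rough estimate of Sol_inf = 1/Cs for a block of stored data.

    Assumption: the zlib-compressed size approximates the quantity of
    information; Cs is then stored volume / quantity of information.
    """
    if not data:
        raise ValueError("data must be non-empty")
    stored_volume = len(data)
    # Clamp so that the information estimate never exceeds the stored volume
    # (compressed output can be larger than very short or random input).
    information = min(len(zlib.compress(data, 9)), stored_volume)
    cs = stored_volume / information  # coefficient of surplus, always >= 1
    return 1.0 / cs                   # solidity, always in (0, 1]


# Highly repetitive text carries little information per stored byte.
repetitive = b"the sunset was burning. " * 200
print(solidity_estimate(repetitive))

# A single byte cannot be shortened, so its estimated solidity is 1.
print(solidity_estimate(b"a"))
```

Under these assumptions the repetitive sample yields a solidity far below 1, while incompressible input yields exactly 1, which matches the bound stated above.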
Sol_inf is a composite value: its components are solidities that depend on the different ways of storing information. For text, these correspond to the grammatical and lexical surplus of the stored information. In general, one can distinguish Sol_code, the solidity of coding (which includes grammatical and partly syntactic solidity), and Sol_sem, the semantic solidity (which includes lexical and partly syntactic solidity) of the stored volume of information.
Sol_inf = Sol_code x Sol_sem = (1/Cs_code) x (1/Cs_sem) = 1/(Cs_code x Cs_sem), where Cs_code and Cs_sem are the coefficients characterizing, respectively, the surplus introduced by coding and the semantic surplus.
Contemporary compression algorithms, as a rule, reduce only the coding surplus and ignore the semantic surplus, because they operate on the incoming stream of symbols; this lowers the efficiency of compression. Semantic surplus is partly reduced when lossy compression algorithms are used. But in such algorithms (JPEG, MPEG) the process is uncontrolled, owing to their primitive (linear) operation.
Because these algorithms rely on fairly simple approximation methods, they throw away “surplus” information without an exact evaluation of its importance. Thus, when a picture is compressed with JPEG, the quality of the compressed picture is, as a rule, assessed only roughly. If the task is to take a panoramic picture and transmit it quickly, small elements of the picture will not be transmitted properly, although in some cases they may be the key information (for example, for remotely controlled military systems a “small point” could turn out to be enemy forces).
Another example: because of the structure of language, bodies of text contain a great deal of duplicated information, which needs to be transmitted only once. Likewise, when variable information changes, it is not necessary to retransmit the whole body of data in order to update its description. Contemporary compression algorithms do not recognize such repetitions, because they take a probabilistic approach to the occurrence of concrete chains of symbols. A way to solve this problem is shown by the Ziv-Lempel family of algorithms (coding of repeated words), but these algorithms are not robust to small changes in the incoming stream: if a new, possibly accidental or unimportant, symbol appears, the algorithm begins to code the whole block from the start.
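The idea of not retransmitting an entire body of data after a small change can be sketched with a character-level delta. This is only an illustration built on Python's standard `difflib` module, not the approach the text goes on to propose; the helper names `make_delta` and `apply_delta` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Describe `new` as copies from `old` plus literal fragments."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The receiver already holds old[i1:i2]; send only the range.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only genuinely new characters are sent literally.
            delta.append(("text", new[j1:j2]))
        # "delete" needs no entry: the range is simply never copied.
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the new text on the receiving side."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old = ("Money was once tied to gold. Travelers carried purses full of coins. "
       "Then people invented checks and banknotes.")
new = old.replace("purses", "bags")

delta = make_delta(old, new)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "text")
print(delta)
print(literal_chars, "literal characters instead of", len(new))
```

The receiver, which already holds `old`, reconstructs `new` with `apply_delta(old, delta)`; only the changed fragment travels as literal text.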
To correct such defects, CSNT offers another approach. First, extract all available information by mapping the body of data onto the notion space and finding the coordinates of the notions described in it. Then transmit the coordinates of the areas of the space that the information from this body of data falls into. In this way we can transmit the important notions and throw away the unimportant ones; that is, the semantic surplus of the transmitted information is fully controlled. Further coding proceeds using established, well-tested coding methods.
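The two steps described above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the tiny `NOTION_BANK`, the integer identifiers used in place of coordinates, the `importance` weights and the threshold are all hypothetical, and a real notion space would be far richer than a word lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notion:
    ident: int         # stands in for a coordinate in the notion space
    label: str         # canonical name of the notion
    importance: float  # hypothetical weight in [0, 1]


# Hypothetical data bank shared by sender and receiver.
# Several words may point to the same notion (for example "road" and "way").
NOTION_BANK = {
    "sunset": Notion(1, "sunset", 0.9),
    "burning": Notion(2, "burn", 0.8),
    "burn": Notion(2, "burn", 0.8),
    "road": Notion(3, "road", 0.6),
    "way": Notion(3, "road", 0.6),
    "the": Notion(4, "the", 0.1),
    "was": Notion(5, "be", 0.1),
}
LABELS = {n.ident: n.label for n in NOTION_BANK.values()}


def encode(text: str, threshold: float = 0.5) -> list:
    """Map words to notion identifiers and keep only the important ones."""
    ids = []
    for word in text.lower().split():
        notion = NOTION_BANK.get(word.strip('.,!?"'))
        # Words missing from the bank are silently dropped in this sketch.
        if notion is not None and notion.importance >= threshold:
            ids.append(notion.ident)
    return ids


def decode(ids: list) -> list:
    """Look the identifiers up again on the receiving side."""
    return [LABELS[i] for i in ids]


message = encode("The sunset was burning.")
print(message, decode(message))
```

Raising or lowering `threshold` is the sketch's stand-in for controlling the semantic surplus: with `threshold=0.0` every known notion is transmitted, while higher values keep only the notions marked as important. The resulting list of identifiers could then be passed to any conventional coder.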
A word is a code that denotes a notion, but a non-optimal one. Several notions may be denoted by one word (polysemous words and homonyms), and one notion may be denoted by several words (synonyms and words close in meaning, for example “road” and “way”). Moreover, if we consider the “native” words of different languages, there are semantic correlations between the notions they describe (running, to run, runner). For new or borrowed words (computer) there are no such semantic correlations, which increases the semantic surplus of such words (see Creation of new universal optimal language).