The word ‘information’ is very widely used, but just what is ‘information’? The question is one element of the discussions around big data. Does ‘big data’ include information? When talking about information we often find ourselves emphasising that we mean ‘unstructured information’, but in fact we use this expression only to differentiate information from ‘structured’ data. Then we come to ‘content’, a convenient word when talking about enterprise content management. Is ‘content’ the same as information or different, and in what ways?
The General Definition of Information (GDI) is based on the concept of ‘data + meaning’. There is a good summary account of GDI in the Stanford Encyclopedia of Philosophy, written by Luciano Floridi, the author of the excellent book Information – A Very Short Introduction. When you start reading this book you quickly find that information is a much wider concept than something notionally tied to a document.
A couple of weeks ago there was a fascinating television programme on BBC4 about information, presented by Professor Jim Al-Khalili, which covered everything from the development of the first writing systems to Alan Turing’s seminal work on the basic principles of computation and Claude Shannon’s 1948 paper ‘A Mathematical Theory of Communication’. One of the concepts Shannon introduced was information entropy, a measure of the uncertainty in a message. At the Enterprise Search Meetup, held during the Enterprise Search Summit Fall in Washington in October, it was interesting to hear some intense discussions about the implications of entropy for enterprise search.
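To make Shannon’s idea a little more concrete: for a source whose symbols occur with probabilities p_i, the entropy is H = −Σ p_i log2 p_i, measured in bits per symbol. The short Python sketch below is only an illustration. The function name is hypothetical, and estimating the probabilities from the character counts of a single message is a simplifying assumption (Shannon defined entropy over the statistics of a whole source, not one string).

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Estimate entropy in bits per symbol, H = sum(p * log2(1/p)).

    Assumption: symbol probabilities are taken from the character
    frequencies of this one message rather than from a known source.
    """
    if not message:
        return 0.0
    total = len(message)
    counts = Counter(message)
    # p * log2(1/p) is the same as -p * log2(p), written to avoid -0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("aaaa"))  # 0.0: the next symbol is never in doubt
    print(shannon_entropy("abab"))  # 1.0: one fair binary choice per symbol
    print(shannon_entropy("abcd"))  # 2.0: four equally likely symbols
```

A message made of one repeated symbol carries no uncertainty and so scores zero, while four equally likely symbols need two bits each. That is the sense in which entropy measures uncertainty.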
I’ve just finished reading Understanding Information and Computation by Philip Tetlow. The chapter titles are somewhat daunting, including ‘Dot-to-Dots Point the Way’ and ‘Why Are Conic Sections Important’. Summarising the book is very difficult, but in essence the author sets out to show that information is one of the fundamental forces of the physical world, somewhat like gravity and electromagnetism. This may seem far-fetched, but Tetlow makes a powerful case. Incidentally, if you are interested in how search applications work, Chapter 15, ‘Time to Reformulate with a Little Help from Information Retrieval Research’, gives a good description of the widely used vector space relevance algorithms.
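For readers who have not met the vector space model, here is a minimal Python sketch of the general idea, not Tetlow’s own presentation. Documents and the query are turned into term vectors weighted by TF-IDF (term frequency times inverse document frequency), and documents are ranked by the cosine of the angle between their vector and the query vector. All function names are hypothetical, the whitespace tokeniser is deliberately naive, and the smoothed IDF formula is one common variant chosen as an assumption; production search engines add stemming, stop words, length normalisation and much more.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothed IDF: terms found in fewer documents get larger weights.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; terms unseen in the collection are dropped.
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    doc_tokens = [tokenize(d) for d in documents]
    idf = build_idf(doc_tokens)
    q_vec = tfidf(tokenize(query), idf)
    scores = [(cosine(q_vec, tfidf(toks, idf)), doc)
              for toks, doc in zip(doc_tokens, documents)]
    # Highest cosine similarity first.
    return sorted(scores, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "enterprise search and information retrieval",
        "the properties of water",
        "information entropy and search",
    ]
    for score, doc in rank("information search", docs):
        print(f"{score:.3f}  {doc}")
```

With the sample documents, the shorter document containing both query terms ranks above the longer one, because its vector points more nearly in the query’s direction, and the document about water shares no terms with the query, so its cosine score is zero.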
In 1968 Marshall McLuhan remarked in ‘War and Peace in the Global Village’ that the one thing fish are totally unaware of is water, as they have no anti-environment that would enable them to perceive the element they swim in. You don’t need to know anything about water to drink it or wash in it, but have a look at the Wikipedia entry on the properties of water and you will begin to appreciate how important some of those properties are to our existence on the planet. After all, about 50–60% of our body weight is water! Information is just the same. We can continue to use the word to mean whatever we want it to mean, but before too long we should perhaps be thinking in more detail, and with more rigour, about the meaning, purpose and value of information.