Big Data – a small history

The origins of the term ‘intranet’ are now lost in the mists of time, though I’m pretty sure it was first used by Marc Andreessen. Last week I had to write a position paper on Big Data vs Search for a client, and fortunately for me Stephen Arnold had just published a superb column on the topic that gave me a framework. As I started writing I realised I did not know who invented the term “Big Data”. Time for some research! It turns out that the first person to use the term in an academic context was Francis Diebold, an economics professor at the University of Pennsylvania. He was writing about macro-economic Dynamic Factor Models, which were being driven by an explosive growth of available data, and wanted a concise term that conjured a stark image. He came up with “Big Data,” which he felt seemed apt and resonant and intriguingly Orwellian, especially when capitalized. Diebold used the term in a paper he presented in 2000, which was then published in 2003. He tells the story in a recent paper titled A Personal Perspective on the Origin(s) and Development of “Big Data”: The Phenomenon, the Term, and the Discipline.

However, as he notes in his fascinating paper, he was not in fact the first to come up with the concept. That was John Mashey, Chief Scientist at Silicon Graphics, who in 1997 was running a seminar called “Big Data & the Next Wave of InfraStress”. This was (according to Mashey) “about the interactions of various technologies in data growth (DRAM & especially disk), stress on computing infrastructure over the next 4-5 years, and lots of bandwidths.”

The next person in the frame is Doug Laney, who in 2001 wrote a technical paper for the Meta Group (later acquired by Gartner), which helpfully is still available on the Gartner blog server. Laney talks about 3D Data Management – Controlling Volume, Velocity and Variety, but credits Mashey with the first use of Big Data.

That makes Big Data around 16 years old and the 3Vs 12 years old.

The last word should go to Professor Diebold for a very thoughtful last paragraph to his paper.

“It’s not obvious, however, that a new discipline is required, or that Big Data is a new discipline. Skeptics will argue that traditional disciplines like computer science, statistics and x-metrics are perfectly capable of confronting the new phenomenon, so that Big Data as a discipline is redundant, merely drawing a box around some traditional disciplines. But it’s hard not to notice that the whole of the emerging Big Data discipline seems greater than the sum of its parts. That is, by drawing on perspectives from a variety of traditional disciplines, Big Data as a discipline is not merely taking us to bigger traditional places. Rather, it’s taking us to wildly new places, unimaginable only a short time ago, ranging from cloud computing and associated massively-parallel algorithms, to methods for controlling false-discovery rates when testing millions of hypotheses, with much in between. Indeed one could argue that, in a landscape littered with failed attempts at interdisciplinary collaboration, Big Data is emerging as a major interdisciplinary triumph.”

Martin White