Big data: What can an energy company teach us about data science?

[Image: SmartMeter big data]

Two weeks ago I had the opportunity to attend the Economist’s first Information Conference on big data and the disruption it is about to unleash on the world. The event was the brainchild of Kenn Cukier, who almost 18 months ago wrote this Economist article on the ‘big data deluge’. The event brought together some of the best minds on this topic, with speakers drawn from academia, technology companies, NGOs and consultancies.

The one topic that received significant coverage during the event was the emerging role of the ‘data scientist’, a term apparently coined by Jeff Hammerbacher, the co-founder and Chief Scientist of Cloudera, while he was at Facebook. The McKinsey Global Institute recently published a study forecasting that the shortage of these skills could be as high as 190,000 people by 2018 in the US alone. The notion of such a discipline bothered me quite a bit, and until now I was not able to put my finger on why. In full disclosure, my educational background includes degrees in Mathematics, Computer Science and Operations Research, and I have spent most of my career helping companies deal with data and extract insights so they can make better decisions. But I am getting ahead of myself…

What is big data?

The definition of big data was, to my surprise, not a controversial topic. Most speakers agreed that big data is about both the quantity and the quality of the underlying data, i.e., volume measured in petabytes (10^15 bytes, or 1M gigabytes) or more, and data that includes not only structured but also unstructured content (e.g., text, video, social media). You can read Wikipedia’s definition here.

Incredible innovation at the data management layer

The field of big data has seen an explosion of a new alphabet soup over the past few years (ACID, Cassandra, Hadoop, HBase, Hive,  NoSQL, MapReduce, Pig, and many more).  Many early-stage (Cloudera, Kognitio, Netezza,  ParAccel) and established (EMC through its acquisition of Greenplum, Microsoft through its acquisition of DATAllegro, Oracle, SAP, and Teradata through its acquisition of Aster Data) technology companies are innovating at an unprecedented pace to help their customers deal with the big data deluge.

While this innovation at the data management layer is significant, most discussions around the data scientist in the industry today are focused on the predictive analytics / data visualization level of extracting insights from big data, and this is where my fundamental disagreement lies:

Field is not new – Extracting insights from data (i.e., predictive analytics) gave birth to Operations Research as an inter-disciplinary field during World War II. The field has its roots in the 1840s, in the work Charles Babbage did to optimize the UK’s mail system. During WW II, UK and US scientists from many of the same disciplines people talk about today (mathematics, statistics, sociology and psychology) were brought together to help the Allied Forces optimize their artillery rounds and air / sea networks and decipher the German cryptographic codes. The field then branched out in the 1960s and 1970s into the telecom and airline industries and has since expanded across most of the business world. The fundamental mathematical techniques, however, have changed very little in the past 70 years.

We are all data scientists – Most of the innovation taking place at the data visualization layer today is about putting the information in the hands of those able to make the best decisions, i.e., the elusive business user / information worker. While this may feel self-serving, as it allows technology companies to expand their footprint, my many years of working as a ‘data scientist’ have led me to the very same conclusion:

  • The real challenge is about driving adoption: Although this is more relevant in an enterprise context, the challenge is not about squeezing out the last drop of potential benefit, but rather ensuring recommendations are adopted. If there is one thing my many years in the field have taught me, it is that the hard part is convincing the decision makers to adopt your ideas. This Microsoft Windows 7 commercial sums it up best.
  • Back-office data geeks do not always know the business challenge: Having been one myself, I can attest to the fact that, despite how smart we think we are, the knowledge that comes from knowing your business while also being able to act on those insights is priceless. The image and title of this post refer exactly to this point. PG&E is my local energy utility company, and the data on the graph is my hourly energy consumption based on the smart meter (i.e., big) data they collect from my home. Who better to make decisions about energy consumption than the consumers themselves? Do the people appearing in this PG&E commercial look like data scientists to you?

What do you think? Is this short-sighted ‘old-world’ thinking, or the reality that will emerge over the next few years as we move past the hype?

 

7 Responses to Big data: What can an energy company teach us about data science?

  1. SPK June 21, 2011 at 07:22 #

    Your thinking is in line with tech trends that are empowering the consumer. With your credentials and passion for this topic, it will be great to see you playing a more central and active role in addressing SAP’s internal big data opportunities.

  2. Natascha Thomson June 22, 2011 at 08:49 #

    Ted:

    Interesting write-up. I had no idea the UK mail system gave birth to Operational Research.

    I certainly agree that there is no point in collecting data for the sake of collecting data. Not only do decision makers need to see it and make decisions based on the data, but in this day and age, the people who execute need to have access to data to be able to fine-tune their actions on a regular basis.

    For social media, there is such a gap in knowledge, that it is more important than ever that the people in the trenches can see the data and figure out what it really means and what data is meaningful. Constant evolution in this space does not always make it easier.

    Listening on the Internet is currently an imperfect science, to say the least, and reports have to be scrutinized and questioned, or the resulting decisions based on them can be bad. For social media, meaningful data automation is miles away…

    Cheers,

    Natascha

    • Ted Sapountzis June 22, 2011 at 23:33 #

      Natascha,

      Thanks for your comments. Analysis of social data is indeed a great use case, since we understand so very little. Who better to decide what analyses to conduct and what the insights are than the people who actually care about their data? There is indeed so much noise in this data right now that no ‘smart’ data scientist (or even worse, algorithm) can come up with any meaningful recommendations….

      Ted

  3. david k waltz September 7, 2011 at 14:31 #

    I am glad to hear you say a lot of this is not new. Seems the part of this that is new is that there will be a lot more work that needs to be done because of the amount of information available, and new tools that will need to be learned to handle it. But it is all still building off of fundamentals that have been around for centuries.

  4. Richard Rogers October 21, 2011 at 09:20 #

    Great article and questions raised, Ted. With the explosion of devices coming online (e.g., from smartphones to smart meters), this topic will explode. E.g., in 10 years the data will be out there in the US on where every person is and every light that is turned on. How is that for a data boom ; )
