Companies are being destroyed and created around big data, says Luke Lonergan, co-founder and chief technology officer at Greenplum, a big data analytics company that was acquired by EMC. Admitting that "big data" is a squishy term, partly by design, he nonetheless believes companies can't afford to ignore it.
In a study of large-cap companies, Deloitte concluded that they used to survive an average of 25 years; now the figure is closer to five.
“Clearly something is different,” said Lonergan, speaking at the FUSION 2012 CEO-CIO Symposium in Madison.
“You have a customer coming in who knows more about your business than you do. If you have data coming in from consumers and can match the rate and activity, then you survive.”
The drive toward big data is being led by the lines of business, not IT, he said.
“It is imperative from the business standpoint that you need to get ahead of this new wave of interacting with customers. You need to know who that customer is, what they represent to the business now, what they should represent to the business and how to move them along the trajectory to be that great customer they should be.”
The challenge for IT is to make sure it doesn't stand in the way. Greenplum has developed parallel processing that can cut analysis from eight hours to 10 minutes, but elements within the architecture have to change to accomplish this with Greenplum or SAS, its analytics partner, he added.
He said that SAS found a way to parallelize its algorithms that has cut some fraud detection analytics from 27 hours to 52 seconds.
Annika Jiminez, senior director for analytics solutions at Greenplum, said big data is happening in nearly every sector of business and government, from health care, where it is applied to medical records and treatment pathways, to car manufacturers, which capture data on how vehicles are used and transmit it to a data center.
“We see big data analytics as very different from the old paradigm that was driven by requirements to support and operationalize capabilities around business intelligence (BI), not to support a very agile environment which in essence is very development-like. Big data is experimental and ad hoc vs. repetitive in BI, and it is mostly semi-structured while BI is structured.”
Greenplum is a key player in big data analytics, although it hardly has the business to itself: it competes with Teradata; Netezza, which was bought by IBM; Vertica, which was acquired by HP; and several others.
Lonergan wasn’t exactly looking to start a big data analytics firm in 2002. At the time he was focused on helping companies with online transaction processing (OLTP).
“We had built technology focused on transactions, but then we began hearing from people who said they had reporting issues that were killing them. They had to find a way to leverage new data sources.”
Starting in 2006, Greenplum began to focus on analytics because that’s what companies were concerned with.
“They wanted to get insights out of data that looked to be opaque,” said Lonergan. “Our analytics techniques became very powerful and that set our direction.” It placed Greenplum at the front of change in the way business is done.
“When we talk to customers, that is where we spend our time — working with people who are facing profound change in the way they engage with customers.”
A second major challenge for Greenplum customers is changing their organizations to accommodate new patterns and new roles.
“The data scientist shortage is a temporary thing,” said Lonergan. “It’s no longer new and now everyone thinks they need them. But there are many more people involved in this kind of work than data scientists. Organizations that leverage real-time data streams in the way they engage with customers will have to change the way they operate.” He thinks a whole crew of people will have to work around a layer of analytics that responds to real-time data to make decisions on ads and offers as they watch the results flowing from online actions and transactions. That will require a change in an organization’s infrastructure.
Business clients want analytics that will tell them what is happening with their customers, their competitors and the advertising landscape. They have to work across different channels of lead generation and analyze data that often comes in from new and different sources such as Twitter, Facebook and new ad mechanisms.
“They want to get into those data sources for better leverage around customers. They are familiar with the ideas of what needs to be done, but not necessarily on how to do it. They may have done some experiments which indicate how powerful it would be for their revenue stream if they could get these data sources into their normal business flows.”
He described people in IT as concerned about how to leverage their existing investments in technology to meet business demands.
“When we talk to enlightened IT organizations, they realize they have to make an investment in big data and storage, and storage is probably the number one expense because it grows as the line of business wants new sources of data. They want to know what storage requirements will be in the future.”
Greenplum has answers for them; in 2010 it was acquired by EMC, the storage giant.
Usually the business users have a good idea of what data they are interested in mining when they meet with a Greenplum data scientist. Often they have a potential fountain of information and need to find a way to analyze it. Sometimes they have valuable log data but don’t use it, and Greenplum can show them how.
And sometimes Greenplum data scientists can show them how not to approach big data. A European customer brought in Hadoop and hired a consultant to write the Java code to connect to it. After 18 months they called Greenplum.
“They had a mountain of code. They didn’t understand it because it was too voluminous, and we couldn’t understand it either,” said Lonergan. “We see that with some customers; they get very excited about Hadoop and over-commit to a strategy without really understanding how to do it, and how to stay out of the Hadoop trap. People are starting to learn about big data, but putting it into a product is where they can get trapped.”
Lonergan likes to compare Hadoop and some of the tools built around it to a box of knives left in a kindergarten classroom. “The tools are powerful, but when used improperly they can become a bloody mess.” Many of them were developed quickly to solve issues Internet companies faced, and the edges haven’t been rounded off.
Companies that want to do what Internet companies have done take up the Hadoop tools and find themselves drowning in source code, because Hadoop requires writing a large number of tools from scratch, he added.
“For companies which are used to infrastructure that doesn’t require writing a lot of source code, that can lead to some very ugly results.”
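The contrast Lonergan is drawing, between hand-written map/reduce code and the declarative queries most enterprise IT shops are used to, can be sketched in miniature. The following is an illustrative Python sketch, not Greenplum or Hadoop code: it counts words first with explicit map and reduce phases (every step hand-coded, as in a Hadoop job), then with a single SQL query against SQLite standing in for a SQL-based analytics engine.

```python
import sqlite3
from collections import defaultdict

docs = ["big data big analytics", "data analytics at scale"]

# --- MapReduce style: every processing step is hand-written code ---
def map_phase(doc):
    # emit (word, 1) pairs, as a Hadoop mapper would
    return [(w, 1) for w in doc.split()]

def reduce_phase(pairs):
    # sum the counts per key, as a Hadoop reducer would
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

mr_counts = reduce_phase(p for doc in docs for p in map_phase(doc))

# --- SQL style: the same result as one declarative query ---
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE words (word TEXT)")
con.executemany("INSERT INTO words VALUES (?)",
                [(w,) for doc in docs for w in doc.split()])
sql_counts = dict(con.execute(
    "SELECT word, COUNT(*) FROM words GROUP BY word"))

assert mr_counts == sql_counts
```

Even at toy scale, the map/reduce version is all custom code; multiplied across joins, filters, and fault handling in real Java pipelines, that custom code is the "mountain" Lonergan describes.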
The opinions expressed herein or statements made in the above column are solely those of the author, and do not necessarily reflect the views of Wisconsin Technology Network, LLC. WTN accepts no legal liability or responsibility for any claims made or opinions expressed herein.