Already, “big data” has become one of those buzzphrases you say with an apologetic smirk. It sounds like marketecture, broad enough to apply to almost anything.
So let’s clear up what big data is and isn’t. Perhaps you’ve heard the canonical “three V’s” definition: data high in volume, velocity, and variety. In other words, big data comes in multiterabyte quantities, accrues or changes fast, often resists normalized structure, and tends to demand technologies beyond the tried-and-true RDBMS or data warehouse.
The cluster of new technologies around big data, which includes Hadoop, a wild array of new NoSQL databases, massively parallel processing (MPP) analytic databases, and more, represents the biggest leap forward in data management and analytics since the 1980s. That’s really what big data is about. And these emerging technologies are already delivering business value: in deep insights into customer behavior, in faster app dev cycles, in the ability to use commodity hardware, and in reduced software licensing costs, because almost all these new technologies are open source.
Assuming your data volumes are exploding as fast as everyone else’s, you’re part of the big data trend whether you like it or not. So why not employ the tools purpose-built for the big data era? It’s a better strategy than blindly buying more Oracle licenses or building another gold-plated data warehouse. Where you start, though, depends on the problems you want to solve.