Open Source Is the New Normal in Data and Analytics
You hear a lot these days about how the growing deluge of digital data is fundamentally changing how nearly every business operates.
I’ve argued for a while now that we’re at or near a data tipping point[1], beyond which lies a new world where companies analyze fundamentally new types of data in real time and use the results to make business decisions that were previously impossible.
But every tipping point produces winners and losers. In this case, I believe the winners will share one important quality: a deliberate choice to put open source technologies at the heart of their modern data architecture.
With Hadoop (the software platform for distributed processing of large data sets) at the core, open source data architectures have reached a new level of maturity that some companies may not yet fully appreciate. As several mega-trends have converged (cloud computing, artificial intelligence, the internet of things and streaming), open source projects including Apache NiFi[2], Apache Storm[3] and Apache Kafka[4] have risen to the occasion to drive innovation.
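To make the streaming piece of that stack a little more concrete, here is a minimal sketch of a Kafka consumer in Java, the kind of small component that sits at the edge of such an architecture and reads events as they arrive. The broker address, the "vehicle-telemetry" topic name and the plain-string payloads are illustrative assumptions, not details from any particular deployment.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SensorEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("group.id", "sensor-readers");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // "vehicle-telemetry" is a hypothetical topic name
            consumer.subscribe(Collections.singletonList("vehicle-telemetry"));
            while (true) {
                // Pull whatever events have arrived since the last poll
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("device=%s payload=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

In a real pipeline the loop body would hand each event to downstream processing (a Storm topology, a NiFi flow, or a write into Hadoop) rather than print it, but the consume-and-process shape is the same.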
Open source data architectures are no longer analogous to research projects forever running in test environments for trials and experimentation. They’re now considered mainstream in IT environments and are widely deployed in live production across several industries. In fact, open source has become so common that if you’re building a modern data architecture, chances are high you’re using an open source stack. As a historical comparison, Hadoop in 2017 is roughly where Linux was in 2005: breaking out from technical curiosity into a mainstream technology used everywhere and driving business outcomes.
Developers who’ve come of age in the GitHub era see open source architectures as their preferred choice, not only because they cost less to deploy and operate, but because they can draw meaningful value from the collaboration model at their core. They’re comfortable with open source because they can examine and tinker with the underlying source code to fully understand how it works, enhance it for specific needs, and contribute those enhancements back to the community at large.
There are a lot of great examples. Ford Motor Company (a client of ours) is using open source[5] at the heart of its Smart Mobility initiative, gathering all kinds of data (about 25 gigabytes per hour per car, on average) to help improve the experience of driving and riding in its Ford Fusion hybrids.
Macy’s, another company we work with, is using open source data technology to better understand its customers, communicate with them more effectively, and craft advertising campaigns that reach the right shoppers at the right moment. And our client Progressive Insurance used Hadoop to analyze more than 15 billion miles[6] of driving data gathered from its Snapshot devices plugged into the data ports of millions of cars. Drivers who show safer driving habits get discounts on their insurance policies.
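For a flavor of what that kind of Hadoop batch analysis looks like, here is a minimal MapReduce sketch that totals miles driven per driver from comma-separated trip records. The input layout ("driverId,tripMiles,hardBrakes"), the class names and the field positions are all hypothetical; Progressive’s actual pipeline is, of course, not public.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MilesPerDriver {
    // Mapper: each input line is assumed to look like "driverId,tripMiles,hardBrakes"
    public static class TripMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 2) return; // skip malformed lines
            context.write(new Text(fields[0]),
                    new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }

    // Reducer: sums trip miles for each driver
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double total = 0;
            for (DoubleWritable v : values) total += v.get();
            context.write(key, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "miles-per-driver");
        job.setJarByClass(MilesPerDriver.class);
        job.setMapperClass(TripMapper.class);
        job.setCombinerClass(SumReducer.class); // sums are associative, so combine locally
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A production job would layer in richer features (hard-brake rates, time-of-day patterns) rather than raw mileage alone, but the shard-the-data, aggregate-per-key shape is what lets a cluster chew through billions of miles of records.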
References
1. data tipping point (hortonworks.com)
2. Apache NiFi (nifi.apache.org)
3. Apache Storm (storm.apache.org)
4. Apache Kafka (kafka.apache.org)
5. using open source (doughenschen.com)
6. analyze more than 15 billion miles (doughenschen.com)