Edward Huskin 28 August 2019

The Evolution of Big Data: Pushing Hadoop and Spark to The Edge

When ‘Big Data’ first started making the rounds as a term in the technology industry, Yahoo was still a leading search engine, and Hadoop was riding on that popularity. Now, Hadoop seems to be clinging to what may be its final moments.

Few industries move on as fast as the tech industry does, and expert opinion had it that Spark would be the technology to replace the aging Hadoop. That turned out to be true to a large extent, but even the reign of Hadoop’s superior younger brother hasn’t lasted as long as the experts would have had us believe.

The adoption of Hadoop and related traditional Big Data technologies has been slowing in recent years, and, if the industry experts are to be believed, that adoption is moving towards ‘the edge.’

Moving Away from the Cloud

Hadoop owes a large part of its popularity to the fact that companies like Cloudera and Hortonworks were able to commoditize it on the cloud.

This took away the hassle of having to set up, secure and maintain clusters in-house. But as the amount of data companies found themselves dealing with grew, new issues started to emerge: bandwidth costs, privacy concerns and security risks.

To mitigate all these risks, the obvious choice is to eliminate the need for a cloud altogether and move closer to where the data is created. For some industries, there will hardly be any noticeable change – cloud-based companies will create and analyze their data in the cloud, for example, and data created locally is also analyzed locally.

This is currently referred to as ‘edge computing,’ and is believed by many to be the next phase of the Big Data movement.

If our favorite industry experts aren’t to be believed in their predictions, we can at least trust industry leaders (read: Google, Amazon and Apple) to know what they are doing. Their move towards the edge has been largely driven by the increasing need for machine learning systems with lower latencies.

Apple famously marketed its A11 Bionic, launched with the iPhone X, as having a ‘neural engine.’ Huawei’s high-end phones come with a ‘neural processing unit,’ while Google and Amazon prefer the simpler ‘AI chip.’

What About Hadoop and Spark?

Old technology is notoriously difficult to kill. Windows XP kept running on a large share of the world’s ATMs long after Microsoft stopped issuing security patches for it. And, speaking of ATMs, much of the banking software behind them still runs on COBOL, a decades-old language that now survives mostly in legacy business and government systems. Hadoop and Spark aren’t going anywhere any time soon.

Hadoop is likely to end up as a legacy technology that requires little more than maintenance in the far corners of the world’s data centers. Despite the popularity it currently enjoys, Spark may not have very long to live, either.

As far as data science applications go, few ever involve only one step in the process. In the same way that Hadoop lost favor for trying to do too much at once, new and more specialized tools are likely to occupy the space Spark once enjoyed… if edge processing doesn’t take over first.

Big Data Isn’t Always a Good Thing

In 2015, research and advisory firm Gartner removed ‘Big Data’ from its annual Hype Cycle of emerging technologies. Some industry experts were quick to conclude that Big Data must have reached the end of its useful life, but the next few years proved otherwise. Big Data had simply stopped being novel and become the norm.

Big Data has its definition rooted in the three Vs: volume, velocity and variety. A crucial fourth aspect that quickly creeps up on the unsuspecting data scientist is a ‘U,’ as in ‘usability.’

Consider machine learning, where a program has to be fed continuous streams of data in order to learn. The catch is, of course, that the data has to undergo a fair amount of cleaning, filtering and processing before it’s fed into a model. Throwing in a bunch of gunk will result in even more gunk: garbage in, garbage out.
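As a rough illustration of that cleaning step, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `clean_readings` helper, the `temp_c` and `sensor` field names, and the valid temperature range are hypothetical and not taken from any particular library or pipeline.

```python
from typing import Iterable, List, Optional


def clean_readings(raw: Iterable[Optional[dict]],
                   low: float = -50.0,
                   high: float = 60.0) -> List[dict]:
    """Drop missing, malformed, or out-of-range readings before training.

    The field names and default bounds are assumptions for this sketch.
    """
    cleaned = []
    for record in raw:
        if not record:
            continue  # skip empty or missing records
        try:
            value = float(record["temp_c"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with no usable temperature
        if low <= value <= high:  # discard physically implausible values
            cleaned.append({"sensor": record.get("sensor", "unknown"),
                            "temp_c": value})
    return cleaned


raw = [
    {"sensor": "a", "temp_c": "21.5"},  # usable once parsed as a number
    {"sensor": "b", "temp_c": None},    # missing value
    None,                               # missing record
    {"sensor": "c", "temp_c": 999},     # out of range
    {"sensor": "d"},                    # field absent
    {"sensor": "e", "temp_c": 18},      # usable
]
usable = clean_readings(raw)
print(f"kept {len(usable)} of {len(raw)} records")  # kept 2 of 6 records
```

Only a third of the sample records survive, which is the point: the volume of data collected says little about the volume a model can actually use.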

Which brings to light an even more pressing concern: what happens to unusable data? Do you just throw it out? Sure, storage is cheap, but some data is timeless and might be needed in the future.

Then again, should you keep indefinitely storing more and more data, even once it’s been used? With Hadoop and Spark out of the picture, the answer seems to lie in processing data at the edge more than anything.

The New Gold Rush

‘Machine learning,’ ‘AI,’ ‘deep learning’ and ‘internet of things’ are still, by and large, little more than buzzwords. However, they have started to grow into the expectations that were initially set for them. With this growth, they are finding more real-world applications, and a new gold rush has emerged: the need for lower latencies.

Perhaps the biggest advantage of processing data at the edge and saying goodbye to data centers is lower latencies. The epitome of our new gold rush is self-driving cars, where life-or-death decisions may have to be made in split seconds. Imagine having to send a request to the server every time a decision has to be made.
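To put a rough number on that, the short Python sketch below works out how far a car travels while it waits for a decision. The 10 ms and 100 ms delays and the 100 km/h speed are illustrative assumptions, not measurements of any real vehicle or network, and `distance_during_delay` is a hypothetical helper.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Metres travelled at a constant speed while waiting for `delay_ms`."""
    metres_per_second = speed_kmh * 1000 / 3600
    return metres_per_second * (delay_ms / 1000)


# Assumed, illustrative delays for each option.
for label, delay_ms in [("on-device", 10), ("cloud round trip", 100)]:
    metres = distance_during_delay(100, delay_ms)
    print(f"{label}: {metres:.2f} m travelled at 100 km/h")
# on-device: 0.28 m travelled at 100 km/h
# cloud round trip: 2.78 m travelled at 100 km/h
```

Under these assumptions, the round trip to a server costs roughly two and a half extra metres of travel before the car can even begin to react.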

But the requirement for near real-time processing goes beyond such critical applications. Voice assistants typically need to compress your audio, send it to a server, and have it uncompressed and processed, after which the response is compressed once again and sent back to you. All this just to figure out that it’s going to be 50 degrees later in the afternoon.
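The round trip described above can be sketched in a few lines of Python using the standard `zlib` module. This is a toy model under stated assumptions: `answer_query` is a hypothetical stand-in for speech recognition and a weather lookup, the ‘recording’ is placeholder bytes rather than real audio, and no actual network is involved.

```python
import zlib
from typing import Tuple


def answer_query(audio: bytes) -> bytes:
    """Hypothetical stand-in for speech recognition plus a weather lookup."""
    return b"It will be 50 degrees later this afternoon."


def via_cloud(audio: bytes) -> Tuple[bytes, int]:
    """Follow the server path; return the reply and the bytes sent over the network."""
    uplink = zlib.compress(audio)        # device compresses the recording
    received = zlib.decompress(uplink)   # server uncompresses it
    reply = answer_query(received)       # server processes the request
    downlink = zlib.compress(reply)      # server compresses the reply
    return zlib.decompress(downlink), len(uplink) + len(downlink)


def at_edge(audio: bytes) -> Tuple[bytes, int]:
    """Process on the device itself; nothing crosses the network."""
    return answer_query(audio), 0


recording = bytes(range(256)) * 64  # placeholder bytes standing in for audio
cloud_reply, cloud_bytes = via_cloud(recording)
edge_reply, edge_bytes = at_edge(recording)
print(cloud_reply == edge_reply)  # same answer either way
print(f"cloud path sent {cloud_bytes} bytes, edge path sent {edge_bytes}")
```

Both paths produce the same answer; the difference is that the cloud path adds two compression steps, two decompression steps and two network hops to get there.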

The increasing need to process data at the point where it is produced is cutting data centers out of the loop for tasks like these. Amazon is rumored to be producing its own home-grown AI chips for Alexa, for example, and Google already ships dedicated AI hardware in newer Pixel phones. Even Microsoft is getting in on the action with Azure Sphere, which it pitches as part of ‘the intelligent edge.’
