Machine Learning on Big Data

Rajat M
May 19, 2021

Reference — “Big Data Analytics with Java” by Packt Publishers

Even as you read this, a revolution is happening behind the scenes in machine learning and big data. From the coffee you pick up at a coffee shop to everything you click or purchase online, almost every transaction, click, or choice you make is tracked and analyzed. From this analysis, deductions are made to offer you new products and better choices that match your preferences. These techniques and the technologies behind them are spreading so fast that, as developers, we should all be part of this new wave in software. Doing so improves our career prospects and builds skills that directly impact the businesses we work for.

Technologies like machine learning and artificial intelligence used to sit in the labs of PhD researchers. Not anymore: with the rise of big data, they have gone mainstream. Using them, you can now predict which advertisement a user will click on next or which product they would like to buy, or determine whether a tumor in an image is cancerous. The opportunity to work in this area is vast.

Let’s look at some popular use cases where machine learning and regular analytics are applied to big data on a day-to-day basis. Along the way, I will also mention how they are covered in the book “Big Data Analytics with Java” by Packt Publishers.

Recommendation Engines

I loved watching Marco Polo on Netflix, and Netflix knows that and recommends similar movies and shows that I might like (refer to the image above). This is one of the most common and popular use cases of machine learning: the machine learns from our historical data and starts making recommendations to us.

Recommendation engines like these have existed for quite some time, and at companies like Amazon, YouTube, and Facebook they directly drive business. Using a recommendation engine is one thing, but how do you build one yourself? “Big Data Analytics with Java” covers recommendation engines in detail and includes a case study that gives movie recommendations to users based on an actual movie dataset from IMDB.

Frequently Bought Together

Let’s look at the image shown above. As you might recall, whenever you open an item’s detail page on an ecommerce store, you also see other products that are frequently sold along with it. This gives the user more items to purchase along with the current one and is done to boost sales.

Two popular algorithms for implementing this are the Apriori algorithm and the FP-Growth algorithm. Running these algorithms on historical transaction data tells you which items are frequently sold together. “Big Data Analytics with Java” covers both algorithms and builds a sample case study that uses them to find frequently-bought-together items for an actual ecommerce store and its products. The dataset, which comes from an online retail store in the UK, is obtained from the UCI Machine Learning Repository.
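To make the idea concrete, here is a minimal sketch in plain Java that counts how often each pair of items shows up in the same basket and keeps the pairs that meet a minimum support count. It is not the book's code and not a full Apriori or FP-Growth implementation: it counts every pair by brute force, whereas Apriori would first discard items that are infrequent on their own. The class name `FrequentPairs`, the method `frequentPairs`, and the baskets are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper (not from the book): counts item pairs that co-occur in baskets.
final class FrequentPairs {

    // Returns every item pair that appears together in at least minSupport baskets.
    // Keys look like "bread|butter", with the two items in alphabetical order.
    static Map<String, Integer> frequentPairs(List<List<String>> baskets, int minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> basket : baskets) {
            // TreeSet removes duplicates and sorts, so each pair gets one canonical key.
            List<String> items = new ArrayList<>(new TreeSet<>(basket));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "|" + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Keep only the pairs that meet the support threshold.
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up baskets, purely for illustration.
        List<List<String>> baskets = List.of(
            List.of("bread", "butter", "milk"),
            List.of("bread", "butter"),
            List.of("milk", "eggs"),
            List.of("bread", "butter", "eggs"));
        System.out.println(frequentPairs(baskets, 2)); // prints {bread|butter=3}
    }
}
```

On real transaction volumes this brute-force counting becomes expensive, which is the problem Apriori's candidate pruning and FP-Growth's compact tree structure are designed to address.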

Predicting Numerical Values

Refer to the image shown above. Machine learning is widely used to predict the future value of items or entities, as long as historical data is available to train the models on. The value can be anything: the budget needed for a marketing campaign, the expenditure needed to launch a new product, or the price of a product. “Big Data Analytics with Java” uses a real-life case study of predicting the price of a house based on a dataset of different variables released by King County, Washington.
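As a rough illustration of predicting a numerical value from historical data, the sketch below fits a straight line through a handful of (size, price) points using ordinary least squares. This article does not say which model the book uses, so simple linear regression on a single variable is an assumption here, and the class `SimpleLinearRegression` and all figures are invented for the example.

```java
// Hypothetical helper (not from the book): fits y = intercept + slope * x
// by ordinary least squares on a single input variable.
final class SimpleLinearRegression {
    final double slope;
    final double intercept;

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    // Predicts y for a new x using the fitted line.
    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Made-up training data: living area in square feet and sale price.
        double[] sqft = {1000, 1500, 2000, 2500};
        double[] price = {200_000, 290_000, 410_000, 500_000};
        SimpleLinearRegression model = new SimpleLinearRegression(sqft, price);
        System.out.printf("Predicted price for 1800 sq ft: %.0f%n", model.predict(1800));
    }
}
```

A real house-price model would use many variables (bedrooms, location, condition, and so on) and would be evaluated on held-out data; the single-variable version only shows the train-then-predict shape of the problem.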

Loan Approvals, Insurance Quotes and More

Automatic loan approval is a very interesting problem to solve. Let’s look at the image above. Fintech companies are disrupting traditional banking at a very fast pace with products and solutions like instant loan approval. But how can you approve a person’s loan on the spot? This is another use case for machine learning. By training on past historical data with attributes like credit history, tenure of work, term of work, age, and salary, we can build robust models that make good predictions about loan approval and cut the approval time by a big margin. “Big Data Analytics with Java” covers this use case with a sample case study built on top of the Lending Tree dataset. Lending Tree is an online lending firm that makes its data freely available for study purposes.

Spam Detection and Sentiment Analysis

Spam detection is a popular use case. Gmail does it for us, and we are so used to it. Let’s look at the image of two emails as shown: the first one is spam, and the one on the right-hand side is legitimate. Now let’s look at another image as shown.

Using the same algorithm that is used for spam detection, “Big Data Analytics with Java” builds a sample case study that shows the sentiment (positive or negative) of users across a set of tweets about different movies.

Predicting Diseases

Disease detection is one of the most interesting problems in research, and machine learning is especially useful in this field. “Big Data Analytics with Java” covers a case study of heart disease detection using machine learning; the image above depicts it.

Social Analytics and Regular Graph Analytics

When you search for a destination on your GPS, do you know that behind the scenes a graph search algorithm runs to figure out the shortest path to your destination? Running these algorithms on a small piece of data is one thing, but running them on a huge amount of data requires special software like GraphFrames on top of a big data stack. Also, in today’s world of social networks, we have huge social graphs that connect us to people we know, to our friends, and to friends of friends. The image shows a very simple social graph, yet it hints at how complex these graphs can get.
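The simplest form of such a graph search is breadth-first search, which finds the path with the fewest hops in an unweighted graph. The sketch below is plain single-machine Java, not GraphFrames and not the book's code; real navigation systems work on weighted graphs and use algorithms such as Dijkstra's or A*. The class `GraphSearch`, the method `shortestPath`, and the place names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper (not from the book): fewest-hops path in a directed graph.
final class GraphSearch {

    // Breadth-first search. Returns the nodes on a path with the fewest edges
    // from 'from' to 'to', or an empty list if 'to' cannot be reached.
    static List<String> shortestPath(Map<String, List<String>> graph, String from, String to) {
        Map<String, String> previous = new HashMap<>(); // node -> node we reached it from
        Queue<String> queue = new ArrayDeque<>();
        previous.put(from, null);
        queue.add(from);

        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(to)) {
                // Walk the 'previous' links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String at = to; at != null; at = previous.get(at)) {
                    path.add(at);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : graph.getOrDefault(node, List.of())) {
                if (!previous.containsKey(next)) { // not visited yet
                    previous.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Made-up road map: each key lists the places directly reachable from it.
        Map<String, List<String>> roads = Map.of(
            "Home", List.of("Mall", "Park"),
            "Mall", List.of("Station"),
            "Park", List.of("Lake"),
            "Station", List.of("Airport"));
        System.out.println(shortestPath(roads, "Home", "Airport")); // prints [Home, Mall, Station, Airport]
    }
}
```

The same traversal answers social-graph questions such as how many introductions separate two people, which is why distributed graph libraries ship their own versions of it.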

“Big Data Analytics with Java” has an extensive chapter on graph analytics and covers a case study on a real dataset of airports and connecting flights. On this dataset we run analytics such as the PageRank algorithm to find the top airports, the shortest path between destinations in the graph, and more.
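To give a feel for how PageRank can surface the top airports, here is a small single-machine sketch of the iterative PageRank computation in plain Java. It is an illustration under stated assumptions, not the book's code and not the GraphFrames API: the class `AirportRank`, the method `pageRank`, the damping factor of 0.85, the iteration count, and the routes are all chosen for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not from the book): iterative PageRank on a flight graph.
final class AirportRank {

    // flights maps each airport to the airports it has direct flights to.
    // Returns a score per airport; the scores sum to 1.
    static Map<String, Double> pageRank(Map<String, List<String>> flights, int iterations, double damping) {
        // Collect every airport, including ones that only appear as destinations.
        Set<String> airports = new HashSet<>(flights.keySet());
        for (List<String> destinations : flights.values()) {
            airports.addAll(destinations);
        }
        int n = airports.size();

        // Start with the rank spread evenly.
        Map<String, Double> rank = new HashMap<>();
        for (String airport : airports) {
            rank.put(airport, 1.0 / n);
        }

        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String airport : airports) {
                next.put(airport, (1.0 - damping) / n); // baseline share for every airport
            }
            double dangling = 0.0; // rank held by airports with no outgoing flights
            for (String airport : airports) {
                List<String> destinations = flights.getOrDefault(airport, List.of());
                if (destinations.isEmpty()) {
                    dangling += rank.get(airport);
                    continue;
                }
                // Each airport passes its rank on in equal shares to its destinations.
                double share = damping * rank.get(airport) / destinations.size();
                for (String destination : destinations) {
                    next.merge(destination, share, Double::sum);
                }
            }
            // Spread the dangling rank evenly so the total stays at 1.
            for (String airport : airports) {
                next.merge(airport, damping * dangling / n, Double::sum);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Made-up routes between real airport codes, purely for illustration.
        Map<String, List<String>> flights = Map.of(
            "DEL", List.of("DXB"),
            "SIN", List.of("DXB"),
            "LHR", List.of("DXB"),
            "DXB", List.of("DEL"));
        pageRank(flights, 100, 0.85)
            .forEach((airport, score) -> System.out.printf("%s %.4f%n", airport, score));
    }
}
```

In this toy graph the airport that most routes feed into ends up with the highest score, which is the intuition behind using PageRank to rank airports by connectivity.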

Real Time Analytics

Due to the growing popularity of IoT sensors and wearables, many devices can now generate a tremendous amount of data in real time. This data can be collected in real time by products like Apache Kafka and then analyzed. In “Big Data Analytics with Java” we show many use cases of real-time data generation and analysis, including a case study that builds a real-time trending videos pipeline from videos as they are tweeted.

Image Classification and Natural Language Processing

How would you write a program that figures out the digit a user has handwritten (refer to the image shown)? And can you do it with more than 99% accuracy?

Image classification and NLP are both tough and interesting problems to solve. Artificial neural networks are extremely good, and getting better, in these fields. “Big Data Analytics with Java” has a chapter dedicated to deep learning and AI. It also has a sample case study of handwritten digit classification using convolutional neural networks with more than 99% accuracy.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

The use cases and examples shown above are just a few; there are plenty of other use cases for analytics. Artificial intelligence and other analytical processes are becoming so embedded in our day-to-day processes that we will clearly see the use of these techniques extend further in the near future.

“Big Data Analytics with Java” is a book published by Packt Publishers, and it contains practical end-to-end case studies on various analytical tasks involving machine learning. The case studies use Java code to build such systems.

The book is available on the Packt Publishers website both as an ebook and as a printed copy and can be checked here. A preview of the book is also available here.
