To answer why you should learn Big Data, let's start with what industry leaders say about it:
Industries today are searching for new and better ways to maintain their position and prepare for the future. According to experts, Big Data analytics gives leaders a way to capture insights and ideas that help them stay ahead of tough competition.
So, what is Big Data? Different publishers have offered their own definitions of this buzzword.
In other words, big data is generated in multi-terabyte quantities. It changes fast and comes in a variety of forms that are difficult to manage and process using an RDBMS or other traditional technologies. Big Data solutions provide the tools, methodologies, and technologies to capture, store, search, and analyze this data in seconds, uncovering relationships and insights for innovation and competitive gain that were previously unavailable.
Around 80% of the data generated today is unstructured and cannot be handled by traditional technologies. Earlier, the volume of data generated was not that high, so we simply archived it, since it was only needed for historical analysis. Today, data is generated in petabytes, so it is no longer practical to repeatedly archive and retrieve it; data scientists now need to work with the data continuously for predictive analysis, rather than the purely historical analysis done with traditional systems.
As the saying goes, "a picture is worth a thousand words." Hence we have also provided a video tutorial to better explain what Big Data is and why it is needed.
Now that we have learned what Big Data analytics is, let us discuss various use cases of Big Data. Below are some Big Data use cases from different domains:
There are many technologies that address the problems of Big Data storage and processing, such as Apache Hadoop, Apache Spark, and Apache Kafka. Let's take a brief look at each of these technologies.
Big Data is making a big impact on industries today. It has been claimed that 50% of the world's data has already been moved to Hadoop, and it was predicted that by 2017 more than 75% of the world's data would be, making this technology one of the most in-demand in the market.
Further enhancement of this technology led to the evolution of Apache Spark, a lightning-fast, general-purpose computation engine for large-scale data processing. It can process data up to 100 times faster than MapReduce when working in memory.
Apache Kafka is another addition to this Big Data ecosystem: a high-throughput distributed messaging system frequently used with Hadoop.
IT organizations have started Big Data initiatives to manage their data better, visualize it, gain insights from it on demand, and find new business opportunities that accelerate growth. Every CIO, whether in the telecom, banking, retail, or healthcare domain, wants to transform their company, enhance its business models, and identify potential revenue sources. Such business transformation requires the right tools and the right people, so that the right insights are extracted from the available data at the right time.
Hence, Big Data is a big deal and a new competitive advantage that can boost your career and help you land your dream job in the industry!
(Day 1 - Day 20)
Session 1: Introduction and history
An overview of how the field has developed, why we need data engineering, and the components and platforms within it
Session 2: File Formats
CSV, Parquet, Avro, XML, ORC, JSON, Gzip, Snappy, SerDe, sequence files, and other custom row-oriented and column-oriented formats. Pros and cons of each format. Metastore. Metadata repository. Schema-on-read. Partitions.
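The row-oriented vs. column-oriented distinction above can be sketched in plain Python. This is a toy illustration only; the names `records`, `row_layout`, and `col_layout` are ours, not part of any real Parquet or ORC reader:

```python
import csv
import io

records = [
    {"id": 1, "city": "Pune", "sales": 120},
    {"id": 2, "city": "Delhi", "sales": 340},
    {"id": 3, "city": "Pune", "sales": 75},
]

# Row-oriented layout (like CSV): each record is stored contiguously.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "city", "sales"])
writer.writeheader()
writer.writerows(records)
row_layout = buf.getvalue()

# Column-oriented layout (like Parquet/ORC): each column is stored
# contiguously, so a query touching only "sales" never reads "id" or "city".
col_layout = {key: [r[key] for r in records] for key in ["id", "city", "sales"]}

total_sales = sum(col_layout["sales"])  # scans a single column
print(total_sales)
```

This is why columnar formats shine for analytical aggregations, while row formats suit record-at-a-time reads and writes.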
Session 3: Ingestion
Knowledge of various source systems. Change Data Capture (CDC). Transactional systems. File servers. Sqoop, NiFi, adapters, data quality checks, Data Lake basics. DataSource V2 API.
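Change Data Capture can be illustrated with a naive snapshot diff. Real CDC tools read the database's transaction log instead of comparing snapshots; the `capture_changes` helper and the sample rows below are hypothetical:

```python
def capture_changes(old, new):
    """Naive snapshot-diff CDC: compare two snapshots keyed by primary key
    and emit insert/update/delete events."""
    events = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))
        elif old[key] != row:
            events.append(("update", key, row))
    for key, row in old.items():
        if key not in new:
            events.append(("delete", key, row))
    return events

yesterday = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
today = {1: {"name": "Asha K"}, 3: {"name": "Meena"}}
events = capture_changes(yesterday, today)
print(events)
```

Only the changed rows flow downstream, which is the whole point of CDC: incremental ingestion instead of repeated full loads.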
Session 4: Transformation
Transformations and actions. In-memory data processing. Caching. Lambda expressions. Domain-specific languages. SQL-like syntax. Data parallelism vs. task parallelism. ELT architecture. Shuffle and sort. Higher-order functions. Aggregations. Window functions.
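The transformations-vs-actions idea, lambda expressions, and higher-order functions can be sketched with Python generators as a loose analogy for a lazy Spark-style pipeline (illustrative only, not Spark code):

```python
from functools import reduce

numbers = range(1, 11)

# "Transformations" build a lazy pipeline (like Spark's filter/map):
# nothing is computed yet -- generators only describe the work.
evens = (n for n in numbers if n % 2 == 0)   # filter
squared = (n * n for n in evens)             # map

# An "action" forces evaluation and returns a result to the caller.
# reduce is a higher-order function taking a lambda expression.
total = reduce(lambda acc, n: acc + n, squared, 0)
print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

Until the action runs, no element is touched; this laziness is what lets engines like Spark optimize and pipeline whole chains of transformations.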
Session 5: Storage
Storage class concepts. Distributed file systems. Data replication. Storage abstraction. Object storage. Ephemeral storage. Low-latency indexing. CAP theorem. Distributed ACID and BASE transactions. HDFS, S3, GlusterFS. NoSQL, NewSQL.
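Data replication can be sketched as toy block placement: hash each block to a starting node and put copies on neighboring nodes. The node names and the `replicas_for` helper are hypothetical; real HDFS placement also considers rack topology:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3  # HDFS's default block replication is also 3

def replicas_for(block_id, nodes=NODES, rf=REPLICATION_FACTOR):
    """Toy placement: hash the block id to a starting node, then place
    the remaining copies on the next rf-1 nodes in the ring."""
    start = int(hashlib.md5(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

placement = replicas_for("blk_0001")
print(placement)
```

With three copies on distinct nodes, losing any single node still leaves two readable replicas, which is the availability guarantee replication buys.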
Session 6: Scheduling
Task scheduling. YARN. CRON. Job dependencies. Checkpointing. Monitoring of big data pipelines. Micro-batching. Oozie. Airflow. Livy.
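Job dependencies resolve into an execution order via topological sorting, which is essentially what schedulers like Oozie and Airflow do with a DAG. A minimal sketch using Python's standard library (the job names are made up):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Toy pipeline DAG: each job maps to the jobs it depends on, the way an
# Airflow DAG or an Oozie workflow declares upstream tasks.
dag = {
    "ingest": [],
    "clean": ["ingest"],
    "aggregate": ["clean"],
    "report": ["aggregate"],
    "alerting": ["clean"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # every job appears after all of its dependencies
```

A real scheduler adds triggers (CRON times), retries, and checkpointing on top of this ordering, but the dependency resolution itself is just a topological sort.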
Session 7: Destination
Data visualization. KPI reporting. Real-time dashboards. Content delivery networks. Advanced Data Lake concepts. Lambda architecture. Presto. Druid. Superset.
Session 8: Streaming
Real-time data challenges. Kafka, Flume. Kappa architecture. Stream-stream joins. Watermarking. Late data arrival. Time-series data.
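Watermarking and late data arrival can be sketched as follows. This is a simplified model; Spark Structured Streaming and Flink track event-time watermarks in a similar spirit, but the `process` helper below is hypothetical:

```python
ALLOWED_LATENESS = 10  # seconds of lateness we tolerate

def process(events):
    """events: (event_time, value) pairs in arrival order.
    The watermark trails the max event time seen so far; anything
    arriving with an event time older than the watermark is dropped."""
    max_event_time = 0
    accepted, dropped = [], []
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        if event_time >= watermark:
            accepted.append(value)
        else:
            dropped.append(value)  # too late: behind the watermark
    return accepted, dropped

accepted, dropped = process([(100, "a"), (112, "b"), (99, "c"), (130, "d")])
print(accepted, dropped)
```

Event "c" arrives out of order with an event time of 99 after the watermark has advanced to 102, so it is treated as late data; the watermark is what lets a streaming engine finalize windows instead of waiting forever.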
Session 1: Introduction and history
Session 2: Programming
Session 3: Statistics
Session 4: Exploratory data analysis
Session 5: Models and A/B testing
Session 6: Model validation and overfitting
Session 7: Big data analytics
Session 8: Supervised and unsupervised ML
Session 9: NLP and Deep learning basics