
Big Data Analytics Life Cycle

In this article, we will discuss the life cycle phases of Big Data Analytics. It differs from traditional data analysis mainly because big data is characterized by volume, variety, and velocity, and these three properties shape every phase of the work.

The Big Data Analytics life cycle is divided into nine phases:

  • Business Case/Problem Definition
  • Data Identification
  • Data Acquisition and Filtration
  • Data Extraction
  • Data Munging (Validation and Cleaning)
  • Data Aggregation & Representation (Storage)
  • Exploratory Data Analysis
  • Data Visualization (Preparation for Modeling and Assessment)
  • Utilization of Analysis Results

Let us discuss each phase:

  • Phase I Business Problem Definition – In this stage, the team learns about the business domain, which provides the motivation and goals for carrying out the analysis. The problem is identified, and assumptions are made about how much potential gain the company stands to make from the analysis. Important activities in this step include framing the business problem as an analytics challenge that can be addressed in subsequent phases. It helps decision-makers understand the business resources that will need to be utilized, and thereby determine the budget required to carry out the project. Moreover, it can be determined whether the identified problem is a Big Data problem or not, based on the business requirements in the business case. To qualify as a big data problem, the business case should be directly related to one (or more) of the characteristics of volume, velocity, or variety.
  • Phase II Data Identification – Once the business case is defined, it is time to find the appropriate datasets to work with. In this stage, the team also reviews what other companies have done for similar cases. Depending on the business case and the scope of the analysis, the sources of data can be either internal or external to the company. Internal datasets include data collected from in-house sources, such as feedback forms or existing software, while external datasets include data obtained from third-party providers.
  • Phase III Data Acquisition and Filtration – Once the sources of data are identified, it is time to gather the data from them. This data is mostly unstructured. It is then subjected to filtration, such as the removal of corrupt or irrelevant data that falls outside the scope of the analysis objective. Here, corrupt data means data with missing records or incompatible data types. After filtration, a copy of the filtered data is stored and compressed, as it may be useful in the future for other analyses.
  • Phase IV Data Extraction – The data is now filtered, but some of it may still be in a format incompatible with the analysis tools. To rectify this issue, a separate phase, known as the data extraction phase, is carried out. In this phase, data that does not match the underlying scope of the analysis is extracted and transformed into a compatible form.
  • Phase V Data Munging – As mentioned in Phase III, the data is collected from various sources, so much of it is unstructured. It may also contain values that are invalid or unsuitable, which can lead to false results, so the data needs to be cleaned and validated. This includes removing invalid records and establishing validation rules. There are many ways to validate and clean the data. For example, a dataset might contain a few rows with null entries: if a similar dataset is available, those entries are copied from it; otherwise, those rows are dropped.
  • Phase VI Data Aggregation & Representation – The data is now cleansed and validated against rules set by the enterprise. But the data might be spread across multiple datasets, and it is not convenient to work with many separate datasets, so they are joined together. For example, if there are two datasets, one for the Student Academic section and one for Student Personal Details, both can be joined via a common field such as the roll number (a minimal pandas sketch of this joining step, together with the cleaning and exploration around it, follows this list). This phase can be computationally intensive, since the amount of data can be very large, so automation is often introduced so that these steps run without human intervention.
  • Phase VII Exploratory Data Analysis – Here comes the actual analysis task. Depending on the nature of the big data problem, the analysis can be classified as confirmatory or exploratory. In confirmatory analysis, a proposed cause of a phenomenon is stated in advance as a hypothesis, and the data is analyzed to confirm or refute it; this kind of analysis provides definitive answers to specific questions. In exploratory analysis, the data is explored to understand why a phenomenon occurred; it does not provide definitive answers, but it enables the discovery of patterns.
  • Phase VIII Data Visualization – We now have answers to some questions, derived from the information in the datasets, but these answers are still in a form that cannot be presented directly to business users. Some form of representation is required to obtain value or conclusions from the analysis, so various tools are used to visualize the data graphically in a way business users can easily interpret. Visualization strongly influences the interpretation of the results, and it also allows users to discover answers to questions they have not yet formulated.
  • Phase IX Utilization of Analysis Results – The analysis is done and the results are visualized; now it is time for the business users to make decisions based on them. The results can be used for optimization, to refine business processes, or as input to systems to enhance their performance.
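
To make Phases V-VII concrete, here is a minimal pandas sketch of the cleaning, joining, and first-pass exploration described above. The datasets, column names (roll_number, marks, city), and the 0-100 validation rule are invented purely for illustration, not taken from any particular project.

```python
# A minimal pandas sketch of Phases V-VII: cleaning, joining, and a first
# exploratory pass. Dataset and column names are illustrative only.
import pandas as pd

# Phase V - Data Munging: drop rows whose mandatory field is null and
# enforce a simple validation rule (marks must lie between 0 and 100).
academics = pd.DataFrame({
    "roll_number": [1, 2, 3, 4],
    "marks": [88, None, 104, 67],
})
academics = academics.dropna(subset=["marks"])
academics = academics[academics["marks"].between(0, 100)]

# Phase VI - Data Aggregation & Representation: join two datasets on the
# common field (roll_number).
personal = pd.DataFrame({
    "roll_number": [1, 3, 4],
    "city": ["Pune", "Delhi", "Chennai"],
})
students = academics.merge(personal, on="roll_number", how="inner")

# Phase VII - Exploratory Data Analysis: quick summary statistics.
print(students.describe(include="all"))
```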

The block diagram of the life cycle is given below:

[Block diagram: Big Data Analytics life cycle]

It is evident from the block diagram that Phase VII, i.e. Exploratory Data Analysis, is refined successively until it is performed satisfactorily, with the emphasis placed on error correction. Moreover, one can move back from Phase VIII to Phase VII if a satisfactory result is not achieved. In this manner, it is ensured that the data is analyzed properly.


6 Phases of Data Analytics Lifecycle Every Data Analyst Should Know About

What is a Data Analytics Lifecycle?

Data is crucial in today’s digital world. As it gets created, consumed, tested, processed, and reused, data goes through several phases/stages during its life. A data analytics architecture maps out these steps for data science professionals. It is a cyclic structure that encompasses all the data life cycle phases, where each stage has its own significance and characteristics.

The lifecycle’s circular form guides data professionals to proceed with data analytics in either direction, forward or backward. Based on newly received information, professionals can scrap their work and move back to an earlier step to redo the analysis, following the lifecycle diagram.

However, while experts talk about the data analytics lifecycle, there is still no universally agreed structure for the stages mentioned above. You are unlikely to find a concrete data analytics architecture that is uniformly followed by every data analysis expert. This ambiguity leaves room for adding extra phases when necessary or removing basic ones, and it is also possible to work on several stages at once or to skip a phase entirely.

One of the other main reasons the Data Analytics lifecycle (or business analytics cycle) was created was to address the problems of Big Data and Data Science. The six phases of data analysis form a process that focuses on the specific demands of solving Big Data problems, and this meticulous, step-by-step method helps map out all the activities associated with the analysis.


So, in any discussion of the Big Data analytics life cycle, these six stages are likely to come up as its basic structure. The data analytics life cycle in big data constitutes the fundamental steps for ensuring that data is acquired, processed, analyzed, and recycled properly, and it frames a data professional's overall work and the results of the analysis.

Types of Data Analytics

Descriptive Analytics

Descriptive analytics serves as a time machine for organizations, allowing them to delve into their past. This type of analytics is all about gathering and visualizing historical data, answering fundamental questions like “what happened?” and “how many?” It essentially provides a snapshot of the aftermath of decisions made at the organizational level, aiding in measuring their impact.

For instance, in a corporate setting, descriptive analytics, often dubbed as “business intelligence,” might play a pivotal role in crafting internal reports. These reports could encapsulate sales and profitability figures, breaking down the numbers based on divisions, product lines, and geographic regions.

Diagnostic Analytics

While descriptive analytics lays the groundwork by portraying what transpired, diagnostic analytics takes a step further by unraveling the mysteries behind the events. It dives into historical data points, meticulously identifying patterns and dependencies among variables that can explain a particular outcome. In essence, it answers the question of “why did it happen?”

In a practical scenario, imagine a corporate finance department using diagnostic analytics to dissect the impacts of currency exchange, local economics, and taxes on results across various geographic regions.

Predictive Analytics

Armed with the knowledge gleaned from descriptive and diagnostic analytics, predictive analytics peers into the future. It utilizes historical trends to forecast what might unfold in the days to come. A classic example involves predictive analysts using their expertise to project the business outcomes of decisions, such as increasing the price of a product by a certain percentage.

In a corporate finance context, predictive analytics could be seamlessly integrated to incorporate forecasted economic and market-demand data. This, in turn, aids in predicting sales for the upcoming month or quarter, allowing organizations to prepare strategically.
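
As a toy illustration of this kind of trend-based forecasting, the sketch below fits a straight line to twelve months of invented sales figures with NumPy and projects the next month. Real predictive analytics would use richer models and genuine market-demand data; the numbers here are assumptions made up for the example.

```python
# Toy predictive-analytics sketch: fit a linear trend to twelve months of
# made-up sales figures and project the next month.
import numpy as np

months = np.arange(1, 13)                      # months 1..12
sales = np.array([110, 115, 121, 119, 128, 133,
                  138, 135, 142, 149, 151, 158], dtype=float)

slope, intercept = np.polyfit(months, sales, deg=1)   # least-squares line
next_month = 13
forecast = slope * next_month + intercept

print(f"Projected sales for month {next_month}: {forecast:.1f}")
```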

Prescriptive Analytics

Taking the analytics journey to its zenith, prescriptive analytics utilizes machine learning to offer actionable recommendations. It goes beyond predicting future outcomes; it actively guides organizations on how to achieve desired results. This could involve optimizing company operations, boosting sales, and driving increased revenue.

In the corporate finance department, prescriptive analytics could play a pivotal role in generating recommendations for relative investments. This might encompass making informed decisions about production and advertising budgets, broken down by product line and region, for the upcoming month or quarter.

Phases of Data Analytics Lifecycle

The structured framework that underpins the data analytics life cycle is divided into six phases of data analytics architecture. The framework is simple and cyclical, which means that all of these steps in the big data analytics life cycle are followed one after the other.

It is also worth noting that, because the cycle is circular, these steps can be traversed both forward and backward. Here, then, are the six phases of data analytics, the most basic processes to be followed in data science projects.

Phase 1: Data Discovery and Formation

Everything begins with a defined goal. In this phase, you’ll define your data’s purpose and how to achieve it by the time you reach the end of the data analytics lifecycle. The goal of this first phase is to make evaluations and assessments and come up with a basic hypothesis for resolving the business problems and challenges.

The initial stage consists of mapping out the potential use and requirement of data, such as where the information is coming from, what story you want your data to convey, and how your organization benefits from the incoming data. As a data analyst, you will have to study the business industry domain, research case studies that involve similar data analytics and, most importantly, scrutinize the current business trends.

Then you also have to assess all the in-house infrastructure and resources, time and technology requirements to match with the previously gathered data. After the evaluations are done, the team then concludes this stage with hypotheses that will be tested with data later. This is the preliminary stage in the big data analytics lifecycle and a very important one. 

Basically, as a data analysis expert, you’ll need to focus on enterprise requirements related to data, rather than data itself. Additionally, your work also includes assessing the tools and systems that are necessary to read, organize, and process all the incoming data.

Essential activities in this phase include structuring the business problem as an analytics challenge and formulating the initial hypotheses (IHs) to test and to start learning from the data. The subsequent phases are then based on achieving the goal drawn up in this stage, so you will need to develop an understanding and a concept that will later come in handy when testing it against the data.

Phase 2: Data Preparation and Processing

This stage consists of everything that has anything to do with data. In phase 2, the attention of experts moves from business requirements to information requirements.

The data preparation and processing step involves collecting, processing, and cleansing the accumulated data. One of the essential parts of this phase is to make sure that the data you need is actually available to you for processing. The earliest step of the data preparation phase is to collect valuable information and proceed with the data analytics lifecycle in a business ecosystem. Data is collected using the methods below:

  • Data Acquisition:  Accumulating information from external sources.
  • Data Entry:  Formulating recent data points using digital systems or manual data entry techniques within the enterprise.
  • Signal Reception:  Capturing information from digital devices, such as control systems and the Internet of Things.

The data preparation stage in the big data analytics life cycle requires something known as an analytical sandbox. This is a scalable platform that data analysts and data scientists use to process data; it is filled with data that has been extracted, loaded, and transformed into it. This stage of the business analytics cycle does not have to happen in a predetermined sequence and can be repeated later if the need arises.

Phase 3: Design a Model

After mapping out your business goals and collecting a glut of data (structured, unstructured, or semi-structured), it is time to build a model that utilizes the data to achieve the goal. This phase of the data analytics process is known as model planning. 

There are several techniques available to load data into the system and start studying it (a minimal sketch contrasting ETL and ELT follows this list):

  • ETL (Extract, Transform, and Load) transforms the data first using a set of business rules, before loading it into a sandbox.
  • ELT (Extract, Load, and Transform) first loads raw data into the sandbox and then transforms it.
  • ETLT (Extract, Transform, Load, Transform) is a mixture; it has two transformation levels.
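
The contrast between ETL and ELT can be sketched in a few lines. In the sketch below, a pandas DataFrame stands in for the raw source and an in-memory SQLite database stands in for the sandbox; the table names, the amount column, and the validity rule are all invented for illustration.

```python
# Minimal contrast between ETL and ELT, using a pandas DataFrame as the raw
# source and an in-memory SQLite database as a stand-in for the sandbox.
import sqlite3
import pandas as pd

raw = pd.DataFrame({"amount": ["10", "25", "bad", "40"]})
sandbox = sqlite3.connect(":memory:")

# ETL: apply the business rule (keep only valid numeric amounts) first,
# then load the clean result into the sandbox.
clean = raw.assign(amount=pd.to_numeric(raw["amount"], errors="coerce")).dropna()
clean.to_sql("sales_etl", sandbox, index=False)

# ELT: load the raw data as-is, then transform it inside the sandbox with SQL.
raw.to_sql("sales_raw", sandbox, index=False)
sandbox.execute("""
    CREATE TABLE sales_elt AS
    SELECT CAST(amount AS REAL) AS amount
    FROM sales_raw
    WHERE amount GLOB '[0-9]*'
""")

print(pd.read_sql("SELECT * FROM sales_etl", sandbox))
print(pd.read_sql("SELECT * FROM sales_elt", sandbox))
```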

This step also includes teamwork to determine the methods, techniques, and workflow for building the model in the subsequent phase. Model planning begins with identifying the relations between data points in order to select the key variables and, eventually, a suitable model.

The team develops datasets for testing, training, and production purposes. In the later phases, the team builds and executes the models that were planned in this stage.
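
One simple way to start the variable screening described above is to rank candidate inputs by their correlation with the target. The sketch below does this with pandas on a made-up dataset; the column names (ad_spend, store_size, weekday, sales) are assumptions for the example, not a prescribed feature set.

```python
# Sketch of model planning: rank candidate variables by their correlation
# with the target to shortlist key inputs. Data and names are illustrative.
import pandas as pd

df = pd.DataFrame({
    "ad_spend":   [10, 20, 30, 40, 50],
    "store_size": [5, 7, 6, 8, 9],
    "weekday":    [1, 3, 5, 2, 4],
    "sales":      [100, 180, 260, 340, 430],
})

correlations = df.corr()["sales"].drop("sales").abs()
key_variables = correlations.sort_values(ascending=False)
print(key_variables)   # ad_spend and store_size rank highest here
```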

Phase 4: Model Building

This step of data analytics architecture comprises developing datasets for testing, training, and production purposes. The data analytics experts meticulously build and operate the model they designed in the previous step, relying on tools and techniques such as decision trees, regression (for example, logistic regression), and neural networks to build and execute it. The experts also perform a trial run of the model to observe how well it corresponds to the datasets.

It helps them determine whether the tools they have currently are going to sufficiently execute the model or if they need a more robust system for it to work properly. 
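
A minimal model-building sketch along these lines, using scikit-learn on synthetic data, is shown below: split the data, fit a logistic regression, and run a quick trial against the held-out set. It is an illustrative outline rather than a recipe for any particular project.

```python
# Minimal model-building sketch with scikit-learn: split the data into
# training and test sets, fit a logistic regression, and run a quick trial
# to see how well the model corresponds to held-out data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Trial-run accuracy:", accuracy_score(y_test, model.predict(X_test)))
```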

Phase 5: Result Communication and Publication

Remember the goal you had set for your business in phase 1? Now is the time to check if those criteria are met by the tests you have run in the previous phase.

The communication step starts with a collaboration with major stakeholders to determine if the project results are a success or failure. The project team is required to identify the key findings of the analysis, measure the business value associated with the result, and produce a narrative to summarise and convey the results to the stakeholders.

Phase 6: Measuring Effectiveness

As your data analytics lifecycle draws to a conclusion, the final step is to provide a detailed report with key findings, code, briefings, and technical papers or documents to the stakeholders.

Additionally, to measure the analysis’s effectiveness, the data is moved from the sandbox to a live environment and monitored to check whether the results match the expected business goal. If the findings meet the objective, the reports and results are finalized. However, if the outcome deviates from the intent set out in Phase 1, you can move backward in the data analytics lifecycle to any of the previous phases to change your input and get a different output.

If there are any performative constraints in the model, then the team goes back to make adjustments to the model before deploying it. 

Importance of the Data Analytics Lifecycle

The Data Analytics Lifecycle outlines how data is created, gathered, processed, used, and analyzed to meet corporate objectives. It provides a structured method of handling data so that it may be transformed into knowledge that can be applied to achieve organizational and project objectives. The process offers the guidance and techniques needed to extract information from the data and move forward to achieve corporate objectives.

Data analysts use the circular nature of the lifecycle to move forward or backward through the analytics process. They can choose whether to continue with their current research or abandon it and conduct a fresh analysis in light of recently acquired insights. Their progress is guided by the Data Analytics lifecycle.

Big Data Analytics Lifecycle example

Take a chain of retail stores as an example, which seeks to optimize the prices of its products in order to increase sales. This is an extremely difficult problem because the retail chain has thousands of products spread over hundreds of sites. After determining the goal of the chain of stores, you locate the data you require, prepare it, and follow the big data analytics lifecycle.

You see many types of clients, including regular clients and clients who make large purchases, such as contractors. You believe the solution lies in how you handle different types of consumers; however, if you lack adequate knowledge, you must consult the customer team about this.

To determine whether different client categories impact the model findings and obtain the desired output, you must first obtain a definition, locate data, and conduct hypothesis testing. As soon as you are satisfied with the model’s output, you may put it into use, integrate it into your operations, and then set the prices you believe to be the best ones for all of the store’s outlets.

This is a small-scale example of how deploying the business analytics cycle can positively affect the profits of a business. But this model is used across huge business chains in the world. 

Who uses Big data and analytics?

Big data and analytics are being used by medium to large-scale businesses throughout the world to achieve great success. Big data analytics means the process of analyzing and processing huge amounts of data to find trends and patterns, which enables these businesses to find solutions to problems quickly by making fast, well-grounded decisions based on the data.

  • The king of online retail, Amazon, accesses consumer names, addresses, payments, and search history through its vast data bank and uses them in advertising algorithms and to enhance customer relations.
  • The American Express Company uses big data to study consumer behavior.
  • Capital One, a market leader, uses big data analysis to guarantee the success of its consumer offers.
  • Netflix leverages big data to understand the viewing preferences of users from around the world.
  • Spotify is a platform that is using the data analytics lifecycle in big data to its fullest. They use this method to make sure that each user gets their favourite type of music handed to them. 

Big data is routinely used by companies like Marriott Hotels, Uber Eats, McDonald’s, and Starbucks as part of their fundamental operations.

Benefits of Big data and analytics

Learning the life cycle of data analytics gives you a competitive advantage, and businesses, whether large or small, can benefit greatly from using big data effectively. Here are some of the benefits of the big data and analytics lifecycle.

1. Customer Loyalty and Retention

Customers’ digital footprints contain a wealth of information regarding their requirements, preferences, buying habits, etc. Businesses utilize big data to track consumer trends and customize their goods and services to meet unique client requirements. This significantly increases consumer satisfaction, brand loyalty, and eventually, sales.

Amazon has used this big data and analytics lifecycle to its advantage by providing the most customized buying experience, in which recommendations are made based on past purchases and items that other customers have purchased, browsing habits, and other characteristics.

2. Targeted and Specific Promotions

With the use of big data, firms may provide specialized goods to their target market without spending a fortune on ineffective advertising campaigns. Businesses can use big data to study consumer trends by keeping an eye on point-of-sale and online purchase activity. Using these insights, targeted and specific marketing strategies are created to assist businesses in meeting customer expectations and promoting brand loyalty.

3. Identification of Potential Risks

Businesses operate in high-risk settings and thus need efficient risk management solutions to deal with problems. Creating efficient risk management procedures and strategies depends heavily on big data.

Big data analytics life cycle and tools quickly minimize risks by optimizing complicated decisions for unforeseen occurrences and prospective threats.

4. Boost Performance

The use of big data solutions can increase operational effectiveness. Your interactions with consumers and the important feedback they provide enable you to gather a wealth of relevant customer data. Analytics can then uncover significant trends in the data to produce products that are unique to the customer. In order to provide employees more time to work on activities demanding cognitive skills, the tools can automate repetitive processes and tasks.

5. Optimize Cost

One of the greatest benefits of the big data analytics life cycle is the fact that it can help you cut down on business costs. It is a proven fact that the return cost of an item is much more than the shipping cost. By using big data, companies can calculate the chances of the products being returned and then take the necessary steps to make sure that they suffer minimum losses from product returns. 

Ways to Use Data Analytics

Let’s delve into how these transformative data analysis stages can be harnessed effectively.

Enhancing Decision-Making

The data analytics life cycle sweeps away the fog of uncertainty, ushering in an era where decisions are grounded in insights rather than guesswork. Whether it’s selecting the most compelling content, orchestrating targeted marketing campaigns, or shaping innovative products, organizations leverage the data analysis life cycle to drive informed decision-making. The result? Better outcomes and heightened customer satisfaction.

Elevating Customer Service

Customizing customer service to individual needs is no longer a lofty aspiration but a tangible reality with data analytics. The power of personalization, fueled by analyzed data, fosters stronger customer relationships. Insights into customers’ interests and concerns enable businesses to offer more than just products – they provide tailored recommendations, creating a personalized journey that resonates with customers.

Efficiency Unleashed

In the realm of operational efficiency, the data analytics life cycle emerges as a key ally. Streamlining processes, cutting costs, and optimizing production become achievable feats with a profound understanding of audience preferences. As the veil lifts on what captivates your audience, valuable time and resources are saved, ensuring that efforts align seamlessly with audience interests.

Mastering Marketing

The data analytics life cycle empowers businesses to unravel the performance tapestry of their marketing campaigns. Insights gleaned allow for meticulous adjustments and fine-tuning of strategies for optimal results. Beyond this, identifying potential customers primed for interaction and conversion becomes a strategic advantage. The precision of the data analytics life cycle ensures that every marketing endeavor resonates with the right audience, maximizing impact.

Data Analytics Tools

Python: A Versatile and Open-Source Programming Language

Python stands out as a powerful and open-source programming language that excels in object-oriented programming. This language offers a diverse array of libraries tailored for data manipulation, visualization, and modeling. With its flexibility and ease of use, Python has become a go-to choice for programmers and data scientists alike.
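
As a small taste of that stack, the snippet below uses pandas for a quick aggregation and matplotlib for a bar chart. The regions and revenue figures are invented for the example.

```python
# A small taste of the Python data stack referred to above: pandas for
# manipulation and matplotlib for visualization. Values are made up.
import pandas as pd
import matplotlib.pyplot as plt

revenue = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "revenue": [120, 95, 140, 110],
})

by_region = revenue.groupby("region", as_index=False)["revenue"].sum()
by_region.plot(kind="bar", x="region", y="revenue", legend=False)
plt.ylabel("Revenue")
plt.tight_layout()
plt.show()
```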

R: Unleashing Statistical Power through Open Source Programming

R, another open-source programming language, specializes in numerical and statistical analysis. It boasts an extensive collection of libraries designed for data analysis and visualization. Widely embraced by statisticians and researchers, R provides a robust platform for delving into the intricacies of data with precision and depth.

Tableau: Crafting Interactive Data Narratives

Enter Tableau, a simplified yet powerful tool for data visualization and analytics. Its user-friendly interface empowers users to create diverse visualizations, allowing for interactive data exploration. With the ability to build reports and dashboards, Tableau transforms data into compelling narratives, presenting insights and trends in a visually engaging manner.

Power BI: Empowering Business Intelligence with Ease

Power BI emerges as a business intelligence powerhouse with its drag-and-drop functionality. This tool seamlessly integrates with multiple data sources and entices users with visually appealing features. Beyond its aesthetics, Power BI facilitates dynamic interactions with data, enabling users to pose questions and obtain immediate insights, making it an indispensable asset for businesses.

QlikView: Unveiling Interactive Analytics and Guided Insights

QlikView distinguishes itself by offering interactive analytics fueled by in-memory storage technology. This enables the analysis of vast data volumes and empowers users with data discoveries that guide decision-making. The platform excels in manipulating massive datasets swiftly and accurately, making it a preferred choice for those seeking robust analytics capabilities.

Apache Spark: Real-Time Data Analytics Powerhouse

Apache Spark, an open-source data analytics engine, steps into the arena to process data in real time. It executes sophisticated analytics through SQL queries and machine learning algorithms. With this prowess, Apache Spark addresses the need for quick and efficient data processing, making it an invaluable tool in the world of big data.
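
A minimal PySpark sketch of the SQL-style analytics mentioned above might look like the following; it assumes the pyspark package is available locally, and the orders data is made up for illustration.

```python
# Minimal PySpark sketch of SQL-style analytics over a small DataFrame.
# Assumes the pyspark package is installed; data is illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lifecycle-demo").getOrCreate()

orders = spark.createDataFrame(
    [("north", 120.0), ("south", 95.0), ("north", 140.0)],
    ["region", "amount"],
)
orders.createOrReplaceTempView("orders")

totals = spark.sql(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)
totals.show()

spark.stop()
```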

SAS: Statistical Analysis and Beyond

SAS, a statistical analysis software suite, proves to be a versatile companion for data enthusiasts. It facilitates analytics, data visualization, SQL queries, statistical analysis, and the development of machine learning models for predictive insights. SAS stands as a comprehensive solution catering to a spectrum of data-related tasks, making it an indispensable tool for professionals in the field.

What are the Applications of Data Analytics?

In the dynamic landscape of the digital era, data analytics applications play a pivotal role in extracting valuable insights from vast datasets. These applications empower organizations across various sectors to make informed decisions, enhance efficiency, and gain a competitive edge. Let’s delve into the diverse applications of data analytics and their impact on different domains.

Business Intelligence

Data analytics applications serve as the backbone of Business Intelligence (BI), enabling businesses to transform raw data into actionable intelligence. Through sophisticated analysis, companies can identify trends, customer preferences, and market dynamics. This information aids in strategic planning, helping businesses stay ahead of the curve and optimize their operations for sustained success.

Healthcare

In the healthcare sector, data analytics applications contribute significantly to improving patient outcomes and operational efficiency. By analyzing patient records, treatment outcomes, and demographic data, healthcare providers can make data-driven decisions, personalize patient care, and identify potential health risks. This not only enhances the quality of healthcare services but also helps in preventing and managing diseases more effectively.

Finance and Banking

Financial institutions harness the power of data analytics applications to manage risk, detect fraudulent activities, and make informed investment decisions. Analyzing market trends and customer behavior allows banks to offer personalized financial products, streamline operations, and ensure compliance with regulatory requirements. This, in turn, enhances customer satisfaction and builds trust within the financial sector.

E-commerce

In the realm of e-commerce, data analytics applications revolutionize the way businesses understand and cater to customer needs. By analyzing purchasing patterns, preferences, and browsing behavior, online retailers can create targeted marketing strategies, optimize product recommendations, and enhance the overall customer shopping experience. This leads to increased customer satisfaction and loyalty.

Education

Data analytics applications are transforming the education sector by providing insights into student performance, learning trends, and institutional effectiveness. Educators can tailor their teaching methods based on data-driven assessments, identify areas for improvement, and enhance the overall learning experience. This personalized approach fosters student success and contributes to the continuous improvement of educational institutions.

Manufacturing and Supply Chain

In the manufacturing industry, data analytics applications optimize production processes, reduce downtime, and improve overall efficiency. By analyzing supply chain data, manufacturers can forecast demand, minimize inventory costs, and enhance product quality. This results in streamlined operations, reduced wastage, and increased competitiveness in the market.

The data analytics lifecycle is a circular process consisting of six basic stages that define how information is created, gathered, processed, used, and analyzed for business goals. The lack of a standard set of phases for a data analytics architecture does make it harder for data experts to work with the information, but the first step of mapping out a business objective and working toward achieving it helps draw out the rest of the stages.

Frequently Asked Questions (FAQs)

Is Data Analyst a good career option?

Yes, Data Analyst is one of the most in-demand job roles in 2022-23. If you’re thinking of pursuing Data Analytics as a career, now is probably the best time. According to research, more than 2.5 quintillion bytes of data are created every day, and this number keeps increasing at a fast pace. To make good use of this data for a company’s growth, a Data Analyst is required. India is the second most important hub of jobs for Data Analysts. Considering this, it is an excellent career option for those who want to learn the life cycle of data analytics.

What are the top skills required to become a Data Analyst?

The top skills required to become a Data Analyst are:

  • SQL, one of the most essential skills for a Data Analyst; it is the industry-standard database language used to handle large databases.
  • Solid programming skills in R, Python, Java, C++, etc.
  • Good critical thinking: a Data Analyst needs to understand the data beyond the numbers, identify patterns in it, and extract hidden insights.
  • Mathematical skills, particularly a command of Linear Algebra and Calculus.
  • Soft skills, such as networking and communication, are a cherry on top.

What is the average salary of a Data Analyst in India?

According to Glassdoor, the average salary of a Data Analyst in India is around ₹6L per annum. However, the salary depends on several factors, including company size and reputation, job location, educational qualifications, work experience, and, most importantly, your skills. An entry-level Data Analyst can easily make around ₹3L per annum, a mid-level Data Analyst with 5 to 9 years of experience can make around ₹6L per annum, and a Senior Data Analyst who knows the life cycle of data analytics, with 10 to 15 years of experience, can make up to ₹13L per annum. Data Analyst is indeed a high-paying job role, and if you’re interested in the field, it is well worth pursuing.

Phases of Data Analytics Lifecycle: A Complete Guide

Explore the comprehensive Phases of the Data Analytics Lifecycle in this in-depth breakdown. Journey through the critical stages, including Data Discovery and Collection, Data Cleaning and Preprocessing, Data Exploration and Visualisation, Data Modeling and Analysis, Interpretation and Communication, Implementation and Integration, Monitoring and Maintenance, and more.

As far as the number of phases in the Data Analytics Lifecycle is concerned, there is no fixed number. Each phase of the Data Analytics Lifecycle has specific objectives and activities that are tailored to the unique requirements of Data Analytics projects. In this blog, you will learn what the Data Analytics Lifecycle is, step by step, and why it is so important.

Table of Contents  

1) Data Discovery and Collection 

2) Data Cleaning and Preprocessing 

3) Data Exploration and Visualisation 

4) Data Modelling and Analysis 

5) Interpretation and Communication 

6) Implementation and Integration 

7) Monitoring and Maintenance 

8) Optimisation and Improvement 

9) Ethical Considerations 

10) Conclusion 

1. Data discovery   

The first phase of the Data Analytics Lifecycle is the data discovery step. This stage involves identifying potential data sources, both internal and external, that are relevant to the business problem at hand. It is essential to define the scope of the analysis and gather data from various databases, applications, and online repositories. Data can come in different formats, including structured, unstructured, and semi-structured data.  

[Figure: Phases of the Data Analytics Lifecycle]

2. Data preparation  

Once the data is collected, it is crucial to clean and preprocess it before analysis. Data preparation involves identifying and rectifying errors, duplications, and inconsistencies in the dataset. This process ensures that the data is of high quality and ready for further analysis.  

Data preprocessing tasks may include data transformation, normalisation, and handling missing values. Cleaning and preprocessing are time-consuming but vital steps that significantly impact the accuracy and reliability of the final results. Proper data preprocessing can also help in dealing with noise and irrelevant data, leading to better outcomes.  
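
As a minimal sketch of these preprocessing tasks, the snippet below imputes missing values and normalises two numeric columns with pandas and scikit-learn. The columns (age, income) and the choice of median imputation and min-max scaling are assumptions made for the example.

```python
# One way to handle the preprocessing tasks mentioned above: impute
# missing values, then normalise each numeric column. Data is illustrative.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "age": [25, None, 47, 35],
    "income": [30000, 52000, None, 61000],
})

# Fill missing values with the column median, then scale to the [0, 1] range.
imputed = SimpleImputer(strategy="median").fit_transform(df)
normalised = MinMaxScaler().fit_transform(imputed)

clean = pd.DataFrame(normalised, columns=df.columns)
print(clean)
```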

3. Data exploration   

The Data exploration phase aims to gain a deeper understanding of the dataset. It involves generating summary statistics, creating visual representations, and identifying patterns or trends within the data. Data visualisation techniques, such as charts, graphs, and heatmaps, are powerful tools for presenting complex information in a clear and accessible manner.  

Through data exploration and visualisation, analysts can identify outliers, correlations, and potential relationships among variables, providing valuable insights that may have gone unnoticed otherwise. Visualising data can make patterns more apparent and facilitate communication of findings with stakeholders.  
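
A bare-bones exploration pass along these lines is sketched below: summary statistics, a simple IQR-based outlier check, and a correlation matrix with pandas. The figures are invented, and the 1.5 x IQR rule is just one common convention rather than a requirement.

```python
# Sketch of the exploration step: summary statistics, a simple IQR outlier
# check, and a correlation matrix. Figures are illustrative.
import pandas as pd

df = pd.DataFrame({
    "units_sold": [12, 15, 14, 13, 95, 16],
    "ad_clicks":  [30, 34, 33, 31, 40, 35],
})

print(df.describe())                      # summary statistics

q1, q3 = df["units_sold"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["units_sold"] < q1 - 1.5 * iqr) |
              (df["units_sold"] > q3 + 1.5 * iqr)]
print(outliers)                           # the 95-unit row stands out

print(df.corr())                          # pairwise correlations
```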

4. Data Modelling and analysis  

Data modelling is a very critical phase of the Data Analytics Lifecycle. In this phase, analysts use various statistical and Machine Learning (ML) techniques to develop models that can predict outcomes, classify data, or identify patterns. The choice of the appropriate model depends on the nature of the business problem and the type of data available.  

It is crucial to select the right model and fine-tune its parameters to achieve accurate and reliable results. Analysts may need to iterate and refine the model to ensure it meets the desired performance standards. Model evaluation and validation are critical to assess how well the model generalises to new data. 
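
Model selection and validation of the kind described here can be sketched with cross-validation in scikit-learn, comparing two candidate models on synthetic data. The models and scores below are purely illustrative.

```python
# A sketch of choosing between candidate models with cross-validation.
# Synthetic data; scores are for illustration only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```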

5. Interpretation and communication  

Once the data is analysed and the models are built, it's time to interpret the results and communicate the findings effectively. The insights gained from Data Analytics are only valuable if they can be understood and acted upon by stakeholders.  

In this phase, Data Analysts need to present their findings in a clear and concise manner, using language that is accessible to non-technical audiences. Visual aids, such as charts and infographics, can be particularly helpful in conveying complex information. Effective communication ensures that decision-makers grasp the implications of the analysis and can make informed choices.  

6. Implementation and integration  

The implementation phase involves putting the insights derived from Data Analytics into action. This may include making data-driven decisions, optimising processes, or integrating the findings into existing systems or strategies.  

Effective implementation requires collaboration between Data Analysts and decision-makers, ensuring that the recommendations align with the organisation's goals and objectives. Integration of Data Analytics with business operations can lead to improved efficiency and better outcomes.  

7. Monitoring and maintenance  

Data Analytics is not a one-time process; it requires continuous monitoring and maintenance to remain relevant and effective. New data may become available, and business needs may evolve, necessitating updates to the existing models or analyses.  

Regular monitoring helps to identify potential issues or changes in the data, ensuring the accuracy and reliability of the insights over time. Ongoing monitoring allows organisations to adapt to changing market conditions and stay ahead of the competition.  
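
A very naive monitoring check in this spirit is sketched below: compare the live mean of a feature against its training baseline and raise an alert if it drifts too far. The 10% threshold and all figures are invented assumptions, not a production-grade drift test.

```python
# A very naive monitoring check: flag a feature whose live mean drifts too
# far from its training baseline. Threshold and figures are illustrative.
import numpy as np

training_values = np.array([52.0, 49.5, 51.2, 50.8, 49.9])
live_values = np.array([58.3, 59.1, 57.8, 60.2])

baseline_mean = training_values.mean()
drift = abs(live_values.mean() - baseline_mean) / baseline_mean

if drift > 0.10:   # alert if the mean shifts by more than 10%
    print(f"Drift alert: live mean differs by {drift:.0%} from baseline")
else:
    print("No significant drift detected")
```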

8. Optimisation and improvement  

The optimisation and improvement phase of the Data Analytics Lifecycle is a critical step in enhancing the overall effectiveness and efficiency of the Data Analytics process. In this phase, Data Analysts dive into the insights and outcomes generated during the data modelling and analysis stage and identify areas for refinement and enhancement. By fine-tuning the existing models and methodologies, analysts can uncover hidden patterns, optimise predictive accuracy, and gain more valuable insights from the data.  

This phase involves a deep dive into the performance metrics of the developed models, such as accuracy, precision, recall, and F1-score, among others. Data Analysts assess how well the models are predicting outcomes and whether any discrepancies or errors need to be addressed. Based on this evaluation, they can apply advanced techniques, such as hyperparameter tuning or feature engineering, to improve the model's performance.  
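
The tuning-and-evaluation loop described above can be sketched with a small grid search over decision-tree depth, scored with precision, recall, and F1 on synthetic data. The parameter grid and dataset are assumptions made for the example.

```python
# Sketch of the tuning and evaluation described above: a small grid search
# over decision-tree depth, reported with precision/recall/F1. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=800, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid={"max_depth": [3, 5, 10, None]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

print("Best depth:", search.best_params_)
print(classification_report(y_test, search.best_estimator_.predict(X_test)))
```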

Moreover, analysts also consider the feedback received from stakeholders and decision-makers who have implemented the insights from the data analytics. This feedback can provide valuable information on the real-world impact of data-driven decisions and identify any areas where the analytics process might need to be adjusted.  

As technology and business environments are constantly evolving, Data Analysts must stay up-to-date with the latest advancements in the field. Continuous learning and exploration of new tools and methodologies are essential in this phase to ensure that the Data Analytics process remains cutting-edge and effective.  

Additionally, the "Optimisation and Improvement" phase is an opportunity to explore alternative data sources or include additional data features that were not initially considered. This broader scope can potentially reveal new insights and lead to more comprehensive analyses.  

Data Analysts should also consider the scalability and robustness of their Data Analytics solutions during this phase. As businesses grow and data volumes increase, the analytics process must be capable of handling larger datasets without compromising accuracy or speed. Optimisation efforts may involve fine-tuning the infrastructure and computational resources to accommodate future growth. 

9. Ethical considerations  

Throughout the Data Analytics Lifecycle, it is essential to address ethical considerations surrounding data privacy, security, and bias. As Data Analysts work with sensitive information, they must adhere to ethical guidelines and ensure the responsible use of data.  

Being transparent about data collection and analysis methods and actively mitigating biases helps build trust with stakeholders and the broader audience. Respecting privacy and confidentiality is vital to protect individuals and maintain the integrity of the Data Analytics process. 

Conclusion  

The Data Analytics Lifecycle is a comprehensive process that takes raw data and transforms it into valuable insights that drive business decisions. From data collection and cleaning to interpretation, implementation, and continuous evaluation, each stage plays a vital role in the success of Data Analytics initiatives. Embracing this lifecycle and leveraging the power of data can empower businesses to stay ahead and make data-driven decisions that lead to long-term success. 

Big Data Analytics - Data Life Cycle


Traditional Data Mining Life Cycle

In order to provide a framework to organize the work needed by an organization and deliver clear insights from Big Data, it is useful to think of it as a cycle with different stages. It is by no means linear: all the stages are related to each other. This cycle has superficial similarities with the more traditional data mining cycle described by the CRISP-DM methodology.

CRISP-DM Methodology

The CRISP-DM methodology, which stands for Cross-Industry Standard Process for Data Mining, is a cycle that describes commonly used approaches that data mining experts use to tackle problems in traditional BI data mining. It is still being used in traditional BI data mining teams.

The following illustration shows the major stages of the cycle as described by the CRISP-DM methodology and how they are interrelated.

[Figure: CRISP-DM life cycle stages and their relationships]

CRISP-DM was conceived in 1996, and the next year it got underway as a European Union project under the ESPRIT funding initiative. The project was led by five companies: SPSS, Teradata, Daimler AG, NCR Corporation, and OHRA (an insurance company). The project was finally incorporated into SPSS. The methodology is extremely detail-oriented in specifying how a data mining project should be carried out.

Let us now learn a little more on each of the stages involved in the CRISP-DM life cycle −

Business Understanding − This initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition. A preliminary plan is designed to achieve the objectives. A decision model, especially one built using the Decision Model and Notation standard, can be used.

Data Understanding − The data understanding phase starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.

Data Preparation − The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.

Modeling − In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, it is often required to step back to the data preparation phase.

Evaluation − At this stage in the project, you have built a model (or models) that appears to have high quality, from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate the model thoroughly and review the steps executed to construct the model, to be certain it properly achieves the business objectives.

A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.

Deployment − Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that is useful to the customer.

Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data scoring (e.g. segment allocation) or data mining process.

In many cases, it will be the customer, not the data analyst, who will carry out the deployment steps. Even if the analyst deploys the model, it is important for the customer to understand upfront the actions which will need to be carried out in order to actually make use of the created models.

SEMMA Methodology

SEMMA is another methodology developed by SAS for data mining modeling. It stands for Sample, Explore, Modify, Model, and Assess. Here is a brief description of its stages −

Sample − The process starts with data sampling, e.g., selecting the dataset for modeling. The dataset should be large enough to contain sufficient information to retrieve, yet small enough to be used efficiently. This phase also deals with data partitioning.

Explore − This phase covers the understanding of the data by discovering anticipated and unanticipated relationships between the variables, and also abnormalities, with the help of data visualization.

Modify − The Modify phase contains methods to select, create and transform variables in preparation for data modeling.

Model − In the Model phase, the focus is on applying various modeling (data mining) techniques on the prepared variables in order to create models that possibly provide the desired outcome.

Assess − The evaluation of the modeling results shows the reliability and usefulness of the created models.

The main difference between CRISP-DM and SEMMA is that SEMMA focuses on the modeling aspect, whereas CRISP-DM gives more importance to the stages of the cycle prior to modeling, such as understanding the business problem to be solved and understanding and preprocessing the data to be used as input to, for example, machine learning algorithms.

Big Data Life Cycle

In today’s big data context, the previous approaches are either incomplete or suboptimal. For example, the SEMMA methodology completely disregards data collection and the preprocessing of different data sources. These stages normally constitute most of the work in a successful big data project.

A big data analytics cycle can be described by the following stages −

  • Business Problem Definition
  • Research
  • Human Resources Assessment
  • Data Acquisition
  • Data Munging
  • Data Storage
  • Exploratory Data Analysis
  • Data Preparation for Modeling and Assessment
  • Modeling
  • Implementation

In this section, we will throw some light on each of these stages of the big data life cycle.

Business Problem Definition − This is a point common to the traditional BI and big data analytics life cycles. It is normally a non-trivial stage of a big data project to define the problem and evaluate correctly how much potential gain it may have for an organization. It seems obvious to mention this, but the expected gains and costs of the project have to be evaluated.

Research − Analyze what other companies have done in the same situation. This involves looking for solutions that are reasonable for your company, even though it may involve adapting other solutions to the resources and requirements that your company has. In this stage, a methodology for the future stages should be defined.

Human Resources Assessment − Once the problem is defined, it is reasonable to continue by analyzing whether the current staff is able to complete the project successfully. Traditional BI teams might not be capable of delivering an optimal solution for all the stages, so it should be considered before starting the project whether part of the project needs to be outsourced or more people need to be hired. This section is key in a big data life cycle; it defines which types of profiles are needed to deliver the resultant data product.

Data Acquisition − Data gathering is a non-trivial step of the process; it normally involves gathering unstructured data from different sources. To give an example, it could involve writing a crawler to retrieve reviews from a website. This involves dealing with text, perhaps in different languages, and normally requires a significant amount of time to complete.
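As a rough illustration of what this acquisition step can look like, the sketch below extracts review text from a fragment of HTML. The page structure, the "review" class name, and the sample snippets are illustrative assumptions rather than part of any real site.

```python
# A very small acquisition sketch. The HTML below stands in for a page that a
# crawler (e.g. urllib.request or a dedicated crawling framework) would fetch;
# the class name "review" and the markup are illustrative assumptions.
import re

html = """
<div><p class="review">Great battery life, would buy again.</p></div>
<div><p class="review">Stopped working after two weeks.</p></div>
"""

# Pull the raw review text out of the markup; a real crawler would also need a
# proper HTML parser, pagination handling, rate limiting, and language detection.
reviews = re.findall(r'<p class="review">(.*?)</p>', html, flags=re.S)
print(len(reviews), "raw reviews collected:", reviews)
```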

Data Munging − Once the data is retrieved, for example from the web, it needs to be stored in an easy-to-use format. To continue with the reviews example, let’s assume the data is retrieved from different sites, each of which displays the data differently.

Suppose one data source gives reviews as a rating in stars; it is therefore possible to read this as a mapping for the response variable y ∈ {1, 2, 3, 4, 5}. Another data source gives reviews using a two-arrow system, one for upvoting and the other for downvoting. This would imply a response variable of the form y ∈ {positive, negative}.

In order to combine both data sources, a decision has to be made to make these two response representations equivalent. This can involve converting the first representation to the second, considering, for example, one star as negative and five stars as positive. This process often requires a large time allocation to be delivered with good quality.
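A minimal Python sketch of this unification step is shown below. The helper names and the treatment of intermediate star counts are illustrative assumptions, since the text only fixes one star as negative and five stars as positive.

```python
# Minimal sketch of unifying the two response representations described above.
# The >= 4 threshold for intermediate star counts is an illustrative assumption.

def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to the binary representation."""
    return "positive" if stars >= 4 else "negative"

def arrows_to_label(upvote: bool) -> str:
    """Map an up/down vote to the same binary representation."""
    return "positive" if upvote else "negative"

source_a = [{"text": "Great product", "stars": 5}, {"text": "Broke quickly", "stars": 1}]
source_b = [{"text": "Works fine", "upvote": True}, {"text": "Not worth it", "upvote": False}]

combined = (
    [{"text": r["text"], "y": stars_to_label(r["stars"])} for r in source_a]
    + [{"text": r["text"], "y": arrows_to_label(r["upvote"])} for r in source_b]
)
print(combined)  # every record now carries y in {positive, negative}
```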

Data Storage − Once the data is processed, it sometimes needs to be stored in a database. Big data technologies offer plenty of alternatives on this point. The most common alternative is storing the data in the Hadoop Distributed File System (HDFS) and querying it through Apache Hive, which provides users with a limited SQL dialect known as the Hive Query Language (HiveQL). From the user’s perspective, this allows most analytics tasks to be done in a similar way as in traditional BI data warehouses. Other options to consider are MongoDB and Redis for storage, and Apache Spark for processing.

This stage of the cycle is related to the knowledge of the human resources involved, in terms of their ability to implement different architectures. Modified versions of traditional data warehouses are still being used in large-scale applications. For example, Teradata and IBM offer SQL databases that can handle terabytes of data, and open source solutions such as PostgreSQL and MySQL are still being used for large-scale applications.

Even though there are differences in how the different storage systems work in the background, from the client side most solutions provide a SQL API. Hence, having a good understanding of SQL is still a key skill for big data analytics.
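The sketch below illustrates this point using SQLite from the Python standard library as a stand-in for HiveQL or a warehouse SQL dialect; the reviews table and its contents are invented for the example.

```python
# Illustrative only: most of the stores mentioned above expose a SQL or
# SQL-like API, so the analyst-side query pattern is similar across them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (source TEXT, y TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("site_a", "positive"), ("site_a", "negative"), ("site_b", "positive")],
)

# The same GROUP BY would look almost identical in HiveQL over data in HDFS.
for source, label, count in conn.execute(
    "SELECT source, y, COUNT(*) FROM reviews GROUP BY source, y"
):
    print(source, label, count)
```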

A priori, storage seems to be the most important topic; in practice, this is not true. It is not even an essential stage. It is possible to implement a big data solution that works with real-time data, in which case we only need to gather data to develop the model and then implement it in real time, so there would be no need to formally store the data at all.

Exploratory Data Analysis − Once the data has been cleaned and stored in a way that insights can be retrieved from it, the data exploration phase is mandatory. The objective of this stage is to understand the data; this is normally done with statistical techniques and by plotting the data. This is also a good stage to evaluate whether the problem definition makes sense or is feasible.
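A small exploratory-analysis sketch, assuming pandas (plus matplotlib for plotting) and a hypothetical cleaned file reviews.csv with columns y and review_length, might look like this:

```python
# Exploratory sketch: summary statistics, class balance, and a quick plot.
import pandas as pd

df = pd.read_csv("reviews.csv")                  # hypothetical cleaned dataset
print(df.describe(include="all"))                # summary statistics per column
print(df["y"].value_counts(normalize=True))      # class balance of the response

# A quick plot helps judge feasibility, e.g. whether review length differs
# between positive and negative reviews at all.
df.boxplot(column="review_length", by="y")
```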

Data Preparation for Modeling and Assessment − This stage involves reshaping the cleaned data retrieved previously and using statistical preprocessing for missing-value imputation, outlier detection, normalization, feature extraction, and feature selection.
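As a sketch of these preprocessing steps, assuming scikit-learn and a small numeric feature matrix with missing values, a pipeline could chain imputation, normalization, and a simple feature-selection step:

```python
# Sketch of the preprocessing described above; the tiny arrays are placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif

X = np.array([[1.0, 200.0, np.nan],
              [2.0, 180.0, 5.0],
              [np.nan, 220.0, 7.0],
              [4.0, 210.0, 6.0]])
y = np.array([0, 1, 0, 1])

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # missing-value imputation
    ("scale", StandardScaler()),                    # normalization
    ("select", SelectKBest(f_classif, k=2)),        # simple feature selection
])
X_ready = prep.fit_transform(X, y)
print(X_ready.shape)   # data reshaped and ready for modeling
```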

Modeling − The prior stage should have produced several datasets for training and testing, for example, for a predictive model. This stage involves trying different models with the goal of solving the business problem at hand. In practice, it is normally desired that the model also gives some insight into the business. Finally, the best model or combination of models is selected by evaluating its performance on a held-out dataset.
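A minimal sketch of this model-comparison step, assuming scikit-learn and using synthetic data as a placeholder for the datasets produced by the prior stage:

```python
# Try a couple of candidate models and keep the one that performs best on a
# held-out split, as described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```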

Implementation − In this stage, the data product developed is implemented in the data pipeline of the company. This involves setting up a validation scheme while the data product is working, in order to track its performance. For example, in the case of a predictive model, this stage would involve applying the model to new data and, once the true response is available, evaluating the model.
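One possible shape for such a validation scheme, assuming a fitted scikit-learn-style model, is sketched below: new records are scored as they arrive, and a rolling accuracy is updated once the true responses become available.

```python
# Minimal sketch of the validation scheme mentioned above. `model` is any
# fitted classifier with a scikit-learn style predict() method (an assumption).
from collections import deque
from sklearn.metrics import accuracy_score

recent_preds, recent_truth = deque(maxlen=500), deque(maxlen=500)

def score_and_track(model, x_new, y_true=None):
    """Apply the deployed model; log the outcome when the true label arrives."""
    y_pred = model.predict([x_new])[0]
    if y_true is not None:
        recent_preds.append(y_pred)
        recent_truth.append(y_true)
        print("rolling accuracy:", accuracy_score(list(recent_truth), list(recent_preds)))
    return y_pred
```

In practice the rolling metric would feed a dashboard or an alert so that a drop in performance triggers retraining.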

8 Steps in the Data Life Cycle


Whether you manage data initiatives, work with data professionals, or are employed by an organization that regularly conducts data projects, a firm understanding of what the average data project looks like can prove highly beneficial to your career. This knowledge—paired with other data skills —is what many organizations look for when hiring.

No two data projects are identical; each brings its own challenges, opportunities, and potential solutions that impact its trajectory. Nearly all data projects, however, follow the same basic life cycle from start to finish. This life cycle can be split into eight common stages, steps, or phases:

  • Generation
  • Collection
  • Processing
  • Storage
  • Management
  • Analysis
  • Visualization
  • Interpretation

Below is a walkthrough of the processes that are typically involved in each of them.


Data Life Cycle Stages

The data life cycle is often described as a cycle because the lessons learned and insights gleaned from one data project typically inform the next. In this way, the final step of the process feeds back into the first.


1. Generation

For the data life cycle to begin, data must first be generated. Otherwise, the following steps can’t be initiated.

Data generation occurs regardless of whether you’re aware of it, especially in our increasingly online world. Some of this data is generated by your organization, some by your customers, and some by third parties you may or may not be aware of. Every sale, purchase, hire, communication, interaction— everything generates data. Given the proper attention, this data can often lead to powerful insights that allow you to better serve your customers and become more effective in your role.


2. Collection

Not all of the data that’s generated every day is collected or used. It’s up to your data team to identify what information should be captured and the best means for doing so, and what data is unnecessary or irrelevant to the project at hand.

You can collect data in a variety of ways, including:

  • Forms: Web forms, client or customer intake forms, vendor forms, and human resources applications are some of the most common ways businesses generate data.
  • Surveys: Surveys can be an effective way to gather vast amounts of information from a large number of respondents.
  • Interviews: Interviews and focus groups conducted with customers, users, or job applicants offer opportunities to gather qualitative and subjective data that may be difficult to capture through other means.
  • Direct Observation: Observing how a customer interacts with your website, application, or product can be an effective way to gather data that may not be offered through the methods above.

It’s important to note that many organizations take a broad approach to data collection, capturing as much data as possible from each interaction and storing it for potential use. While drawing from this supply is certainly an option, it’s always important to start by creating a plan to capture the data you know is critical to your project.

3. Processing

Once data has been collected, it must be processed. Data processing can refer to various activities, including:

  • Data wrangling, in which a data set is cleaned and transformed from its raw form into something more accessible and usable. This is also known as data cleaning, data munging, or data remediation.
  • Data compression, in which data is transformed into a format that can be more efficiently stored.
  • Data encryption, in which data is translated into another form of code to protect it from privacy concerns.

Even the simple act of taking a printed form and digitizing it can be considered a form of data processing.
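As a toy illustration of two of these processing activities, the sketch below compresses a record for cheaper storage and then encrypts it. The record itself is invented, and the third-party cryptography package is assumed to be installed.

```python
# Toy illustration of data compression and data encryption on one record.
import gzip
from cryptography.fernet import Fernet  # third-party dependency (assumed installed)

raw = b'{"customer_id": 42, "purchase": "laptop", "amount": 999.0}'

compressed = gzip.compress(raw)            # data compression
key = Fernet.generate_key()
token = Fernet(key).encrypt(compressed)    # data encryption

# Reversing the steps recovers the original record.
assert gzip.decompress(Fernet(key).decrypt(token)) == raw
print(len(raw), "bytes ->", len(token), "bytes stored (encrypted + compressed)")
```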

4. Storage

After data has been collected and processed, it must be stored for future use. This is most commonly achieved through the creation of databases or datasets. These datasets may then be stored in the cloud, on servers, or using another form of physical storage like a hard drive, CD, cassette, or floppy disk.

When determining how to best store data for your organization, it’s important to build in a certain level of redundancy to ensure that a copy of your data will be protected and accessible, even if the original source becomes corrupted or compromised.
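A toy sketch of this redundancy idea, assuming a hypothetical dataset.csv, is to keep a second copy alongside a checksum so corruption can be detected:

```python
# Keep a backup copy of a dataset plus a checksum; file names are hypothetical.
import hashlib
import pathlib
import shutil

original = pathlib.Path("dataset.csv")
backup_dir = pathlib.Path("backup")
backup_dir.mkdir(exist_ok=True)

shutil.copy2(original, backup_dir / original.name)              # redundant copy
checksum = hashlib.sha256(original.read_bytes()).hexdigest()    # integrity reference
(backup_dir / "dataset.sha256").write_text(checksum)
print("backup written, sha256 =", checksum)
```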

5. Management

Data management, also called database management, involves organizing, storing, and retrieving data as necessary over the life of a data project. While referred to here as a “step,” it’s an ongoing process that takes place from the beginning through the end of a project. Data management includes everything from storage and encryption to implementing access logs and changelogs that track who has accessed data and what changes they may have made.

6. Analysis

Data analysis refers to processes that attempt to glean meaningful insights from raw data. Analysts and data scientists use different tools and strategies to conduct these analyses. Some of the more commonly used methods include statistical modeling, algorithms, artificial intelligence, data mining, and machine learning.

Exactly who performs an analysis depends on the specific challenge being addressed, as well as the size of your organization’s data team. Business analysts, data analysts, and data scientists can all play a role.

7. Visualization

Data visualization refers to the process of creating graphical representations of your information, typically through the use of one or more visualization tools. Visualizing data makes it easier to quickly communicate your analysis to a wider audience both inside and outside your organization. The form your visualization takes depends on the data you’re working with, as well as the story you want to communicate.

While technically not a required step for all data projects, data visualization has become an increasingly important part of the data life cycle.
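As a minimal sketch, assuming matplotlib and purely illustrative numbers, a small aggregated result can be turned into a chart that is easier to share than a table:

```python
# Turn an aggregated result into a simple chart for wider communication.
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
signups = [120, 135, 180, 210]     # illustrative numbers only

plt.bar(quarters, signups)
plt.title("New customer sign-ups by quarter")
plt.ylabel("Sign-ups")
plt.savefig("signups.png")         # export the figure for a report or presentation
```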

8. Interpretation

Finally, the interpretation phase of the data life cycle provides the opportunity to make sense of your analysis and visualization. Beyond simply presenting the data, this is when you investigate it through the lens of your expertise and understanding. Your interpretation may not only include a description or explanation of what the data shows but, more importantly, what the implications may be.

Other Frameworks

The eight steps outlined above offer an effective framework for thinking about a data project’s life cycle. That being said, it isn’t the only way to think about data. Another commonly cited framework breaks the data life cycle into a similar set of phases, ending with a final Destruction phase. While that framework's phases use slightly different terms, they largely align with the steps outlined in this article.


The Importance of Understanding the Data Life Cycle

Even if you don’t directly work with your organization’s data team or projects, understanding the data life cycle can empower you to communicate more effectively with those who do. It can also provide insights that allow you to conceive of potential projects or initiatives.

The good news is that, unless you intend to transition into or start a career as a data analyst or data scientist, it’s highly unlikely you’ll need a degree in the field. Several faster and more affordable options for learning basic data skills exist, such as online courses.



Artificial Intelligence and Transforming Digital Marketing, pp. 193–204

Employing Applying Big Data Analytics Lifecycle in Uncovering the Factors that Relate to Causing Road Traffic Accidents to Reach Sustainable Smart Cities

  • Mohammad H. Allaymoun
  • Mohammed Elastal
  • Ahmad Yahia Alastal
  • Tasnim Khaled Elbastawisy
  • Dana Iqbal
  • Amal Yaqoob
  • Adnan Sayed Ehsan
  • First Online: 04 October 2023


Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 487)

This paper aims to identify the factors that relate to causing road traffic accidents in order to reach sustainable smart cities, an issue of significance to society and people. To that end, the paper proposes a model for uncovering such factors using the big data analytics lifecycle: discovery, data preparation, model planning, model building, communication of results, and operationalization. In other words, the paper presents a simplified methodology that takes advantage of the big data analysis life cycle and of the velocity of analyzing data (considered the most important characteristic of big data) to reduce traffic accidents. The velocity in analyzing data and identifying the factors that relate to causing car accidents helps decision-makers quickly make appropriate decisions. It is worth mentioning that Google Data Studio (GDS) was used here to produce the visualizations and reports needed. Hopefully, researchers will build on this methodology in the future to achieve sustainability and to propose a more inclusive and effective methodology to minimize the negative effects of traffic accidents in smart cities.

  • Big data analysis lifecycle
  • Decision making
  • Google data studio
  • Road traffic accidents




Allaymoun, M.H. et al. (2024). Employing Applying Big Data Analytics Lifecycle in Uncovering the Factors that Relate to Causing Road Traffic Accidents to Reach Sustainable Smart Cities. In: Hamdan, A., Aldhaen, E.S. (eds) Artificial Intelligence and Transforming Digital Marketing. Studies in Systems, Decision and Control, vol 487. Springer, Cham. https://doi.org/10.1007/978-3-031-35828-9_18


How can big data analytics be used for healthcare organization management? Literary framework and future research from a systematic review

Nicola Cozzoli, Fiorella Pia Salvatore, Nicola Faccilongo, Michele Milone

Department of Economics, University of Foggia, Via Caggese n.1, Foggia, Italy

Associated Data

The datasets analyzed during the current study are not publicly available due to data relating to scientific journal names and authors but are available from the corresponding author on reasonable request.

Background

Multiple attempts at highlighting the relationship between big data analytics and benefits for healthcare organizations have been made in the literature. The impact of big data on health organization management is still not clear due to the multi-disciplinary nature of the relationship. This study aims to answer three research questions: a) What is the state of the art of big data analytics adopted by healthcare organizations? b) What are the benefits for both health managers and healthcare organizations? c) What are the future directions of big data analytics research in healthcare?

Methods

Through a systematic literature review, the impact of big data analytics on healthcare management has been examined. The study aims to map the extant literature and present a framework for future scholars to build on and executives to be guided by.

Results

The positive relationship between big data analytics and healthcare organization management has emerged. To find common elements in the studies reviewed, 16 studies have been selected and clustered into 4 research areas: 1) Potentialities of big data analytics. 2) Resource management. 3) Big data analytics and management of health surveillance systems. 4) Big data analytics and technology for healthcare organization.

Conclusions

In conclusion, big data analytics solutions are identified as a milestone for managerial studies applied to healthcare organizations, although scientific research still needs to investigate the standardization and integration of devices, as well as protocols for data analysis, in order to improve the performance of healthcare organizations.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12913-022-08167-z.

Big data is transforming and will continue to transform healthcare organizations in the near future [ 1 , 2 ]. The scientific literature in the managerial context applied to healthcare organizations considers Big Data Analytics (BDA) a fundamental tool, so much so that it has attracted the attention of the scientific community and stakeholders [ 3 ]. However, a premise should be made: data by themselves explain little. Thus, to be useful in healthcare organization management, it is first necessary to validate their quality and then to find the right correlations. In other words, the data should be processed, analyzed, and interpreted with the appropriate tools [ 4 , 5 ].

BDA-related technological applications in healthcare are rapidly increasing [ 6 ] and will increasingly characterize managers’ decision-making processes. For example, IBM’s Watson project [ 7 ] is a "super-computer" that has scoured several million scientific articles from the last twenty years and uses artificial intelligence tools (e.g., machine learning) to correlate disease symptoms and predict possible diagnostic scenarios. This case helps to understand how and to what extent BDA could really support healthcare managers in improving their decision processes, while increasing the performance of the healthcare organization.

Nowadays, the amount of data is no longer an issue. Internet traffic reports from Cisco and other network operators have estimated the entire digital universe at 44 zettabytes, and 463 exabytes of information could be generated daily by 2025. A new era has begun in which the processes of production and management of human knowledge will no longer be the exclusive preserve of humans; machines will also play their part as knowledge producers [ 8 ]. From pharmaceutical companies to healthcare organizations, this enormous potential of data products, combined with IoT applications and AI tools [ 9 – 11 ], will play a significant role in the near future. Today, medical applications based on IoT allow the monitoring of clinical data through data generated by special devices (e.g., wearable devices) [ 12 ], remotely accessible by a physician rather than by caregivers [ 13 ].

The market size is a useful indicator of how much the healthcare organizations are turning their attention to new management models based on the use of big data. By 2025, the big data market in healthcare will touch $70 billion with a record 568% growth in 10 years. The use of such a tool not only represents a complex challenge [ 14 ], but also opens opportunities for all those involved in the healthcare supply chain who manage decision-making processes. Moreover, if on the one hand this technology will influence the definition of new managerial strategies within healthcare organizations, on the other hand, it will have positive repercussions on the effectiveness and efficiency of healthcare processes [ 15 ]. Indeed, the big data technology is used by healthcare managers to get, for example, information related to the list of doctors and nurses, the list of drugs with their expiration date, etc., in order to have tools for facilitating decision-making processes, improving the quality of services provided, and, at the same time, rationalizing the use of resources, by facilitating the management of the healthcare organization as a whole.

The BDA satisfies multiple needs that, on the one hand, influence the quality of the healthcare organization’s performance and, on the other hand, are useful in directing management strategies to improve the supply of healthcare services. Below are some strategies, which aim to:

  • Provide specific services to patients, from diagnostics to preventive medicine passing through therapeutic adherence.
  • Detect the onset and spread of diseases in advance.
  • Observe parameters inherent to hospital quality standards, promoting control and prevention actions.
  • Modify treatment techniques.
  • Facilitate research and development in pharmacology, reducing the time to market of drugs.
  • Facilitate research and development of new and specific medical devices.

The main aim of this research is, therefore, to provide both an integrative framework on the state of the art and perspectives on how BDA can be useful for the management of the healthcare organization. Considering the results, the study offers food for thought on how this technological and cultural revolution will affect the modus operandi of healthcare organizations.

Through an overview of recent scientific studies, this research aims to raise awareness among both practitioners and managers about BDA tools applied to healthcare management to address more effectively and efficiently the challenges imposed by an increasing demand for healthcare services.

In this regard, the study provides a systematic literature review (SLR) to explore the effect of BDA on the healthcare management by analyzing articles from the Scopus database during a period of 5 years (2016 – 2021).

Furthermore, through a content analysis, the results aspire to be a privileged starting point for identifying potential barriers and opportunities provided by BDA-based management systems for a smarter healthcare organization. Specifically, the study answers different research questions (RQs) as different levels of analysis have been performed. In analyzing the relationship between BDA-based management systems and the benefits delivered to the organizations, the research could not be conducted without exploring the state of the art of BDA tools deployed in the field of healthcare. Thus, starting from this background, a discussion of future perspectives on BDA development in healthcare organizations appears as a need.

Theoretical framework

Why use BDA and how to exploit its potential for healthcare organization management? This is the main question asked by managers and decision makers working in the healthcare sector. In recent years there have been multiple attempts in the literature aimed at highlighting the relationship between implementation of BDA and benefits for healthcare organizations, in terms of both resource efficiency and process management.

In 2017, a study by Wang and Hajli [ 16 ] proposed a model founded on Resource-Based Theory and BDA Capabilities (BDAC) to explain the relationship between BDA, benefits, and value creation for healthcare organizations. As stated by Srinivasan and Swink [ 17 ], BDAC refers to “ organizational facility with tools, techniques, and processes that enable a firm to process, organize, visualize, and analyze data, thereby producing insights that enable data-driven operational planning, decision-making, and execution ”. In the healthcare organization, BDAC represents the ability to collect, store, analyze, and process the huge volume, variety, and velocity of health data coming from various sources to improve data-driven decisions [ 18 , 19 ]. Indeed, the study of Wang and Hajli [ 16 ], validated on an empirical basis by 109 cases of BDA tool implementation in 63 healthcare organizations, has demonstrated how specific "paths to value" can be identified. By varying degrees of relevance of the identified pathways, it has been shown that alongside the challenges of implementing certain BDA tools, there are corresponding specific benefits for healthcare organizations. Preliminarily, the study defined the ability to analyze big data through the concept of Information Lifecycle Management (ILM) [ 20 ]. In this perspective, the capabilities of BDA in healthcare organizations are configured as the ability to process health data from diverse sources and provide significant information to healthcare managers. Through BDA, managers can detect timely indicators and identify business strategies, which allow them to put in place prospective plans, efficient strategies, and programs to increase the performance of organizations.

Researchers have found that BDA capabilities primarily stem from the implementation of various tools and features. Specifically, in order of importance, BDA capabilities are triggered first by processing tools (e.g., OLAP, machine learning, NLP), then by aggregation tools (e.g., data warehouse tools), and lastly by data visualization tools and capabilities (e.g., visual dashboards/systems, reporting systems/interfaces).

Among the potentials triggered by the implementation of BDA in the healthcare organization, the analytical one was the main capability, that is the ability to process clinical data characterized by immense volume, variety (from text to graph), and speed (from batch to streaming), using descriptive analysis techniques [ 21 , 22 ]. In this regard, it is important to note that BDA-based management systems are the only ones capable of analyzing semi-structured or unstructured data. This represents a crucial element for revealing correlation patterns that are difficult to determine with traditional management systems [ 23 ]. Furthermore, the launch of these systems in a healthcare organization ensures the ability to effectively manage outputs regarding care process and service in order to constantly improve the performance of the organization. In summary, the characteristics of BDA-based management systems implemented in a healthcare organization, are:

  • predictive analytics capability, i.e., the ability to explore data and identify useful correlations, patterns and trends, and extrapolate them to predict what is likely to occur in the future [ 24 , 25 ];
  • interoperability capability, i.e., the ability to integrate data and processes to support management, collaboration, and sharing across different healthcare departments, managers, and facilities [ 26 ], and finally,
  • traceability capability, i.e., the ability to integrate and track all patient history data from different IT facilities and different healthcare units.

In terms of expected benefits from the BDA implementation, the study of Wang and Hajli [ 16 ] has shown that the most important ones are obtained from improved operational activities, such as improved quality and accuracy of healthcare decisions, rapid processing of issues, and the ability to enable treatments proactively before patients’ conditions worsen. Next in terms of relevance were the benefits related to IT infrastructure, such as standardization, reduced costs for redundant infrastructure, and the ability to quickly transfer data between different IT systems. Substantially, the authors have delivered a useful business model that healthcare managers can draw on to evaluate the specific levers they need to activate in relation to the implementation of BDA-based management systems. In addition to highlighting the undoubted benefits, the authors clearly show how specific BDA tools can facilitate the decision-making processes of healthcare managers and make them faster and more effective.

In another study carried out to identify BDA benefits and supports, and to drive organizational strategies, Wang, Kung, and Byrd [ 19 ], through the analysis of 26 case studies related to the BDA applications in the healthcare organization, have identified five "capabilities" of BDA: analytic capability for care patterns, unstructured data analytical capability, decision support, predictive, and traceability capabilities [ 19 ]. The study is remarkably interesting because in addition to mapping precise benefits, it also recommends specific strategies considering the BDA implementation for healthcare organizations. These strategies are useful for achieving effective results by leveraging the potential of BDA.

The first successful strategy is to implement governance based on the use of big data, starting with a definition of objectives, procedures, and key performance indicators (KPIs). Once again, one of the discriminating factors for success in implementing such a strategy remains the integration of information systems and the standardization of data protocols that often come from heterogeneous sources already existing in healthcare organizations. The second strategy is related to developing a culture of data sharing. The third one considers the training of healthcare managers, who cannot ignore knowledge related to BDA, for example on the use of data mining and business intelligence tools. The fourth strategy is related to the storage of big data, often available in heterogeneous formats, and is identified in the transition from the more expensive traditional storage systems (NAS) to more efficient and effective systems such as cloud computing solutions. The last strategic driver involves pathways related to the implementation of predictive BDA models. The mastery of KPIs, interactive visualization and data aggregation tools such as dashboards and reports should be acquired instruments for healthcare managers and in general for healthcare organizations oriented to BDA driven process management strategies.

More recent studies focus attention on supply chain management practices in healthcare. In the study performed by Yu et al. [ 27 ], the authors, interviewing senior executives in Chinese hospitals, show on both a theoretical and empirical basis how BDAC positively impacts the three dimensions of hospital supply chain integration (SCI) (inter-functional integration, hospital-patient integration, and hospital-supplier integration) and how SCI, in turn, contributes to improving operational flexibility [ 27 ]. By “operational flexibility” in the healthcare organization is meant the ability of a ward to adapt its operating procedures in relation to unforeseen circumstances while meeting the needs of patients [ 28 , 29 ].

The scholars have delivered an important contribution in demonstrating the relationship between BDAC, SCI, and operational flexibility from multiple perspectives, by providing useful management guidance for healthcare executives and managers involved in the supply chain. By analyzing and processing medical and managerial data with advanced analytical techniques, Chinese healthcare organizations were able to facilitate decision-making process with timely and appropriate actions, for example, tracking people's movements during the lockdown caused by the Coronavirus, understanding ongoing health trends, and managing pharmaceutical supplies [ 30 , 31 ].

This theoretical framework provides a key to interpreting the benefits offered by good practices deriving from the use of the BDA in the healthcare organization.

At the same time, the rigorous scientific method allows the validation of empirical experiences in relation to clear theoretical references. In the next paragraph projects that demonstrate what is stated in the literature are shown.

Practical framework

N(ursing) + Care App is an mHealth application that supports the work of frontline health workers (FHW) in developing countries [ 32 ]. The system is designed to collect not only patient data, but also diagnostic images. The app also makes it possible to add recommended doctors, based on the advice of FHWs, in case the patient needs a specific hospital visit.

For healthcare managers, predicting the number of emergency department visits is a critical issue which complicates the optimization of human resource management. To this end, Intel and Assistance Publique-Hôpitaux de Paris (AP-HP), the largest university hospital in Europe, leveraging datasets from multiple sources, worked together to build a cloud-based solution to predict the number of patient visits to emergency rooms and hospital admissions. This predictive analytics tool will enable healthcare managers at AP-HP hospitals to know the number of emergency room visits and hospital admissions 15 days in advance, in order to reduce wait times, optimize human resource (HR) levels based on anticipated needs, accurately plan patient loads, including by pathology, and overall improve the quality and efficiency of services provided by the healthcare organization [ 33 ].

Chronic conditions, if not kept under control through a rigorous program of therapeutic adherence, can become a source of both more serious physical problems for patients and economic burdens for healthcare organizations. Another project that actively introduced BDA tools into healthcare management was carried out by the European Commission to launch production of the drug Enerzair Breezhaler. It was the first drug for the treatment of asthma co-packaged and co-prescribed with the Propeller digital platform. The app sends a reminder to comply with therapeutic adherence and maintains a record of the data, which the patient shares with his or her physician. Studies have demonstrated that the Propeller platform increases the degree of asthma control by up to 63% and therapeutic adherence by up to 58% [ 34 ], and reduces asthma emergency department visits and hospital admissions by up to 57% [ 35 ].

The practical framework described, aided by some empirical experience, only partially reveals the potential offered by BDA. The diffusion of BDA-based management systems in the healthcare organization will trigger a virtuous circle, soon allowing increasingly accurate medical data to accumulate. By exploiting the most advanced AI technologies, BDA will support predictive analysis, allowing physicians to follow more accurate and faster diagnostic pathways and managers to use the results. It will help health practitioners in the decision-making process, optimize the use of resources with a consequent reduction in costs and, overall, improve the quality of services provided by healthcare organizations.

The main aim of this study is to update the state of the art of the BDA-based management systems adopted in the healthcare organization, underlining management advantages for both the organizations and managers. BDA has the potential to reduce the cost of care, prevent disease outbreaks, and improve patients’ quality of life. Through its ability to process and cross-reference massive amounts of both management and clinical information, BDA promises to be an effective support tool for both healthcare managers and patients.

To achieve this aim, a Systematic Literature Review (SLR) was performed. This method identifies, evaluates, and summarizes the updates that arise from the literature about the BDA tools used to improve both healthcare organizations’ performance and patients’ quality of life. The method takes inspiration from the protocol used by Khanra S., et al. [ 36 ], which considers inclusion and exclusion criteria.

The present study aims to add a contribution to the literature by addressing three RQs:

  • What is the state of art of BDA adopted by healthcare organizations?
  • What about the benefits for both health managers and healthcare organization?
  • What about future directions on BDA research in healthcare?

To answer the RQs, Scopus has been selected as a widespread electronic database. To obtain international validity of the studies, the research only considers papers in English. Using the Boolean operator “AND”, the following keywords have been searched: “big data analytics” AND “healthcare” AND “management”. As inclusion criteria, only papers published from 2016 to 2021 have been considered, and “medicine” and “business, management and accounting” have been selected as subject areas. As exclusion criteria, articles in press and the following document types have not been taken into account: “review”, “book”, “conference review”, “letter”, and “note”. Also, to avoid a dispersal of the study, conference proceedings have been excluded. Following the search protocol, 34 results have been obtained (Fig. 1).

Fig. 1 Workflow of articles selection

An excel spreadsheet was used to perform the extraction procedures while the statistical analyses were carried out using the software STATA 16 ©. The list of the extracted papers investigated with the content analysis can be found in the Appendix.

The work proceeds through a descriptive analysis. After that, a content analysis has been performed to identify the most relevant characteristics of the BDA-based management systems, underlining the positive impact for the healthcare organizations, without neglecting to outline the trends for the future scenarios and research directions.

According to the SLR, the iterative process shown in Fig. 1 has made it possible to remove duplicates and match the results with the RQs.

As shown in Fig. 1, the initial search of the Scopus database delivered 227 results. By limiting the research to papers published between 2016 and 2021, 11% of the records were removed. At the second stage, by selecting the subject areas, the screening excluded 131 records, i.e., 57.7% of the results initially retrieved. The last step of the process excluded document types such as Review, Book, Conference Review, Letter, and Note; in other words, 37 records were excluded, representing 16.3% of the sample. At the end of the screening process, 34 articles were selected, representing about 15% of the sample.

The descriptive analysis includes the time distribution of the studies from 2016 to 2021. It is important to note the increasing publication trend from 2017 to 2019. This output confirms a growing interest in the research field of BDA applied to healthcare organizations (Fig. 2).

Fig. 2 Trend of research streams

The trend of research streams considers a sample of 34 scientific contributions, as they come from the screening process described above. Although only 6% of the total sample was collected in the years 2016 and 2017, this is already indicative of the growing trend of scientific studies on BDA in the healthcare sector. The overall incidence in 2018 was 12%, but the turning point was reached in 2019, which accounts for 32% of the studies collected in the sample. This outcome could be read in light of the Covid-19 pandemic outbreak, which has been a representative testing ground for BDA tools by helping managers and decision-makers to plan healthcare managerial strategies.

In this context, the use of the BDA by Chinese healthcare organizations for tracking people's flow during the lockdown, represents an important case study that has registered the peak in the time flow of research. By looking at 2020 and 2021 data, which represent respectively 24% and 21% of the total scientific contributions, the growing trend seems to be confirmed by validating the rising interest in BDA research seen as a planning tool for healthcare processes.

The pie-chart shows the scientific production by country. It is necessary to specify that Scopus database clusters the studies by home country author’s organization, therefore the same study could be referred to more than one country and thus belong to more than one cluster.

The geographical locations of the studies shown in Fig. 3 outline India, the UK, and the USA as accounting for more than one third of the total scientific production. It is well known that IT companies such as Google, Apple, Amazon, and Microsoft are investing considerable resources in BDA tools for healthcare. China and India together contribute 22% of the scientific articles. Big data technology has played a key role in virus tracking during the pandemic crisis. The "Internet Plus Healthcare", a big data center in Zhongwei (China), provides cloud services to both healthcare institutions and IT companies. In Yinchuan (China), an industrial park for big data acts as a catalyst for IT companies involved in the healthcare sector. India is confirmed as one of the heaviest adopters of artificial intelligence, big data analytics, and IoT technologies. Although India must face the challenge of providing basic healthcare services in a predominantly rural country, start-ups with BDA skills in healthcare are springing up.

Fig. 3 Geographical locations of the studies

It is also important to underline the performance of the European countries. The UK, Greece, Italy, Spain, Germany, and Portugal support the research with almost 40% of the studies published, confirming that Europe will be a driving force for BDA research in the near future. The development of a European Health Data Space (EHDS) is an ambitious project of the European Commission. It will lead member states to share an efficient infrastructure for both the exchange and the management of health data, providing citizens with equal treatment, free access to clinical data, and quality healthcare services.

In the area “Others” all the other countries contributing marginally to research have been included.

The next step of the study is focused on a content analysis to show the experiences of applying BDA in healthcare organizations.

Starting from the 34 articles selected for the descriptive analysis, a second screening was performed to identify in detail the core issue of the study. 18 articles were excluded because they were only weakly focused on the research objective, which specifically concerns how BDA can be used for healthcare organization management. Thus, after an in-depth reading of abstracts and full papers, the scholars identified 16 papers more closely targeted at the mentioned research objective. Through a content analysis, the 16 studies selected were clustered into 4 research areas (RAs), as shown in Table 1. The clustering procedure identifies 4 relevant topics: Potentialities of BDA (RA1), Resource management (RA2), BDA and management of health surveillance systems (RA3), and BDA technology for healthcare organization (RA4). The proposed clustering is intended to give an easy-to-use research map and to support healthcare managers.

Table 1. Clusters by relevant topics

  • RA1: Potentialities of BDA
  • RA2: Resource management
  • RA3: BDA and management of health surveillance system
  • RA4: BDA technology for healthcare organization

RA1: potentialities of BDA

Wang and Hajli [16] define BDA potentialities in the healthcare context as "the ability to acquire, store, process and analyze large amounts of health data in various forms, and deliver meaningful information to users, which allows them to discover business values and insights in a timely fashion". The relationship between BDA and the benefits for healthcare organizations is well expressed by the theory of the "path to value chain" [16]. This path represents an important contribution to the exploration of business value, not only because it draws the generic and well-established connection between big data capabilities [19] and benefits, but also because it shows empirically how capabilities can be developed and what benefits can be achieved in healthcare organizations. Another study included in this area explores the key role of BDA capabilities in developing healthcare supply chain integration and their impact on hospital flexibility [27]. Specifically, BDA plays a fundamental role in developing healthcare supply chain integration and operational flexibility. Considering the health and economic crises caused by Covid-19, this dimension of BDA has been an especially important lever for managers seeking to improve the operational flexibility of healthcare organizations. The ability to provide predictive models and real-time insights is a powerful prospect of BDA for helping healthcare professionals and managers in the decision-making process. In this regard, the literature presents several applications of big data in healthcare that support the collection, management, and integration of data in healthcare organizations [37]. Moreover, BDA enables the integration of massive datasets, supporting managers' decisions and the monitoring of the managerial aspects of healthcare organizations. Building a decision-making process based on BDA means, first of all, identifying the key big data elements that can support ad-hoc strategies to improve efficiency along the healthcare value chain. To this end, the research carried out by Sousa et al. [37] underlines the benefits that BDA can bring to the decision-making process, through predictive models and real-time analytics, assisting in the collection, management, and integration of data in healthcare organizations.
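
As a hedged illustration of the kind of predictive support described above, the sketch below trains a simple classifier on synthetic admission records to rank patients by readmission risk. The features, data, and model choice are invented for demonstration and are not taken from the studies cited.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic admission records: age, length of stay (days), number of chronic conditions.
n = 1000
X = np.column_stack([
    rng.integers(18, 95, n),   # age
    rng.integers(1, 30, n),    # length of stay
    rng.integers(0, 6, n),     # chronic conditions
])
# Synthetic outcome loosely tied to the features, for illustration only.
logits = 0.03 * X[:, 0] + 0.05 * X[:, 1] + 0.4 * X[:, 2] - 4.5
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The predicted probabilities could feed a dashboard that ranks patients for follow-up.
risk = model.predict_proba(X_test)[:, 1]
print("AUC on held-out synthetic data:", round(roc_auc_score(y_test, risk), 3))
```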

To date, thanks to an integrated and interconnected ecosystem, it is becoming possible to provide personalized healthcare services, collect an enormous quantity of both clinical and biometric data and, thus, implement BDA instruments. Nevertheless, to take real advantage of these tools and turn them into useful decision support systems (DSS), R&D needs to focus on data filtering mechanisms in order to obtain good-quality, reliable information [38]. Healthcare models based on BDA, together with the implementation of new healthcare programs, enable both medical and managerial decision support for the provision of healthcare services. New types of interactions with and among users of the healthcare ecosystem will produce, in the near future, a wide variety of complex data; the main challenges therefore concern information processing and analytics.
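
The emphasis on filtering mechanisms can be made concrete with a minimal sketch, assuming a hypothetical table of vital-sign readings: duplicate submissions, incomplete records, and physiologically implausible values are dropped before any analysis.

```python
import pandas as pd

# Hypothetical raw readings collected from several clinical and biometric sources.
raw = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 4],
    "timestamp": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 08:00", "2023-01-01 09:15",
        "2023-01-01 10:30", "2023-01-01 11:00", "2023-01-01 12:45",
    ]),
    "heart_rate": [72, 72, None, 310, 88, 65],      # 310 bpm is implausible
    "systolic_bp": [120, 120, 135, 140, None, 118],
})

filtered = raw.drop_duplicates(subset=["patient_id", "timestamp"])  # duplicate submissions
filtered = filtered.dropna(subset=["heart_rate", "systolic_bp"])    # incomplete records
plausible = (
    filtered["heart_rate"].between(30, 220)
    & filtered["systolic_bp"].between(60, 260)
)
filtered = filtered[plausible]                                      # plausibility bounds
print(filtered)
```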

In light of the above, RA1 includes studies for which the quality of data and the need for high-performance filtering mechanisms are becoming key factors for the success of BDA-based management systems in healthcare organizations. For example, the study carried out by Maglaveras et al. [38], included in this area, explores new R&D pathways in biomedical information processing and management, as well as the design of new intelligent decision support systems.

RA2: resource management

Another important research direction emerging from the literature review concerns the positive impact of BDA on resource management. Insufficient policies for managing medical material waste, energy use and environmental burden restrict resource conservation. BDA is extremely useful in this respect: in the near future it could make an important contribution to implementing circular economy processes and to supporting sustainable development initiatives in healthcare organizations [39]. To this end, the study developed by Kazançoğlu et al. [39] underlines the importance of circularity and sustainability concepts in mitigating the sector's negative impacts on the environment. Furthermore, the study identifies the barriers related to the circular economy in healthcare organizations and provides solutions to these barriers through BDA-based management systems. Lastly, the authors have developed a managerial, policy and theoretical framework to support healthcare managers in launching sustainable initiatives in the context of healthcare organizations.

The impact on performance has also been investigated by studies that link the benefits of BDA and artificial intelligence with the green supply chain integration process [40]. Digital learning is increasingly becoming a "moderator" of the green supply chain process, with a significant positive impact on the environmental performance of healthcare organizations. BDA-AI technologies will lead to improvements in environmental process integration and green supply chain collaboration and, consequently, will support the decisions of managers involved in supply processes. This study also provides an important reference framework for logistics and supply chain managers who want to implement BDA-AI technologies to support green supply processes and enhance the environmental performance of healthcare organizations [40].

Nowadays, many scholars are focusing on BDA-driven decision support systems to assist healthcare managers [41]. These BDA-based analytical tools will provide useful quantitative support for managers of healthcare organizations. The authors report the design and technical details of system implementations through case studies. They have developed a toolkit that serves as a reference framework for resource management, allowing managers to create strategic models and obtain analytical results for evidence-based decisions and managerial evaluations.
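
The toolkit in [41] is not reproduced here; the following is only a minimal, hypothetical sketch of the kind of quantitative indicator such a system might compute, in this case average length of stay and bed occupancy from invented admission data.

```python
import pandas as pd

# Hypothetical admission log for a ten-bed ward.
admissions = pd.DataFrame({
    "patient_id": [101, 102, 103, 104],
    "admitted":   pd.to_datetime(["2023-03-01", "2023-03-02", "2023-03-03", "2023-03-05"]),
    "discharged": pd.to_datetime(["2023-03-04", "2023-03-06", "2023-03-05", "2023-03-09"]),
})
ward_beds = 10
period = pd.date_range("2023-03-01", "2023-03-09", freq="D")

# Average length of stay in days.
los = (admissions["discharged"] - admissions["admitted"]).dt.days
print("Average length of stay:", los.mean(), "days")

# Daily occupancy: patients holding a bed on each day of the period.
occupied = [
    ((admissions["admitted"] <= day) & (admissions["discharged"] > day)).sum()
    for day in period
]
occupancy_rate = sum(occupied) / (ward_beds * len(period))
print(f"Bed occupancy over the period: {occupancy_rate:.0%}")
```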

In this RA, two other important topics investigated through BDA are high-quality healthcare services and healthcare costs. Optimizing supply chain activities is imperative to keep healthcare costs low. The data generated by medical equipment and devices can be successfully used in forecasting and decision-making, and to make healthcare supply chain management more efficient [42]. The study carried out by Alotaibi et al. [42] accordingly presents a review of the use of big data in healthcare organizations, underlining the opportunities and challenges deriving from the application of BDA-based management systems within these organizations.
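
As a hedged illustration of how equipment and usage data could feed supply chain forecasting, the snippet below applies a simple moving average to hypothetical weekly consumption of a consumable item; a production system would use richer models and real demand data.

```python
# Hypothetical weekly consumption of a medical consumable (e.g., infusion sets).
weekly_usage = [120, 135, 128, 150, 142, 160, 155, 170]

def moving_average_forecast(history, window=4):
    """Forecast next week's demand as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

forecast = moving_average_forecast(weekly_usage)
safety_stock = 0.2 * forecast  # crude 20% buffer; a real policy would use demand variance
print(f"Order for next week: {forecast:.0f} units plus {safety_stock:.0f} safety stock")
```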

As already asserted, a sound implementation of BDA in healthcare organizations will play a fundamental role in improving clinical outcomes management, giving decision makers and managers helpful insights for preventing disease, reducing healthcare expenses, and improving organizational performance [43]. However, to achieve these ambitious outcomes, research must face a crucial challenge: how to rationalize heterogeneous data coming from diverse sources and make them easily usable at affordable costs. The research developed by Kundella and Gobinath [43] represents an important contribution to exploring the key challenges, techniques, technologies, privacy issues, security algorithms and future directions of the use of BDA in healthcare organizations.

RA3: BDA and management of health surveillance system

The rise of BDA promises to solve many healthcare challenges in developing countries. BDA applied to healthcare organizations helps managers to rationalize resources and health systems to deliver treatments to patients more effectively [44]. In this regard, the government of Zambia is considering implementing BDA solutions to provide more effective and efficient healthcare services. A well-managed health surveillance system is an important driver for improving quality of life and reducing medical waste, especially in developing countries where the lack of resources is severe and limits economic development. For all these reasons, Europe is investing in BDA initiatives in public health and in the oncology sector to generate new knowledge, improve clinical care and make the management of the public health surveillance system more efficient [45]. The capability of BDA to identify specific population patterns, manage high volumes of data and turn them into real-time (or near real-time) insights makes it a powerful tool for supporting managers in decision-making processes. Despite this, implementing BDA-based management systems within healthcare organizations requires investment in human capital, strong collaboration with stakeholders, and data integration with and among healthcare units. To this end, Gunapal et al. [46] highlight that Singapore has set up a Regional Health System (RHS) database to facilitate BDA for proactive population health management (PHM) and health services research [46]. The healthcare database was built by collecting data from four databases coming from three RHSs: the National Healthcare Group (NHG), Tan Tock Seng Hospital (TTSH), National University Hospital (NUH) and Alexandra Hospital (AH). The result is a database that incorporates patient demographics, chronic disease, and healthcare utilization information useful for healthcare managers. These characteristics facilitate the identification of specific patient paths linked by past healthcare utilization and chronic disease information. Converging information into a single database helps to understand the cross-utilization of healthcare services across the three RHSs. Such an approach allows the RHS structure to be set up for proactive population health management and improves the performance of healthcare organizations [46].
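
The Singapore RHS database is only summarized above, so the sketch below is merely a hypothetical Python illustration of the general idea of consolidating patient-level records from several hospital systems into a single view; the identifiers and fields are invented.

```python
import pandas as pd

# Hypothetical extracts from separate hospital systems sharing a national patient ID.
hospital_a = pd.DataFrame({"patient_id": [1, 2], "visits_a": [3, 1]})
hospital_b = pd.DataFrame({"patient_id": [2, 3], "visits_b": [2, 4]})
chronic    = pd.DataFrame({"patient_id": [1, 2, 3], "diabetes": [True, False, True]})

# Outer joins keep patients seen in any system; missing utilization counts become 0.
combined = (
    hospital_a.merge(hospital_b, on="patient_id", how="outer")
              .merge(chronic, on="patient_id", how="outer")
              .fillna({"visits_a": 0, "visits_b": 0})
)
combined["total_visits"] = combined["visits_a"] + combined["visits_b"]
print(combined)
```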

RA 4: BDA technology for healthcare organization

Wearable devices and various kinds of sensors able to collect clinical data, in combination with BDA, will constitute the basis of personalized medicine and will be crucial tools for improving the performance of healthcare organizations [47]. Scientific research faces the important challenge of adapting data acquisition, storage, transmission and analytics to healthcare demand. Healthcare data should therefore be categorized, homogenized, and implemented into specific models by adapting machine-learning techniques to the nature of the healthcare organization.
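
As a small illustration of the homogenization step mentioned above, the hedged sketch below resamples two hypothetical sensor streams recorded at different rates onto a common one-minute grid so that they can be fed to the same model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical streams: heart rate every 10 s, skin temperature every 30 s.
hr = pd.Series(
    rng.normal(75, 5, 60),
    index=pd.date_range("2023-05-01 12:00", periods=60, freq="10s"),
    name="heart_rate",
)
temp = pd.Series(
    rng.normal(36.6, 0.2, 20),
    index=pd.date_range("2023-05-01 12:00", periods=20, freq="30s"),
    name="skin_temp",
)

# Resample both streams to one-minute averages and align them in a single table.
features = pd.concat(
    [hr.resample("1min").mean(), temp.resample("1min").mean()],
    axis=1,
)
print(features)
```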

A fruitful field of interest for the application of BDA in healthcare organizations is diagnostic imaging. To extract maximum benefit from it, and for it to be useful to managers of healthcare organizations, digital platforms and applications need to be implemented [48]. Indeed, the mere production of a large amount of data does not automatically translate into an advantage for healthcare performance; specific applications are required to enable the correct and advantageous management of diagnostic images [48]. The link between BDA and IoT technologies, as an instrument to incorporate accessibility, the capacity to customize, and the practical delivery of clinical data, emerges as another research direction investigated by the papers included in this RA. These tools allow: (1) healthcare organizations to decrease expenses; (2) people to self-regulate treatments; (3) practitioners to take decisions remotely as quickly as possible and to keep in constant contact with patients [49].

In light of these results, it is possible to state that IoT, big data, and artificial intelligence, such as machine-learning algorithms, are three of the most significant innovations in healthcare organizations. These organizations are implementing home-centric data collection networks and intelligent BDA systems based on machine-learning technologies. For example, such a system has been successfully implemented in Cartagena, Colombia, for hypertensive patients, using an e-Health sensor and Amazon Web Services components [50]. The authors stress the importance of combining IoT, big data, and artificial intelligence to obtain better health outcomes for communities and improved performance for healthcare organizations. The new generation of machine-learning algorithms can use standardized datasets generated by these sources to improve the effectiveness of public health interventions [50]. To this end, as pointed out by numerous studies in the field of BDA applied to healthcare organizations, it becomes crucial for future research to concentrate R&D efforts on fully standardized dataset protocols.
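
The Cartagena deployment is only summarized in [50]; the snippet below is a deliberately simplified, hypothetical example of the kind of rule a home-monitoring pipeline might apply to streamed blood-pressure readings before notifying a practitioner, not a description of that system.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    patient_id: str
    systolic: int
    diastolic: int

def needs_review(readings, systolic_limit=140, diastolic_limit=90, min_consecutive=3):
    """Flag a patient when several consecutive readings exceed hypertension thresholds."""
    streak = 0
    for r in readings:
        if r.systolic >= systolic_limit or r.diastolic >= diastolic_limit:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False

readings = [Reading("p-7", 150, 95), Reading("p-7", 148, 92), Reading("p-7", 152, 96)]
if needs_review(readings):
    print("Alert: escalate patient p-7 for remote review by a practitioner.")
```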

As highlighted by the results, in Europe, as well as in the rest of the world, a significant trend is emerging among healthcare organizations towards adopting BDA-based management systems [45]. Across the clusters identified, the common element in the studies reviewed is the positive relationship between BDA tools and the benefits achievable by healthcare organizations.

As emerged from the RAs, some studies explore business value for healthcare organizations and the concept of the potentialities of BDA (RA1) to explain the evidence of precise path-to-value chains leading to specific benefits [16]. These perspectives provide useful guidelines for healthcare managers who are considering implementing BDA tools in their organizations. Some authors focus in particular on the role of BDA capabilities in the development of hospital supply chain integration and operational flexibility, demonstrating a positive relationship between the two dimensions [27]. During the Covid-19 outbreak, it became clearer how important operational flexibility is to healthcare organizations. The scholars also underline how BDA can improve the efficiency of decision-making processes in healthcare organizations, through predictive models and real-time analytics, helping health professionals in the collection, management, and analysis of data [37].

In general, BDA-based management systems make personalized care programs possible. However, considering the enormous amount and heterogeneity of information available nowadays, the need emerges to direct R&D pathways towards data filtering mechanisms and the engineering of new intelligent decision support systems within healthcare organizations [38].

Circular economy (CE) and sustainability concepts are becoming important key drivers in healthcare organizations to reduce the negative impact on the environment (RA2). Some research directions look at BDA as a tool to provide solutions to the barriers related to CE and to support sustainable development initiatives in healthcare organizations [39]. Empirical studies have demonstrated the benefits of BDA-AI in the supply chain integration process and its impact on environmental performance. Assessing a sample of 168 French hospitals, Benzidia et al. [40] observed that the use of BDA-AI technologies has a significant impact on environmental process integration and the green supply chain. In particular, this study provides important insights for healthcare managers who wish to implement BDA-AI technologies to sustain green supply processes and improve environmental performance [40]. BDA and web technologies can successfully help managers to redesign healthcare processes, making them more effective and efficient. Since healthcare spending is constantly growing in the world's major regions, there is an urgent need to redesign processes and optimize supply chain activities so that high-quality services can be provided at lower costs [42]. Although BDA-based management systems promise to fulfil this role in healthcare organizations, more in-depth studies are required. Due to the heterogeneity of information sources, one future research direction should investigate in depth the standardization and integration of protocols for data analysis, as well as the techniques, technologies and security algorithms for BDA applied to healthcare and medical data [43].

In developing countries, as well as in the rest of the world, the management of health surveillance is a sensitive issue (RA3). Authors have therefore studied the main key factors that hinder BDA adoption in healthcare organizations [44]. Technology, staff, data management and health policies have been identified as some of the decisive variables [44]. Due to the ageing of the population and the related disability, healthcare organizations will soon face hard challenges. To this end, big data can also help healthcare managers to detect patterns and to turn high volumes of data into usable knowledge. In this context, investments in technological infrastructure are needed, as well as in human capital [45]. China is proving, with large-scale investment, to be a pioneer country in the adoption of BDA-based management systems in healthcare organizations [46].

The rise of AI, IoT, machine learning [49-51], and sensor technologies, as well as embedded systems able to communicate with each other, has boosted the adoption of BDA, with valuable benefits for healthcare organizations (RA4). These technologies will play a fundamental role in big data management to improve the performance of healthcare organizations. Some authors have underlined the privacy issues related to healthcare data and the need to make sensor data homogeneous and tagged. Furthermore, the implementation of clinical records into models and the adaptation of machine-learning techniques are required [47]. Future R&D in this field should focus on developing digital platforms and specific BDA-based applications, also for managing diagnostic images [48].

By exploring the relationship between BDA-based management systems and the benefits delivered to healthcare organizations, this study answers three RQs: (1) What is the state of the art of BDA adopted by healthcare organizations? (2) What are the benefits for both health managers and healthcare organizations? (3) What are the future directions of BDA research in healthcare?

To answer the RQs, the SLR started from an investigation of the recent literature on BDA in healthcare organizations. A descriptive analysis was performed on a sample of 34 studies from all over the world. The second stage presents a detailed content analysis of the 16 studies that best answer the research question about the relationship between benefits for healthcare organizations and BDA solutions.

In analyzing successful BDA strategies in the healthcare context, some authors focus their attention on the potentialities of BDA applied in healthcare organizations [16, 37]. Indeed, the research highlights how analytical tools based on personal health systems support public health management systems and how BDA suggests new pathways to support healthcare managers in the decision-making process.

In the literature, other scholars highlight the positive impact of BDA on resource management. BDA solutions are analyzed as tools to sustain CE initiatives [38, 39] as well as to enable green supply chain process integration and improve hospital performance [40]. By exploiting KPIs derived from BDA solutions, some researchers present innovative models for planning public health policy [41]. In this context, studies consider BDA cloud computing solutions and social media data analytics for supporting the performance of healthcare supply chain management [42, 43]. Furthermore, researchers from all around the world are showing particular interest in BDA for health surveillance system management [44-46].

According to the recent literature, BDA is transforming healthcare organizations. The SLR has shown that BDA solutions are now widely considered a milestone for managerial studies applied to healthcare organizations. The Coronavirus pandemic has been a demanding test of the use of BDA to design healthcare policy strategies. Although an extensive literature on BDA to support healthcare management is being produced, the proposed classification into four RAs is an attempt to examine precise key research directions. In this regard, a limitation of the present research is the difficulty of reviewing a field of literature that is constantly evolving. To date, the amount of data is no longer an issue: to be useful in the healthcare context, data quality must be validated and the right correlations identified. In other words, data should be processed, analyzed, and interpreted correctly. For this reason, the need emerges to direct research pathways towards filtering mechanisms, converting data from big to smart, and engineering new decision support systems within healthcare organizations [38].

The content analysis carried out in this research has shown that studies aim to find new models for both predictive and personalized medicine by exploiting BDA technologies [47]. Researchers underline the added value of using BDA both in the medical diagnostic process [48] and jointly with IT technologies such as IoT and machine learning [49, 51].

Thus, considering the results obtained, it is possible to state that BDA can effectively help healthcare managers to detect common patterns and turn high volumes of data into usable knowledge. Investment in human capital becomes a priority for exploiting the potential of BDA [45].

To achieve these objectives, future research should provide usable insights and standardized procedures for training healthcare managers and practitioners. AI and machine learning, as well as management strategies, will also play their part as knowledge producers in healthcare organizations. Privacy issues related to healthcare data, together with the need to make sensor data homogeneous, are becoming crucial research topics. Finally, due to the heterogeneity of information sources, future research should investigate the standardization and integration of protocols for data analysis, as well as the techniques that the managerial sector can use to implement BDA-based management systems ever more widely in future healthcare organizations [43].

Nowadays the challenge for healthcare organizations is the development of useful BDA-based applications. In line with the circular economy view, future research directions should consider the relationship between digitalization and the management of resource consumption. Data centralization combined with a BDA approach can effectively support circular economy processes in the healthcare supply chain by reducing waste and resource consumption.

Exploiting BDA's capabilities will also be a key factor in forecasting and monitoring outbreaks. Future studies will need to focus on developing more efficient models for sharing data in order to improve the performance of healthcare organizations around the world.



How Netflix uses Data Analytics: A Case Study

The contribution of big data and analytics to the success of Netflix.

By Sarthak Niwate


Netflix's current company valuation is $234 billion. It is renowned as the most valuable media company in the world, surpassing even Disney. Its success lies in a term that is no secret in itself, although the way Netflix uses it is: customer retention.

Customer retention may be defined as the process of engaging customers and encouraging them to keep using the service or buying the product.

Now, this may look like a simple tactic at first glance, but note that many consider it the most powerful tactic used by any media company. Netflix has applied it so intelligently that its customer retention rate is extremely impressive and keeps increasing year after year.

Can you guess the total number of Netflix subscribers? As of December 2020, paid Netflix subscriptions amounted to a whopping 203.66 million. This was an excellent milestone for Netflix, as it crossed the 200 million mark for the first time.


Netflix has pulled far ahead of its competitors because its TV shows and movies have garnered attention and high view counts, which has helped escalate the rate of subscriptions. Netflix has simply been better at identifying the true interests of its customers and audiences.

Are you wondering why people choose Netflix? And why you chose it?

I came across an informative blog that talked about the top reasons why people choose Netflix. I thought that I should share this here.


How does Netflix use data and big data analytics?

The question looks simple and straightforward, but only people with a background in or experience of working, studying, or playing with data can appreciate its depth.

For any company or organization, data collection is essential. Imagine Netflix with its 203 million subscribers: studying the traits hidden in the data of this many customers is a tremendous task. Netflix takes the collected information, converts it into insights, results, and visualizations, and recommends TV shows and movies according to customers' preferences and interests. Read that line again: it almost feels like a supernatural talent or power.

You should be able to relate if you're a Netflix user. According to Netflix's own research, personalized recommendations drive viewer activity for over 75% of subscribers. Diving deeper, numerous data points are collected and a detailed profile of each subscriber is generated. It's hard to believe, but the profile Netflix builds for a subscriber is far more detailed than the information or preferences the subscriber provides when they first sign up.

To generalize, the data collected by Netflix mostly concerns customer interaction with the application or webpage and responsiveness to shows and movies. Put simply, if you're watching any TV show or movie on Netflix, it knows the date, the location, the device being used, and the time of your watching. On top of that, Netflix also knows how and when you pause and resume your shows and movies. It also takes into consideration whether you finish a show at all, and how many hours, days, or weeks it takes you to complete an episode, a season, or a movie.
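
To make the idea of a "data point" concrete, here is a hedged sketch of what a single viewing event might look like when logged. The field names are illustrative guesses, not Netflix's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime

@dataclass
class ViewingEvent:
    user_id: str
    title_id: str
    event_type: str   # e.g. "play", "pause", "resume", "complete"
    device: str
    country: str
    timestamp: str

event = ViewingEvent(
    user_id="u-123",
    title_id="t-456",
    event_type="pause",
    device="smart_tv",
    country="IN",
    timestamp=datetime(2021, 1, 15, 21, 42, 5).isoformat(),
)

# In practice such events would be streamed into a data platform; here we just print JSON.
print(json.dumps(asdict(event)))
```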

Ultimately, Netflix tracks every action a user takes and treats it as a data point. How many metrics might Netflix be using for data collection in total?

The work is not done yet. What are some of the extreme cases you can think of after reading the last paragraph? Give it a try, and you will realize the amount of effort and intelligence Netflix puts in. Do you have a habit of rewatching your favorite scenes? Then Netflix knows that: it captures the scenes viewers watch repeatedly and categorizes you accordingly. It keeps track of how many searches you make before choosing something to watch, and even which keywords you used. Imagine how valuable that data is when gathered properly. After the data is collected and cleaned, the buzzword comes into play: the recommendation algorithms.
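
Netflix's production recommender is far more sophisticated and not public in detail, but a minimal, hypothetical sketch of the underlying idea is item-to-item collaborative filtering: titles watched by overlapping sets of users are treated as similar, and a user's unwatched titles are scored by their similarity to what that user has already watched.

```python
import numpy as np

# Toy watch matrix: rows are users, columns are titles; 1 means the user finished it.
titles = ["Drama A", "Thriller B", "Documentary C", "Comedy D"]
watched = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
])

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend_for(user_index, top_n=2):
    """Score unwatched titles by their similarity to the titles the user has watched."""
    scores = {}
    for j, title in enumerate(titles):
        if watched[user_index, j]:
            continue  # skip titles already seen
        sims = [
            cosine_sim(watched[:, j], watched[:, k])
            for k in range(len(titles))
            if watched[user_index, k]
        ]
        scores[title] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print("Recommendations for user 0:", recommend_for(0))
```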

At this point, you should understand that Netflix's success stems from its tremendous ability to collect, process, and use data.

This ability to collect and use data is the reason behind Netflix's success, and it results in better customer retention year after year. Studies suggest that Netflix's customer retention rate keeps increasing because 80% of users follow the recommendations and stream the recommended show or movie.

Have you ever heard of 'green-lit original content'? To green-light something means to approve it. So green-lit original content is content that Netflix approves for production on the basis of various touchpoints drawn from the user database.

Big data and certain analysis techniques are also used for custom marketing, for example to promote a TV show or movie Netflix releases (which might have several promos or trailers). If a viewer watches content that is more centered on women, that viewer will be shown a trailer that focuses on the female characters in the movie.

The same applies to many other dimensions, such as viewers who only watch movies by certain directors or with certain actors or actresses. This in-depth profile of each customer reduces the time spent researching marketing strategies, because Netflix already knows the interests, likes, and dislikes of its subscribers.
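
A toy version of this personalization logic might look like the snippet below, which picks the promo variant whose tags best overlap a viewer's inferred interests. The tags and the selection rule are invented for illustration and are not Netflix's actual method.

```python
# Hypothetical promo variants for the same title, each tagged by what it emphasizes.
promo_variants = {
    "trailer_female_lead": {"female_lead", "drama"},
    "trailer_action_cut":  {"action", "thriller"},
    "trailer_director":    {"auteur", "drama"},
}

def pick_promo(viewer_interests):
    """Choose the variant with the largest overlap with the viewer's interest tags."""
    return max(
        promo_variants,
        key=lambda name: len(promo_variants[name] & viewer_interests),
    )

viewer_interests = {"female_lead", "drama", "romance"}  # inferred from viewing history
print("Selected promo:", pick_promo(viewer_interests))
```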

All of this comes down to tracking subscribers' actions and collecting data on them. One very traditional technique that Netflix also uses is gathering feedback from subscribers. The feedback is converted into ratings, and the team then works on improving the system and its recommendations.

Netflix veteran Joris Evers has said that there are 33 million different versions of Netflix.

Thank you for reading!



Top 20 Analytics Case Studies in 2024


Although the potential of big data and business intelligence is recognized by organizations, Gartner analyst Nick Heudecker says that the failure rate of analytics projects is close to 85%. Uncovering the power of analytics improves business operations, reduces costs, enhances decision-making, and enables the launch of more personalized products.

In this article, our research covers:

  • How to measure analytics success
  • Analytics case studies from different industries

According to the Gartner CDO Survey, the top 3 critical success factors of analytics projects are:

  • Creation of a data-driven culture within the organization,
  • Data integration and data skills training across the organization,
  • Implementation of a data management and analytics strategy.

The success of an analytics process depends on asking the right questions, which requires understanding what data are needed for each goal. We've listed 20 successful analytics applications and case studies from different industries.

During our research, we also found that partnering with an analytics consultant helps organizations boost their success when their in-house tech team lacks certain data skills.






