How LinkedIn uses Hadoop to leverage Big Data Analytics?

Understand how LinkedIn leverages big data analytics to become the world’s largest professional social network, with more than 400 million members.


With more than 400 million profiles (122 million in the US and 33 million in India) across 200+ countries, more than 100 million unique monthly visitors, 3 million company pages, 2 new members joining the network every second, 5.7 billion professional searches in 2012, 7,600 full-time employees, $780 million in revenue as of October 2015 and earnings of 78 cents per share (phew!), LinkedIn is the largest social network for professionals. People prefer to share their expertise and connect with like-minded professionals to discuss various issues of interest on a platform like LinkedIn, as it allows them to present themselves professionally in a less traditional manner. Two or more people join LinkedIn's professional network every second, feeding the pool of 400 million members, whether they are skilled professionals searching for a job or head-hunters looking for top talent.



Table of Contents

  • The Big Data Ecosystem at LinkedIn
  • People You May Know
  • Skill Endorsements
  • Jobs You May Be Interested In
  • News Feed Updates

The Big Data Ecosystem at LinkedIn

Wondering how LinkedIn keeps up with your job preferences, your connection suggestions and the stories you prefer to read? Big data analytics is the success mantra that lets LinkedIn predict what kind of information you need to know and when you need it. At LinkedIn, big data is more about business than data. Here's a case study exploring how LinkedIn uses its data goldmine to be a game changer in the professional networking space.


“Our ultimate dream is to develop the world’s first economic graph,” said Jeff Weiner, describing a sort of digital map of skills, workers and jobs across the global economy. Ambitions, in other words, that are a far cry from the industry’s early stabs at modernising the old-fashioned jobs board.

LinkedIn is a huge social network platform not just in terms of revenue or members but also in terms of its multiple data products. LinkedIn processes thousands of events every day and tracks each and every user activity. Big data plays a vital role for the data engineers, data analysts, data scientists and business experts who seek an in-depth understanding of the interactions happening in the social graph. Data scientists and analysts use big data to derive performance metrics and valuable business insights that lead to profitable decision making for marketing, sales and other functional areas.

LinkedIn uses data for its recommendation engine to build various data products. The data from user profiles and various network activities is used to build a comprehensive picture of a member and her connections. LinkedIn knows whom you should connect with, where you should apply for a job and how your skills stack up against your peers as you look for your dream job.


LinkedIn Hadoop and Big Data Analytics


Several technical accomplishments and contributions pepper LinkedIn's hallmark 13-year journey as a pioneer in the professional networking space. Apache Hadoop forms an integral part of the technical environment at LinkedIn and powers some of the most commonly used features on the mobile app and desktop site. As of May 6, 2013, LinkedIn had a team of 407 Hadoop-skilled employees. The biggest professional network consumes tons of data from multiple sources for analysis in its Hadoop-based data warehouses. Funnelling data into Hadoop systems is not as easy as it appears, because data has to be transferred from many source locations into one large centralized system. All the batch processing and analytics workload at LinkedIn is primarily handled by Hadoop. LinkedIn uses Hadoop to develop predictive analytics applications such as “Skill Endorsements” and “People You May Know”, for ad-hoc analysis by data scientists and for the descriptive statistics behind internal operating dashboards.


Let’s take a look at the big data ecosystem at LinkedIn -

  • Workflow - Azkaban
  • Data in - Apache Kafka
  • Data out - Apache Kafka and Voldemort

Here’s a quick look at the LinkedIn big data technologies that are powered by Apache Hadoop –

  • Voldemort - A NoSQL distributed key-value storage system used at LinkedIn for various critical services fuelling a large portion of the website. 70% of all Hadoop data deployments at LinkedIn employ key-value access using Voldemort.
  • Decomposer - Contains large matrix decomposition algorithms implemented in Java.
  • White Elephant - Parses Hadoop logs and provides visualization dashboards that include the number of slots used, the count of failed jobs, and total disk time and CPU time for different Hadoop clusters.
  • Giraph - Used for social graph computations and interpretations on Hadoop clusters.
  • Avatara - LinkedIn's scalable and highly available OLAP system used in the “Who's Viewed My Profile” feature. It serves queries in real time.
  • Kafka - A publish-subscribe messaging system that unifies online and offline processing by providing a mechanism for parallel load into Hadoop. Kafka at LinkedIn is used for tracking hundreds of different events, such as page views, profile views, network updates, impressions, logins and searches, over a billion records every day (see the sketch after this list).
  • Azkaban - An open source workflow system for Hadoop that provides make-like dependency analysis and cron-like scheduling.
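As promised above, here is a minimal sketch of the Kafka publish-subscribe pattern used for event tracking, written with the open-source kafka-python client; the topic name and event fields are illustrative assumptions, not LinkedIn's actual tracking schema.

```python
# Minimal publish/subscribe sketch of activity-event tracking with Apache Kafka.
# Uses the open-source kafka-python client; the topic name and event fields are
# illustrative only, not LinkedIn's actual schema.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# "Data in": every tracked activity (page view, profile view, search, ...) is
# published as a small event record for downstream batch loading into Hadoop.
producer.send("member-activity", {"member_id": 42, "event": "profile_view", "viewed_id": 7})
producer.flush()

# "Data out": an offline loader (or any other consumer group) subscribes to the
# same topic and drains the events in parallel.
consumer = KafkaConsumer(
    "member-activity",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
for record in consumer:
    print(record.value)
```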

Live servers are updated through large-scale parallel fetches from Hadoop into Voldemort, which warms up the cache; Voldemort then performs an atomic switchover to the next day's data on each server. An index-building step in the Hadoop pipeline produces a multi-terabyte lookup structure that uses hashing. This process strikes a balance between cluster computing resources and response time. Hadoop is used to process huge batch workloads - it takes approximately 90 minutes to create a 900 GB data store on a Hadoop development cluster with 45 nodes. Hadoop clusters at LinkedIn go down for periodic maintenance and upgrades, but the Voldemort servers are always up and running.
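The build-offline-then-swap pattern is easy to picture in miniature. The sketch below is a hypothetical in-memory stand-in, not Voldemort's API: a nightly batch job builds a fresh lookup structure, and serving flips to it atomically only once it is complete.

```python
# Toy illustration of the "build offline, then swap atomically" pattern used for
# read-only key-value serving. Hypothetical stand-in, not Voldemort's API.
import threading

class ReadOnlyStoreServer:
    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}          # current day's lookup structure

    def swap_in(self, new_store: dict):
        """Atomically switch serving to the freshly built store."""
        with self._lock:
            self._active = new_store   # readers never see a half-loaded store

    def get(self, key):
        with self._lock:
            return self._active.get(key)

def nightly_batch_build(records):
    """Stand-in for the Hadoop job that builds the hashed lookup structure."""
    return {member_id: recommendations for member_id, recommendations in records}

server = ReadOnlyStoreServer()
server.swap_in(nightly_batch_build([(42, ["a", "b"]), (7, ["c"])]))
print(server.get(42))   # serving reflects the new data only after the swap
```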


LinkedIn Big Data Products

LinkedIn is injecting big data analytics into various features on its platform by building novel data products -

If you are a LinkedIn user, you probably know about LinkedIn's star feature, “People You May Know”. It suggests other LinkedIn members that a user is likely to want to connect with. “People You May Know” began as a huge Python script in 2006 and has been driving immense growth on the platform since 2008.


Most of LinkedIn's data is offline and it moves relatively slowly, so LinkedIn's data infrastructure uses Hadoop for batch processing. LinkedIn pre-computes the data for the “People You May Know” product by recording close to 120 billion relationships per day in a Hadoop MapReduce pipeline that runs 82 Hadoop jobs and requires 16 TB of intermediate data. The feature is implemented by a job that uses a statistical model to predict the probability of two people knowing each other. The data infrastructure uses bloom filters to accelerate the join operations in these jobs, which gives roughly 10 times better performance. Five test algorithms run continually, producing approximately 700 GB of output data for the “People You May Know” feature.
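LinkedIn's production model uses far richer signals, but the core "triangle closing" intuition (two members who share many connections are likely to know each other) can be sketched in a few lines; the graph and scoring below are a toy illustration, not LinkedIn's algorithm.

```python
# Toy "triangle closing" sketch behind a People-You-May-Know style feature:
# rank non-connections by the number of mutual connections. Illustrative only.
from collections import Counter

connections = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol", "erin"},
    "carol": {"alice", "bob"},
    "dave":  {"alice", "erin"},
    "erin":  {"bob", "dave"},
}

def people_you_may_know(member, graph, top_k=3):
    scores = Counter()
    for friend in graph[member]:
        for friend_of_friend in graph[friend]:
            if friend_of_friend != member and friend_of_friend not in graph[member]:
                scores[friend_of_friend] += 1   # one shared connection = one open triangle
    return scores.most_common(top_k)

print(people_you_may_know("alice", connections))   # [('erin', 2)]
```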

So, the next time LinkedIn suggests someone you never expected to discover on the network, from a completely different part of your online life - do not worry! LinkedIn is tracking everything, from your browser settings, login details, InMails you send and the profiles you view, to bring you a list of people who match your preferences and whom you can connect with.


Skill Endorsements is another interesting data product built by LinkedIn that recruiters use to look at the skills and expertise of a particular candidate. A member can endorse another member in their network for a skill, which is then shown on the endorsed person's profile. Skill endorsement is a deep information-extraction data problem.


The workflow first determines the various skills that exist for a member, which requires synonym detection and disambiguation. The skills are then joined with the member's profile, social graph, groups and any other member activity that helps identify the person's skills. After the skills are resolved, endorsement recommendations are computed by measuring the affinity between two members and the likelihood that a member has a particular skill. The resulting skill recommendations are delivered through Voldemort as key-value stores, mapping a member id to a list of other members, skill ids and scores. The front-end team at LinkedIn then displays the output in a user-friendly manner, as shown below –

Skill Endorsements Data Products at LinkedIn
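To make the delivery format concrete, here is a toy illustration of the key-value shape described above, with a member id mapped to scored member/skill pairs; the field names and scores are assumptions, not LinkedIn's schema.

```python
# Illustrative shape of an endorsement-recommendation record keyed by member id.
# Field names and scores are assumptions for illustration, not LinkedIn's schema.
endorsement_recommendations = {
    "member:42": [
        {"endorse_member_id": 7,  "skill_id": "hadoop",        "score": 0.93},
        {"endorse_member_id": 19, "skill_id": "data-analysis", "score": 0.81},
    ]
}

# The front end would look up the viewing member's id and render the top-scored
# suggestions ("Does this connection know Hadoop?").
for suggestion in endorsement_recommendations["member:42"]:
    print(suggestion["endorse_member_id"], suggestion["skill_id"], suggestion["score"])
```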

Searchable job titles, connections and skills are LinkedIn's greatest assets for employers looking for top talent. LinkedIn is joining the dots for corporates by leveraging big data for intelligent workforce planning through its “Jobs You May Be Interested In” feature. 90% of Fortune 100 companies use LinkedIn to hire top talent and 89% of professionals use LinkedIn to land a job. According to LinkedIn, 50% of website engagement comes from the “Jobs You May Be Interested In” feature. Machine learning plays a vital role in everything at LinkedIn, whether it is job recommendations, group recommendations, news story recommendations, personalization of the social feed or personalized search.

Jobs You May Be Interested In

LinkedIn uses various machine learning and text analysis algorithms to show relevant jobs on a LinkedIn member's profile. Textual content such as skills, experience and industry is extracted from a member's profile, and similar features are extracted from the job listings available on LinkedIn. A logistic regression model then ranks the relevant jobs for a particular LinkedIn member based on the extracted features.
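A minimal sketch of that ranking step might look like the following; the features, data and the use of scikit-learn are illustrative assumptions standing in for LinkedIn's production feature extraction and model.

```python
# Toy sketch of ranking jobs for a member with logistic regression.
# Features and data are made up; scikit-learn stands in for the production model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [skill overlap, industry match, title similarity]; label: 1 = member applied.
X_train = np.array([[0.9, 1, 0.8], [0.2, 0, 0.1], [0.7, 1, 0.4], [0.1, 0, 0.3]])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Score new job listings for one member and rank by predicted relevance.
candidate_jobs = {"data engineer": [0.8, 1, 0.7], "sales rep": [0.1, 0, 0.2]}
scores = {job: model.predict_proba(np.array([feats]))[0, 1]
          for job, feats in candidate_jobs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)   # jobs ordered by predicted probability of interest
```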

The machine learning algorithms that power the “Jobs You May Be Interested In” module do not merely consider the city of residence and current field. Multiple activities are tracked before a job recommendation is shown to a member. For instance, the algorithm analyses the migration patterns of members: LinkedIn's job recommendation model has determined that an employee in San Francisco will be more interested in a job opportunity in New York than in one in Fresno. The algorithm also tracks how often a member changes jobs; if a member is promoted quickly, it recommends jobs that are a step up for them.


LinkedIn uses data analytics and intelligence to understand what kind of information you would like to read, what subjects interest you most and what kind of updates you like, and to put together an aggregated, real-time news feed for you. A LinkedIn member receives an update whenever another member in their connections updates their profile. Deeper analytic insights, such as highlighting the company that most of a member's connections now work at, require multiple join computations across different data sources, which is time-consuming. As this is a compute-intensive batch process that joins company data across many member profiles, Hadoop is used for rapid prototyping and testing of new updates.

News Feed Updates on LinkedIn
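A batch join of that kind can be pictured with the small PySpark sketch below; the datasets, column names and the "top company among your connections" aggregation are illustrative stand-ins, not LinkedIn's actual tables.

```python
# Toy PySpark sketch of the batch join behind a "most of your connections now
# work at X" style insight. Column names and data are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feed-insight-sketch").getOrCreate()

connections = spark.createDataFrame(
    [(42, 7), (42, 19), (42, 23)], ["member_id", "connection_id"])
profiles = spark.createDataFrame(
    [(7, "Acme"), (19, "Acme"), (23, "Globex")], ["member_id", "current_company"])

top_company = (connections
    .join(profiles, connections.connection_id == profiles.member_id)
    .groupBy("current_company")
    .agg(F.count("*").alias("connection_count"))
    .orderBy(F.desc("connection_count")))

top_company.show()   # Acme: 2, Globex: 1
```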

With its data-driven strategy, LinkedIn continues to grow exponentially in terms of its revenue and member base through innovative data products. Let us know in the comments if we have missed any other important LinkedIn data product that leverages analytics.







Inside LinkedIn’s Big Data Pipelines

Published on August 1, 2021, by Avi Gopani


LinkedIn is the largest professional and employment-oriented service platform. The company has been leveraging AI/ML to optimise various processes such as job postings, job recommendations and business insights. LinkedIn sees more than 210 million job applications submitted every month to the 57 million companies listed on the platform.  

LinkedIn’s Daily Executive Dashboard (DED) contains metrics on critical growth, engagement and bookings. It monitors and provides reports on important KPIs for business profiles, indicating the health of LinkedIn’s business. In addition, the LinkedIn system visualises more than 40 metrics across the business lines to provide company leaders with business insights promptly on their dashboards. 

The process begins with ingesting billions of records from online sources into HDFS. The Hadoop Distributed File System is designed to run on commodity hardware and manages data processing and storage for big data applications by providing high-throughput access to application data. LinkedIn's records are aggregated across more than 50 offline data flows, making its huge dataset a good fit for Hadoop.

To ensure business continuity, LinkedIn picked Teradata to meet the growing demands in batch processing. Big Data Engineering built and maintained the DWH’s data flows and datasets. LinkedIn’s data warehouse had grown to 1,400+ datasets, forcing the company’s hand to build data pipelines.

In 2020, Apache Kafka was processing 7 trillion incoming messages per day, making it critical to improve the data infrastructure around it. A core part of the LinkedIn architecture, the open-source event streaming platform powers use cases like activity tracking, message exchanges and metric gathering. LinkedIn maintains over 100 Kafka clusters with more than 4,000 brokers, which serve more than 100,000 topics and 7 million partitions.


In 2019, LinkedIn released Dagli, an open-source machine learning library for Java. Dagli allows the user to write bug-resistant, modifiable, maintainable, and trivially deployable model pipelines without incurring technical debt. Running on highly multicore CPUs and powerful GPUs, the library is effective for training real-world ML models.

Dagli works on servers, Hadoop, command-line interfaces, IDEs, and other typical JVM contexts. 

Dagli’s features include: 

  • Single pipeline: The entire model pipeline is defined as a directed acyclic graph, eliminating the implementation of a pipeline for training and a separate pipeline for inference.
  • Ease of deployment: The entire pipeline is serialised and deserialised as a single object.
  • Batteries included: Plenty of pipeline components are ready to use including neural networks, logistic regression, gradient boosted decision trees, FastText, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformations.
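Dagli itself is a Java library, so the snippet below is only a loose conceptual analog in Python: a scikit-learn Pipeline likewise bundles feature transformation and model into one object that is trained once and serialised whole for inference. It is a sketch of the idea, not Dagli's API.

```python
# Loose Python analog of the "single, serialisable train-and-inference pipeline"
# idea behind Dagli, using scikit-learn. This is NOT Dagli's API.
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("features", TfidfVectorizer()),     # feature extraction step
    ("model", LogisticRegression()),     # classifier step
])

# Train the whole pipeline at once (toy data).
pipeline.fit(["great data tools", "terrible outage again"], [1, 0])

# The entire pipeline serialises and deserialises as a single object,
# so the same artifact is used for both training and inference.
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)
print(restored.predict(["great tools"]))
```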

Key internal developments to ensure efficient working of LinkedIn’s data pipelines include: 

1. Workflow automation: Azkaban, an open-sourced tool, automates jobs on Hadoop.

2. ETL (Extract, Transform, Load): Brooklin, an open-sourced distributed system, enables streaming data between various source and destination systems and change-data capture for Espresso and Oracle. LinkedIn also uses Gobblin, a data integration framework, to ingest internal and external data sources into Hadoop; it consolidated the platform-specific ingestion frameworks to allow more operability and extensibility.

Celia Kung, data pipeline manager at LinkedIn, talked about Brooklin at QCon 2019. Brooklin is a streaming data pipeline service that propagates data from various source systems to different destination systems. It can run several thousand streams at the same time within the same cluster, and each of these data streams can be individually configured and dynamically provisioned. Brooklin allows the user to configure pipelines to enforce policies and manage data in one centralised location.

“Every time you want to bring up a pipeline that moves data from A to B, you simply need to just make a call to our Brooklin service, and we will provide that dynamically, without needing to modify a bunch of config files, and then deploy to the cluster,” Kung explained.

3. ETL/Ad-hoc querying languages: Spark SQL and Scala.

4. Metric framework: LinkedIn uses the Unified Metrics/Dimension Platforms, a common platform to develop, maintain, and manage metric datasets. It enhances governance by allowing any engineer to build their metric dataset use case within the centralised platform.

5. Reporting: Retina is LinkedIn's internally developed reporting platform for data visualisation needs and for supporting complex use cases, such as DED.


Current architecture of the Daily Executive Dashboard pipeline

The team migrated DED datasets and flows to the new technologies, and open systems such as Hadoop, Azkaban and UMP/UDP enabled decentralised dataset development within LinkedIn. This removed the single-team bottleneck in metric dataset development and consolidated the bloated DWH into fewer source-of-truth datasets. As a result, the team serves DED metrics using only one-third as many datasets, leading to greater leverage, data consistency and better governance.

The team automated this process by sending the DED report from whichever cluster first completed the entire report generation, and enabled an active-active set-up on the two production Hadoop clusters. In addition, through cross-cluster rollouts and a canary for testing changes, the team ensured the stability of DED amidst data flow changes.

The team uses various strategies, based on factors like the underlying technology and infrastructure, to optimise the long-running flows. For instance, a long-running data flow written in Pig was migrated to Spark for a runtime improvement, and the storage format of upstream data was converted from Avro to a columnar format for a 2.4X runtime improvement.
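The article does not say which columnar format was chosen; assuming Parquet purely for illustration, the Avro-to-columnar conversion could look like the PySpark sketch below (paths are placeholders, and the spark-avro package must be available).

```python
# Hedged sketch: converting an Avro dataset to a columnar format (Parquet assumed
# here; the article only says "a columnar format"). Paths are placeholders and
# the spark-avro package must be on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-to-columnar").getOrCreate()

events = spark.read.format("avro").load("/data/upstream/events_avro")
events.write.mode("overwrite").parquet("/data/upstream/events_parquet")

# Downstream flows then read the columnar copy, scanning only the columns they need.
metric_cols = spark.read.parquet("/data/upstream/events_parquet").select("member_id", "event_type")
```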

Lastly, to prevent fluctuations in Hadoop clusters and prevent the nodes from bottlenecking on the system resources, the team added monitoring tools. These enable the platform and engineers to isolate system issues from performance issues. 

“The scope of this is far beyond just delivering a dashboard in our executives’ inboxes by 5:00 a.m., but rather codifying and pushing for best practices in ETL development and maintenance,” according to Jennifer Zheng. “By setting an example, other critical data pipelines have modelled after DED in high availability and disaster recovery through active-active and redundant set-ups.” 

17 important case studies on Big Data analytics

Are you looking for good case studies that highlight how large companies leverage big data to drive productivity? Check out these 17 important case studies on big data.

  • 23andMe - A privately held personal genomics and biotechnology company. It has built its whole model around pulling insights from big data to give customers a 360-degree understanding of their genetic history.
  • CBA - Commonwealth Bank of Australia uses big data to analyse customer risk. Analytics gives the bank better risk assessment of businesses, ongoing cash-flow performance and early warning of risk challenges.
  • Centers for Disease Control - The Centers for Disease Control and Prevention (CDC) is the national public health institute of the United States, whose main aim is to protect people's health and safety through the control and prevention of diseases. The CDC used to rely on doctors' reports of influenza outbreaks, which left it weeks behind in providing vaccines to affected patients. Using historical data from the CDC, Google compared search-term queries against geographical areas that were known to have had flu outbreaks and found forty-five terms correlated with the outbreak of flu. With this data, the CDC can act immediately.
  • Delta - Delta Air Lines, Inc. is a major American airline with an extensive domestic and international network. A top concern for an airline is passengers' lost baggage. Delta looked further into its data and created a solution that removes the uncertainty of where a passenger's bag might be.
  • Energy Future Holdings Corporation - An electric utility company whose power generation comes mostly from coal and nuclear plants. The company used big data to roll out smart meters, which allow the provider to read a meter once every 15 minutes rather than once a month.
  • Google - Google constantly develops new products and services built on big data algorithms. It uses big data to refine its core search and ad-serving algorithms, and describes its self-driving car as a big data application.
  • Kreditech - A young tech company headquartered in Hafencity, Hamburg. The European company uses big data to create a unique credit score for consumers from more than 8,000 sources. The analysis also led to the surprising discovery of a correlation between social media behaviour and financial stability.
  • LinkedIn - A business-oriented social networking service. Founded in December 2002 and launched in 2003, it is mainly used for professional networking. LinkedIn uses big data to develop product offerings such as People You May Know, Jobs You May Be Interested In, Who's Viewed My Profile and more.
  • McLaren's Formula One racing team - McLaren Racing Limited is a British Formula One team. The racing team uses real-time car sensor data during races, identifies issues with its cars using predictive analytics and takes corrective actions proactively before it is too late.
  • Mint.com - A free web-based personal financial management service for the US and Canada. Mint.com uses big data to show users their spending by category and where they spent their money in a given week, month or year.
  • Singapore healthcare - Healthcare providers in Singapore used analytics to better understand each patient's condition, lifestyle choices, and work and home environment, so they can create personalized treatment plans tailored to the person's individual behaviour.
  • Sprint - Sprint Corporation is a United States telecommunications holding company that provides wireless services and is also a major global Internet carrier; it is the third largest U.S. wireless network operator as of 2014. Sprint uses smarter computing, primarily big data analytics, to put real-time intelligence and control back into the network, driving a 90% increase in capacity.
  • T-Mobile USA - The mobile operator has integrated big data across multiple IT systems to combine customer transaction and interaction data in order to better predict customer defections. By leveraging social media data along with transaction data from CRM and billing systems, T-Mobile USA has been able to "cut customer defections in half in a single quarter".
  • UPS - United Parcel Service of North America, Inc., referred to as UPS, is one of the largest shipment and logistics companies in the world. The company tracks data on 16.3 million packages per day for 8.8 million customers, with an average of 39.5 million tracking requests from customers per day, and stores over 16 petabytes of data.
  • US Xpress - A provider of a wide variety of transportation solutions, US Xpress collects about a thousand data elements ranging from fuel usage to tire condition to truck engine operations to GPS information, and uses this data for optimal fleet management and to drive productivity, saving millions of dollars in operating costs.
  • Verizon - Verizon uses big data to enhance mobile advertising. A unique identifier is created when a user registers on the website, allowing advertisers to use information from the desktop computer; marketing messages can then be delivered to the user's mobile phone using this information.
  • Woolworths - The largest supermarket/grocery store chain in Australia. Woolworths uses business analytics to analyse customers' shopping habits. The company spent nearly $20 million to buy a stake in a data analytics company, and nearly 1 billion is being spent on analysing consumer spending habits and boosting online sales.



A Case Study on LinkedIn – Leveraging Big Data Analytics


Ms. Palak Gupta, Assistant Professor, JIMS, Kalkaji

Introduction

Big Data refers to the processing of huge volumes of data that are beyond the traditional processing capability of an RDBMS. Data in today's digital and tech world is highly diverse in variety, type, source, velocity and veracity, and needs real-time handling, pre-processing and summary analysis for accurate pattern recognition, association, correlation, regression, visualization, fraud detection and effective decisions. Today we are surrounded by structured, semi-structured and unstructured data - day-to-day operational data, log files, CCTV footage, audio and video streams, multimedia, chip- and circuit-generated data, sensor data and so on - which is not only complex to handle but also hard to mine for productive results. Organizations do intense refining, pre-processing, filtering and mining on big data for machine learning projects and other analytical applications.

Context to LinkedIn

LinkedIn is the world's largest professional networking community in social media, with more than 500 million profiles across 200+ countries. People create LinkedIn profiles not only to showcase their professional skills and achievements but, more importantly, to connect with established corporate leaders and managers for better insights, job opportunities, corporate news and more. LinkedIn is an apt platform where people can share their expertise and connect with like-minded professionals in similar domains for discussion and updates on various issues of interest, as it provides a semi-formal environment. LinkedIn members range from job hunters and freshers to top professionals, and LinkedIn serves each of these categories quickly by predicting what kind of information is needed, when, and of what nature.

At LinkedIn, big data is more about business than data. LinkedIn relies heavily on big data analytics to manage all profiles, their security and privacy, and to provide relevant information to authorized users. LinkedIn is a large social network platform not just in terms of revenue or members but also in terms of its multiple data products. LinkedIn processes thousands of events every day and tracks each user activity for better results and search queries. Big data plays a vital role in the various interactions in the social graph, so a number of big data designers, engineers, scientists and analysts are deployed and networked across the company. LinkedIn uses a lot of data in its recommender engine to create data products and build a comprehensive picture of a profile. Data scientists and analysts use big data to derive valuable business insights and performance metrics that lead to profitable decision making for sales, marketing and other functional areas. LinkedIn knows where you should apply for a job, whom you should connect with and how your skills stack up against your competitors and colleagues as you look for your dream job.

Application of Apache Hadoop in LinkedIn for leveraging Big Data technologies

Following are various big data techniques and tools deployed at LinkedIn:

  1. Giraph - enables computations and interactions on the social graph
  2. Avatara - enables the "Who's Viewed My Profile" feature using OLAP
  3. Voldemort - provides a NoSQL interface for distributed storage and processing
  4. White Elephant - provides visualization dashboards giving a summary of every event
  5. Kafka - used for tracking all LinkedIn events, such as page and profile views, updates and searches
  6. Azkaban - used for managing the workflow system and scheduling jobs


Data products used at LinkedIn

  • Skill Endorsements - used by recruiters to find the right candidates based on their skills and expertise. It is an information-extraction data product of LinkedIn that uses big data technology.
  • News Feed Updates - provides relevant updates through data analytics and machine learning algorithms on Hadoop, which is used for rapid prototyping and testing of new updates.
  • People You May Know - the best-known LinkedIn feature for enabling connections. It originally ran on Python code but now uses Hadoop batch processing to filter both online and offline data.
  • Jobs You May Be Interested In - provides searchable job titles, skill sets and connections for finding top talent. It is used by 90% of Fortune 100 companies for hiring top talent and by 89% of professionals to land a job, and drives 50% of website engagement.

Big Data Infrastructure used at LinkedIn:

  1. Apache Hadoop
  2. Hadoop Distributed File System (HDFS)
  3. Pig
  4. Hive
  5. Zookeeper
  6. Azkaban
  7. Kafka
  8. Voldemort, etc.

LinkedIn is one of the pioneers of professional networking and makes heavy use of technology, big data, machine learning and artificial intelligence for effective and accurate social profiling, connections and networking. Apache Hadoop powers commonly used features on both the website and the mobile app. It stores, manages and analyses volumes of data from diverse sources held in data warehouses and marts, providing distributed computation in real time and descriptive statistics for live, interactive dashboards, and thus makes ad-hoc analysis and other searches effective and result-driven.

JIMS, Kalkaji organizes a number of workshops, seminars, webinars and guest lectures for PGDM & PGDM (IB) students by eminent industry leaders and experts to apprise them of AI, ML, Big Data and Analytics and motivate them for future learning. We also have a Business Analytics Club, headed by Ms. Palak Gupta, Assistant Professor, JIMS, Kalkaji, that promotes these technologies for better learning, innovation and professional growth.


For more information visit:  https://www.jagannath.org/


8 case studies and real world examples of how Big Data has helped keep on top of competition


Fast, data-informed decision-making can drive business success. Faced with high customer expectations, marketing challenges and global competition, many organizations look to data analytics and business intelligence for a competitive advantage.

Using data to serve up personalized ads based on browsing history, providing contextual KPI data access for all employees and centralizing data from across the business into one digital ecosystem so processes can be more thoroughly reviewed are all examples of business intelligence.

Organizations invest in data science because it promises to bring competitive advantages.

Data is transforming into an actionable asset, and new tools are using that reality to move the needle with ML. As a result, organizations are on the brink of mobilizing data to not only predict the future but also to increase the likelihood of certain outcomes through prescriptive analytics.

Here are some case studies that show some ways BI is making a difference for companies around the world:

1) Starbucks:

With 90 million transactions a week in 25,000 stores worldwide, the coffee giant is in many ways on the cutting edge of using big data and artificial intelligence to help direct marketing, sales and business decisions.

Through its popular loyalty card program and mobile application, Starbucks owns individual purchase data from millions of customers. Using this information and BI tools, the company predicts purchases and sends individual offers of what customers will likely prefer via their app and email. This system draws existing customers into its stores more frequently and increases sales volumes.

The same intel that helps Starbucks suggest new products to try also helps the company send personalized offers and discounts that go far beyond a special birthday discount. Additionally, a customized email goes out to any customer who hasn’t visited a Starbucks recently with enticing offers—built from that individual’s purchase history—to re-engage them.

2) Netflix:

The online entertainment company’s 148 million subscribers give it a massive BI advantage.

Netflix has digitized its interactions with its 151 million subscribers. It collects data from each of its users and with the help of data analytics understands the behavior of subscribers and their watching patterns. It then leverages that information to recommend movies and TV shows customized as per the subscriber’s choice and preferences.

As per Netflix, around 80% of viewer activity is triggered by personalized algorithmic recommendations. Where Netflix gains an edge over its peers is that by collecting different data points, it creates detailed profiles of its subscribers, which helps it engage with them better.

The recommendation system of Netflix contributes to more than 80% of the content streamed by its subscribers, which has helped Netflix earn a whopping one billion via customer retention. For this reason, Netflix does not have to invest too much in advertising and marketing its shows; it already has a precise estimate of the number of people who would be interested in watching a show.

3) Coca-Cola:

Coca-Cola is the world's largest beverage company, with over 500 soft drink brands sold in more than 200 countries. Given the size of its operations, Coca-Cola generates a substantial amount of data across its value chain, including sourcing, production, distribution, sales and customer feedback, which it can leverage to drive successful business decisions.

Coca-Cola has been investing extensively in research and development, especially in AI, to better leverage the mountain of data it collects from customers all around the world. This initiative has helped it better understand consumer trends in terms of price, flavors, packaging, and consumers' preference for healthier options in certain regions.

With 35 million Twitter followers and a whopping 105 million Facebook fans, Coca-Cola benefits from its social media data. Using AI-powered image-recognition technology, they can track when photographs of its drinks are posted online. This data, paired with the power of BI, gives the company important insights into who is drinking their beverages, where they are and why they mention the brand online. The information helps serve consumers more targeted advertising, which is four times more likely than a regular ad to result in a click.

Coca Cola is increasingly betting on BI, data analytics and AI to drive its strategic business decisions. From its innovative free style fountain machine to finding new ways to engage with customers, Coca Cola is well-equipped to remain at the top of the competition in the future. In a new digital world that is increasingly dynamic, with changing customer behavior, Coca Cola is relying on Big Data to gain and maintain their competitive advantage.

4) American Express GBT

The American Express Global Business Travel company, popularly known as Amex GBT, is an American multinational travel and meetings programs management corporation which operates in over 120 countries and has over 14,000 employees.

Challenges:

Scalability – Creating a single portal for around 945 separate data files from internal and customer systems using the current BI tool would require over 6 months to complete. The earlier tool was used for internal purposes and scaling the solution to such a large population while keeping the costs optimum was a major challenge

Performance – Their existing system had limitations shifting to Cloud. The amount of time and manual effort required was immense

Data Governance – Maintaining user data security and privacy was of utmost importance for Amex GBT

The company was looking to protect and increase its market share by differentiating its core services and was seeking a resource to manage and drive their online travel program capabilities forward. Amex GBT decided to make a strategic investment in creating smart analytics around their booking software.

The solution equipped users to view their travel ROI by categorizing it into three categories: cost, time and value. Each category has individual KPIs that are measured to evaluate the performance of a travel plan.

Results:

  • Reduced travel expenses by 30%
  • Time to Value - Initially it took a week for new users to be on-boarded onto the platform. With Premier Insights, that time has been reduced to a single day and the process has become much simpler and more effective.
  • Savings on Spends - The product notifies users of any available booking offers that can help them save on their expenditure. It recommends possible savings to users, such as flight timings, date of booking and date of travel.
  • Adoption - Ease of use of the product, quick scale-up, real-time implementation of reports, and the interactive dashboards of Premier Insights increased global online adoption for Amex GBT

5) Airline Solutions Company: BI Accelerates Business Insights

Airline Solutions provides booking tools, revenue management, web, and mobile itinerary tools, as well as other technology, for airlines, hotels and other companies in the travel industry.

Challenge: The travel industry is remarkably dynamic and fast paced. And the airline solution provider’s clients needed advanced tools that could provide real-time data on customer behavior and actions.

The company developed an enterprise travel data warehouse (ETDW) to hold its enormous amounts of data. The executive dashboards provide near real-time insights in user-friendly environments with a 360-degree overview of business health, reservations, operational performance and ticketing.

Results: The scalable infrastructure, graphic user interface, data aggregation and ability to work collaboratively have led to more revenue and increased client satisfaction.

6) A specialty US Retail Provider: Leveraging prescriptive analytics

Challenge/Objective: A specialty US retail provider wanted to modernize its data platform to help the business make real-time decisions while also leveraging prescriptive analytics. It wanted to discover the true value of the data being generated by its multiple systems and understand the patterns (both known and unknown) of sales, operations and omni-channel retail performance.

We helped build a modern data solution that consolidated their data in a data lake and data warehouse, making it easier to extract value in real time. We integrated our solution with their OMS, CRM, Google Analytics, Salesforce, and inventory management system. The data was modeled in such a way that it could be fed into machine learning algorithms, so that this can be leveraged easily in the future.

The customer had visibility into their data from day 1, which is something they had been wanting for some time. In addition to this, they were able to build more reports, dashboards, and charts to understand and interpret the data. In some cases, they were able to get real-time visibility and analysis on instore purchases based on geography!

7) Logistics startup with an objective to become the “Uber of the Trucking Sector” with the help of data analytics

Challenge: A startup specializing in analyzing vehicle and driver performance by collecting data from sensors within the vehicle (a.k.a. vehicle telemetry) and order patterns, with the objective of becoming the “Uber of the trucking sector”.

Solution: We developed a customized backend of the client’s trucking platform so that they could monetize empty return trips of transporters by creating a marketplace for them. The approach used a combination of AWS Data Lake, AWS microservices, machine learning and analytics.

  • Reduced fuel costs
  • Optimized Reloads
  • More accurate driver / truck schedule planning
  • Smarter Routing
  • Fewer empty return trips
  • Deeper analysis of driver patterns, breaks, routes, etc.

8) Challenge/Objective: A niche segment customer competing against market behemoths looking to become a “Niche Segment Leader”

Solution: We developed a customized analytics platform that can ingest CRM, OMS, ecommerce, and inventory data and produce a real-time and batch-driven analytics and AI platform. The approach used a combination of AWS microservices, machine learning and analytics.

  • Reduce Customer Churn
  • Optimized Order Fulfillment
  • More accurate demand schedule planning
  • Improve Product Recommendation
  • Improved Last Mile Delivery

How can we help you harness the power of data?

At Systems Plus our BI and analytics specialists help you leverage data to understand trends and derive insights by streamlining the searching, merging, and querying of data. From improving your CX and employee performance to predicting new revenue streams, our BI and analytics expertise helps you make data-driven decisions for saving costs and taking your growth to the next level.



Big data in healthcare – the promises, challenges and opportunities from a research perspective: A case study with a model database

Mohammad Adibuzzaman

1 Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, Indiana, USA

Poching DeLaurentis

Jennifer Hill, Brian D. Benneyworth

2 Children’s Health Services Research Group, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, USA

Recent advances in data collection during routine health care, in the form of Electronic Health Records (EHR), medical device data (e.g., infusion pump informatics and physiological monitoring data), and insurance claims data, among others, as well as biological and experimental data, have created tremendous opportunities for biological discoveries for clinical application. However, even with all the advancement in technologies and their promises for discoveries, very few research findings have been translated into clinical knowledge, or more importantly, into clinical practice. In this paper, we identify and present the initial work addressing the relevant challenges in three broad categories: data, accessibility, and translation. These issues are discussed in the context of a widely used, detailed database from an intensive care unit, the Medical Information Mart for Intensive Care (MIMIC III) database.

1. Introduction

The promise of big data has brought great hope in health care research for drug discovery, treatment innovation, personalized medicine, and optimal patient care that can reduce cost and improve patient outcomes. Billions of dollars have been invested to capture large amounts of data through big initiatives that are often isolated. The National Institutes of Health (NIH) recently announced the All of Us initiative, previously known as the Precision Medicine Cohort Program, which aims to collect one million or more patients’ data, such as EHR, genomic, imaging, socio-behavioral, and environmental data, over the next few years 1 . The Continuously Learning Healthcare System is also being advocated by the Institute of Medicine to close the gap between scientific discovery, patient and clinician engagement, and clinical practice 2 . However, the big data promise has not yet been realized to its potential, as the mere availability of data does not translate into knowledge or clinical practice. Moreover, due to variation in data complexity and structure, unavailability of computational technologies, and concerns about sharing private patient data, few large clinical data sets are made available to researchers in general. We have identified several key issues in facilitating and accelerating data-driven translational clinical research and clinical practice, and we discuss them in depth in the domains of data quality, accessibility, and translation. Several use cases will be used to demonstrate the issues with the “Medical Information Mart for Intensive Care (MIMIC III)” database, one of the very few databases with granular and continuously monitored data for thousands of patients 3 .

2. Promises

In the era of genomics, the volume of data being captured from biological experiments and routine health care procedures is growing at an unprecedented pace 4 . This data trove has brought new promises for discovery in health care research and breakthrough treatments, as well as new challenges in technology, management, and dissemination of knowledge. Multiple initiatives have been undertaken to build specific systems addressing the need for analysis of different types of data, e.g., integrated electronic health records (EHR) 5 , genomics-EHR 6 , genomics-connectomes 7 , insurance claims data, etc. These big data systems have shown potential for making fundamental changes in care delivery and discovery of treatments, such as reducing health care costs, reducing the number of hospital re-admissions, targeted interventions for reducing emergency department (ED) visits, triage of patients in the ED, preventing adverse drug effects, and many more 8 . However, to realize these promises, the health care community must overcome some core technological and organizational challenges.

3. Challenges

3.1. Data

Big data is not as big as it seems

In the previous decade, federal funding agencies and private enterprises have taken initiatives for large-scale data collection during routine health care and experimental research 5 , 9 . One prominent example of data collection during routine health care is the Medical Information Mart for Intensive Care (MIMIC III), which has collected data for more than fifty thousand patients from Beth Israel Deaconess Hospital dating back to 2001 3 . This is the largest publicly available patient care data set of an intensive care unit (ICU) and an important resource for clinical research. However, when it comes to identifying a cohort in the MIMIC data for answering a specific clinical question, it often results in a very small set of cases (a small cohort) that makes it almost impossible to answer the question with strong statistical confidence. For example, when studying the adverse effects of a drug-drug interaction, a researcher might be interested in looking at the vital signs and other patient characteristics during the time two different drugs were administered simultaneously, including a few days before the combination and a few days after the combination. Often these selection criteria result in a very small cohort of patients, limiting the interpretation of the finding, with statistically inconclusive results. As an example, a researcher may want to investigate whether any adverse effect exists when anti-depressants and anti-histamines are administered simultaneously. A query of simultaneous prescriptions of Amitriptyline HCl (anti-depressant) and Diphenhydramine HCl (anti-histamine) returned only 44 subjects in the MIMIC database ( Figure 1 ). Furthermore, by filtering the data with another selection criterion (e.g., to identify the subjects for which at least one day’s worth of data exists during, before, and after the overlap), the query returned a much smaller cohort with only four records.

Figure 1. Example of a small cohort with clinical selection criteria.
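As a rough illustration of the cohort query described above, the sketch below looks for subjects whose prescriptions of the two drugs overlap in time. It runs on a small synthetic table rather than MIMIC-III itself; the column names only approximate the MIMIC-III prescriptions table, and the exact drug-name matching is a simplifying assumption.

```python
# Illustrative sketch of an overlapping-prescription cohort query on a
# synthetic table shaped like MIMIC-III's PRESCRIPTIONS (not the actual data).
import pandas as pd

prescriptions = pd.DataFrame({
    "SUBJECT_ID": [101, 101, 102, 103, 103],
    "DRUG":       ["Amitriptyline HCl", "Diphenhydramine HCl",
                   "Amitriptyline HCl", "Diphenhydramine HCl", "Amitriptyline HCl"],
    "STARTDATE":  pd.to_datetime(["2130-01-01", "2130-01-03", "2131-05-01",
                                  "2132-07-01", "2132-07-10"]),
    "ENDDATE":    pd.to_datetime(["2130-01-10", "2130-01-05", "2131-05-04",
                                  "2132-07-05", "2132-07-12"]),
})

a = prescriptions[prescriptions.DRUG == "Amitriptyline HCl"]
d = prescriptions[prescriptions.DRUG == "Diphenhydramine HCl"]

# Pair every anti-depressant prescription with every anti-histamine prescription
# for the same subject, then keep only the pairs whose date ranges overlap.
pairs = a.merge(d, on="SUBJECT_ID", suffixes=("_a", "_d"))
overlap = pairs[(pairs.STARTDATE_a <= pairs.ENDDATE_d) &
                (pairs.STARTDATE_d <= pairs.ENDDATE_a)]

cohort = overlap.SUBJECT_ID.unique()
print(f"{len(cohort)} subject(s) with overlapping prescriptions:", cohort)
```

Adding further criteria (for example, requiring a day of vital-sign data before and after the overlap) shrinks the result set further, which is exactly the small-cohort problem the paper describes.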

Data do not fully capture temporal and process information

In most cases, clinical data are captured in various systems, even within an organization, each with a somewhat different intent and often not well integrated. For example, an EHR is primarily used for documenting patient care and was designed to facilitate insurance company billing 10 , and pharmacy records were designed for inventory management. These systems were not developed to capture the temporal and process information that is indispensable for understanding disease progression, therapeutic effectiveness, and patient outcomes. In an attempt to study the clinical process of vancomycin therapeutic drug monitoring based on ICU patient records in the MIMIC database, it was discovered that such a process is not easy to reconstruct. Ideally, a complete therapeutic process with a particular drug contains the history of the drug’s prescription, each of its exact administration times, amounts and rates, and the timing and measurements of the drug in the blood throughout the therapy. From the MIMIC III database we were able to find prescription information, but it lacks the detailed dosing amount and the prescription’s length of validity. The “inputevents” table contains drug administration information but does not include the exact time-stamp and drug amount, which are critical for studying intravenously infused vancomycin in the ICU. It is also difficult to match drug prescription and administration records because their recording times in the clinical systems often are not the precise event times, and prescribed drugs are not always administered.

Moreover, since the MIMIC III database does not contain detailed infusion event records, which may be available from infusion pump software, one cannot know the precise drug infusion amount (and over what time) for any particular administration. The sparse and insufficient information on drug administration makes it almost impossible to associate available laboratory records and to reconstruct a therapeutic process for outcomes studies. Figure 2 shows such an attempt at process reconstruction using data from the MIMIC III database, including prescriptions, input events, and lab events for one patient during a unique ICU stay. The record shows only one valid prescription of vancomycin for this patient, with start and end dates, but does not indicate the administration frequency (e.g., every 12 hours) or method (e.g., continuous or bolus). The input events data (the second main column) came from the nursing records, but they show only one dose of vancomycin administration on each day of the three-day ICU stay: one in the morning and two in the evening. Even though, as shown in the third main column, the “lab event” data contain the patient’s vancomycin concentration levels measured during this period, without the exact amount and duration of each vancomycin infusion it is difficult to reconstruct this particular therapeutic process for the purpose of understanding its real effectiveness.

Figure 2. An example of vancomycin therapeutic process reconstruction for one unique ICU stay, using data from three different tables in the MIMIC III database.
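A minimal sketch of the reconstruction attempt described above: it stacks prescription, administration, and lab records for a single ICU stay onto one timeline. The toy frames only mimic the shape of the MIMIC-III prescriptions, inputevents, and labevents tables; the column names, timestamps, and values are illustrative, not real data.

```python
# Illustrative timeline reconstruction for one ICU stay, using toy frames that
# mimic (but are not) the MIMIC-III prescriptions, inputevents and labevents tables.
import pandas as pd

icustay = 2001
prescriptions = pd.DataFrame({
    "ICUSTAY_ID": [icustay],
    "EVENT": ["prescription: vancomycin"],
    "CHARTTIME": pd.to_datetime(["2140-02-01 08:00"]),
})
inputevents = pd.DataFrame({
    "ICUSTAY_ID": [icustay, icustay],
    "EVENT": ["administration: vancomycin", "administration: vancomycin"],
    "CHARTTIME": pd.to_datetime(["2140-02-01 09:30", "2140-02-02 21:00"]),
})
labevents = pd.DataFrame({
    "ICUSTAY_ID": [icustay],
    "EVENT": ["lab: vancomycin trough 14.2 mcg/mL"],
    "CHARTTIME": pd.to_datetime(["2140-02-02 06:00"]),
})

# Stack the three sources and sort chronologically to approximate the
# therapeutic process; the gaps in this timeline are exactly the missing
# administration details discussed in the text.
timeline = (pd.concat([prescriptions, inputevents, labevents])
              .query("ICUSTAY_ID == @icustay")
              .sort_values("CHARTTIME")
              .reset_index(drop=True))
print(timeline[["CHARTTIME", "EVENT"]])
```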

The problem of missing data remains relevant even when the nursing workflow was designed to capture the data in the EHR. For example, as part of the nursing workflow, the information of drug administration should be documented in the medication administration records each time vancomycin was administered, and the MIMIC system was designed to capture all of it. But this was often not the case in our review of the database 2 . Additionally, a patient’s diagnoses, co-morbidities, and complications are often not fully captured or available for reconstructing the complete clinical encounter. Those pieces of information are usually documented as free text rather than discrete data that can easily be extracted. Moreover, precise timings of the onset of an event and its resolution are rarely present. In the previous example of analyzing the effect of simultaneously administering Amitriptyline HCl and Diphenhydramine HCl, based on our selection criteria, we were able to find only one or two cases where such data were recorded ( Figure 3 ). In the figure, each color represents one subject, and only one color (green, ID:13852) is consistently present in the time window for the selection criteria, indicating missing systolic blood pressure measurements for the other subjects. This example is not an exception for cohort selection from data captured during care delivery, but a common occurrence 11 , due to the complex nature of the care delivery process and technological barriers in the various clinical systems developed in the past decade or so.

Figure 3. Example of a cohort with missing systolic blood pressure data for three of the four subjects meeting our clinical selection criteria. Day 0 (zero) is when the drug overlap begins; this start of the overlap is aligned across subjects and is denoted by the thick red line. Each data point represents one measurement from the “chartevents” table, each color indicates one subject, and the black line indicates the average of the selected cohort.

3.2. Access

Accessibility of patient data for scientific research, and sharing of the scientific work as digital objects for validation and reproducibility, is another challenging domain due to patient privacy concerns, technological issues such as interoperability, and data ownership confusion. This has been a widely discussed issue in recent years, the so-called patient or health data conundrum, as individuals do not have easy access to their own data 12 . We discuss these challenges in the context of privacy, share-ability, and proprietary rights as follows.

Privacy

Access to health care data is constrained by patient privacy considerations, which are protected by federal and local laws governing protected health information, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) 13 . The fear of litigation and breach of privacy discourages providers from sharing patient health data, even when they are de-identified. One reason is that current approaches to protecting private information are limited to de-identification of an individual subject with an ID, which is vulnerable to “twenty questions”-like re-identification. For example, a query to find any patient who is of Indian origin and has a specific cancer diagnosis with a residential zip code 3-digit prefix ‘479’ may result in only one subject, thus exposing the identity of the individual.
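The “twenty questions” risk can be checked mechanically: group the de-identified records by their quasi-identifiers and flag any combination that maps to very few subjects. The sketch below is a minimal k-anonymity-style count on hypothetical columns, not a complete privacy analysis.

```python
# Minimal k-anonymity check on a toy de-identified extract (hypothetical columns).
import pandas as pd

records = pd.DataFrame({
    "ethnicity": ["Indian", "Indian", "White", "White", "White"],
    "diagnosis": ["pancreatic cancer", "hypertension", "hypertension",
                  "hypertension", "pancreatic cancer"],
    "zip3":      ["479", "479", "479", "462", "479"],
})

K = 5  # require at least K subjects per quasi-identifier combination
group_sizes = records.groupby(["ethnicity", "diagnosis", "zip3"]).size()
risky = group_sizes[group_sizes < K]

# Any combination listed here could expose an individual, as in the
# 'Indian origin + specific cancer + zip prefix 479' example above.
print(risky)
```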

Share-ability

Even after de-identification of patient data, the sharing of such data and of research works based on the data is a complicated process. As an example, “Informatics for Integrating Biology and the Bedside (i2b2)” 5 is a system designed to capture data for scientific research during routine health care. i2b2 is a large initiative undertaken by Partners Healthcare System as an NIH-funded National Center for Biomedical Computing (NCBC). It comprises a collection of data systems, with over 100 hospitals using this software on top of their clinical databases. As a member of this project, each participating hospital system needed to transform its data into a SQL-based star schema after de-identification. It required much effort for each institution to make the data available for scientific research, as well as to develop the software in the first place. Although i2b2 has been used extensively for research, sharing of data and research work as digital objects (i.e., the coding and the flow of the analysis) is not easily achieved. We argue that current EHR and other clinical systems do not empower patients to take control of their data and engage in citizen science. A crowd-sourcing approach might be one way to make a paradigm shift in this area, which, unfortunately, is not yet possible with current systems such as i2b2. A good example is the success of open source software technologies in other disciplines and applications (such as Linux, GitHub, etc.), which rely on the engagement of many talented and passionate scientists and engineers all over the world contributing their working products as digital objects 14 .

Proprietary rights

A relevant issue is the ongoing debate about the ownership of patient data among various stakeholders in the healthcare system, including providers, patients, insurance companies, and software vendors. In general, the current model is such that the patient owns his/her data, and the provider stores the data in proprietary software systems. The business models of most traditional EHR companies, such as Epic and Cerner, are based on building proprietary software systems to manage the data for insurance reimbursement and care delivery purposes. Such an approach makes it difficult for individual patients to share data for scientific research, nor does it encourage patients to obtain their own health records, which may help them better manage their care and improve patient engagement.

3.3. Translation

Historically, a change in clinical practice is hard to achieve because of the sensitivity and risk aversion of care delivery. As an example, the use of beta blockers to prevent heart failure took 25 years to reach a widespread clinical adoption after the first research results were published 2 . This problem is much bigger for big data driven research findings to be translated into clinical practice because of the poor understanding of the risks and benefits of data driven decision support systems. Many machine learning algorithms work as a “black box” with no provision of good interpretations and clinical context of the outcomes, even though they often perform with reasonable accuracy. Without proper understanding and translatable mechanisms, it is difficult to estimate the risk and benefit of such algorithms in the clinical setting and thus discourages the new methods and treatments from being adopted by clinicians or approved by the regulatory bodies such as the FDA.

For example, if a machine learning algorithm can predict circulatory shock from patient arterial blood pressure data, what would be the risk if the algorithm fails in a particular setting based on patient demographics or clinical history? What should be the sample size to achieve high confidence in the results generated by the algorithm? These are some critical questions that cannot be answered by those traditional “black box” algorithms, nor have they been well accepted by the medical community, which relies heavily upon rule based approaches.

As an example, a decision tree algorithm might perform very differently for prediction of Medical Emergency Team (MET) activation depending on the training set or sample size from the MIMIC data. Furthermore, the prediction result can be very different when another machine learning algorithm, the support vector machine (SVM), is used ( Figure 4 ).

Figure 4. Sensitivity of the machine learning algorithms for different training sizes for prediction of Medical Emergency Team (MET) activation from the MIMIC database 15 . The x-axis represents training size for different trials. For each training set, the results of a 10-fold cross-validation are reported as box plots (the central red line is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers not to be outliers, and the red + signs denote outliers). The blue asterisks represent the performance on the validation set of the algorithm that performs best on the test set. The blue dashed lines represent the performance of the National Early Warning Score (NEWS) 16 .
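The kind of comparison shown in Figure 4 can be reproduced in outline with scikit-learn. The sketch below is not the authors' code and uses synthetic, imbalanced data in place of MIMIC-III; it simply illustrates how cross-validated sensitivity (recall) can shift with training size and algorithm choice.

```python
# A minimal sketch of how sensitivity can vary with training size and algorithm
# choice, using synthetic imbalanced data in place of the MIMIC-III cohort.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)  # rare positive class, like MET activation

for n_train in (250, 500, 1000, 2000, 4000):
    Xs, ys = X[:n_train], y[:n_train]
    for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                      ("SVM", SVC(kernel="rbf", gamma="scale"))]:
        # 10-fold cross-validated sensitivity (recall on the positive class)
        recall = cross_val_score(clf, Xs, ys, cv=10, scoring="recall")
        print(f"n={n_train:5d}  {name:13s}  "
              f"sensitivity={recall.mean():.2f} +/- {recall.std():.2f}")
```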

3.4. Incentive

Yet another barrier to using big data for better health is the lack of incentive for organizations to take the initiative to address the technological challenges. As mentioned earlier, EHRs were developed for purposes other than knowledge advancement or care quality improvement, and that has led to unorganized, missing, and inadequate data for clinical research. An individual health system does not usually have the incentive to make these data organized and available for research, unless it is a big academic institution. It would be easier for each individual health system to share data if the data were organized and captured using standard nomenclature and with meaningful, sufficiently detailed information. A key question any health organization faces is: what is the return on investment for my hospital to organize all the clinical data it gathers? One model is the Health Information Technology for Economic and Clinical Health Act (HITECH), which promotes the adoption and meaningful use of health information technology. The act authorized incentive payments to be made through Medicare and Medicaid to clinicians and hospitals that adopted and demonstrated meaningful use of EHRs, and the US government committed payments of up to $27 billion over a ten-year period 17 . This incentive has paved the way for widespread adoption of EHRs since HITECH was enacted as part of the American Recovery and Reinvestment Act in 2009. However, for the purpose of using clinical data for scientific innovation and improving the care delivery process, no apparent financial incentives currently exist for any organization to do so.

4. Opportunities

4.1. Data

For data-driven research in health care, we propose to record the most granular data during any care delivery process so as to capture the temporal and process information for treatment and outcomes. For example, in an intensive care unit, the exact time of medication administrations needs to be captured. This can be achieved in a number of ways. As a nurse bar-code scans an oral medication into the electronic medication administration record (eMAR), the system also timestamps the action in the EHR. Detailed intravenous drug infusions can be linked to the patient clinical records by integrating smart infusion pumps with the EHR systems. The Regenstrief National Center for Medical Device Informatics (REMEDI), formerly known as the Infusion Pump Informatics 18 , has been capturing process and temporal infusion information. The planned expansion of this data set will allow patient outcomes and drug administration data to be linked, forming a more complete treatment process for answering research and treatment-effectiveness questions related to the administration of drugs, such as drug-drug interactions and the safe and effective dosage of drugs, among others.

In order to achieve a statistically significant sample size after cohort selection, we promote breaking the silos of individual clinical data systems and making them interoperable across vendors, types and institutional boundaries with minimal effort. For the next generation of EHRs, these capabilities need to be considered.

4.2. Access

Patient/citizen powered research

To replicate the success of open source technologies in other disciplines by enabling citizen science, data and research analysis must be accessible to everyone. At the same time, patient privacy needs to be protected in compliance with privacy law, the proprietary rights of vendors need to be respected, and researchers need to be protected. As an example, we have demonstrated such a system with the MIMIC database, where interoperable and extensible database technologies have been used on de-identified patient data in a high performance computing environment 19 .

Shareable digital objects

For the next generation of EHRs and other big data systems such as REMEDI 18 and i2b2 5 , data must be findable, accessible, interoperable, and reusable (FAIR) 20 . For big data systems, a software-hardware ecosystem could work as a distribution platform with characteristics analogous to an Apple or Android “app store”, where any qualified individual can access the de-identified data with proper authentication, without the need for high-throughput infrastructure or the rigorous work, including pre-processing of the data, needed to reproduce previous works. The proposed architecture is shown in Figure 5 19 .

Figure 5. System concept for a community-driven software-hardware eco-system, analogous to an ‘app store’, for data-driven clinical research.

4.3. Translation

Causal understanding

Historically, clinical problems and treatments are studied and understood as “cause and effect”. For example, genetic disposition and lifestyle could lead to frequent urination, fatigue, and hunger, which can be associated with diabetes. Based on this, the patient may be treated for the disease. However, most machine learning algorithms do not provide such a rule-based approach; rather, they predict the outcome of a given set of inputs, which may or may not be associated with known clinical understanding. Unlike other disciplines, clinical applications require a causal understanding of data-driven research. Hence, most clinical studies start with some hypothesis, that ‘A’ causes ‘B’. The gold standard to identify this causation is randomized controlled trials (RCTs), which have also been the gold standard for regulatory approval of new drugs. Unfortunately, EHR data and similar data captured during routine healthcare have sampling selection bias and confounding variables, and hence it is important to understand the limitations of such data sets. To answer the causal questions, a new generation of methods is necessary to understand the causal flow of treatment, outcome, and molecular properties of drugs by integrating big data systems for analysis and validation of hypotheses, and for transportability across studies with observational data 21 , 22 . These methods would enable regulators to understand the risk and benefit of data-driven systems in clinical settings and to issue new guidelines enabling translation. Once those guidelines are established, technological solutions must also be enabled at the point of care so that clinicians can run data-driven queries as part of their clinical workflow.

5. Conclusion

“Big data” started with many believable promises in health care, but unfortunately, clinical science is different from other disciplines, with additional constraints of data quality, privacy, and regulatory policies. We discussed these concepts in pursuit of a holistic solution that enables data-driven findings to be translated in health care, from bench to bedside. We argue that the existing big data systems are still in their infancy, and that without addressing these fundamental issues, big data in health care may not achieve its full potential. We conclude that to make it to the next level, we need a larger cohort of institutions to share more complete, precise, and time-stamped data, as well as a greater willingness to invest in technologies for de-identifying private patient data so that it can be shared broadly for scientific research. At the same time, as more and more “big data” systems are developed, the scientific and regulatory communities need to figure out new ways of understanding causal relationships from data captured during routine health care that would complement current gold standard methods such as RCTs, as well as identify the relationship between clinical practice and outcomes, as there is a wide disparity in the quality of care across the country 2 .


The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement


What is a modern data stack?

Modern data stack architecture.

Modern data stack vs traditional data stack

Components of a modern data stack architecture

Data sources

Data sources component in a modern data stack.

Data integration

Data integration component in a modern data stack.

Data storage

Data storage component in a modern data stack.

Data transformation

Data transformation component in a modern data stack.

Typical steps in this stage include the following (a minimal pandas sketch follows the list):

  • Cleaning: removing or correcting inaccurate, incomplete, or irrelevant data in the dataset.
  • Normalizing: organizing the data in a standard format to eliminate redundancy and ensure consistency.
  • Filtering: selecting a subset of data based on certain criteria or conditions.
  • Joining: combining data from multiple sources based on a common key or attribute.
  • Modeling: transforming the data into a format that is suitable for analysis, including creating data structures, aggregating data, and adding derived fields.
  • Summarizing: creating condensed and simplified views of the data, often through aggregation or grouping.
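As promised above, here is a minimal pandas sketch of these transformation steps on toy data; the table and column names are illustrative assumptions, not part of any particular stack.

```python
# Minimal sketch of common transformation steps (cleaning, normalizing,
# filtering, joining, summarizing) using illustrative pandas data.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 11, None],
    "amount": ["100.0", "250.5", None, "80.0"],
    "country": ["us", "US", "de", "DE"],
})
customers = pd.DataFrame({"customer_id": [10, 11], "segment": ["smb", "enterprise"]})

clean = orders.dropna(subset=["customer_id", "amount"]).copy()    # cleaning
clean["amount"] = clean["amount"].astype(float)                   # normalizing types
clean["country"] = clean["country"].str.upper()                   # normalizing values
large = clean[clean["amount"] > 90]                               # filtering
joined = large.merge(customers, on="customer_id", how="left")     # joining
summary = (joined.groupby(["country", "segment"], as_index=False) # summarizing
                 .agg(total_amount=("amount", "sum"),
                      orders=("order_id", "count")))
print(summary)
```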

Data use

Data use component in a modern data stack.

Data governance, orchestration, and monitoring

The data governance, orchestration, and monitoring component in a modern data stack.

Data versioning

Data versioning component in a modern data stack.

Data versioning lets teams do the following (a minimal hash-based sketch follows the list):

  • track changes to data at different stages of the data pipeline;
  • compare different versions of data to identify changes and discrepancies;
  • roll back to previous versions of data if needed;
  • collaborate on data changes with team members through version control; and
  • monitor and audit changes made to data to ensure compliance with data governance policies.
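Dedicated tools (for example, DVC or lakeFS) handle data versioning in practice, but the core idea can be sketched in a few lines: fingerprint each dataset version with a content hash and keep a small log that lets you compare versions and roll back. The snippet below is a hypothetical, minimal sketch, not a recommendation of a specific tool; the file and directory names are made up.

```python
# Minimal, hypothetical data-versioning sketch: content-hash each dataset
# version and record it in a JSON log so versions can be compared or restored.
import hashlib, json, shutil, time
from pathlib import Path

LOG = Path("data_versions.json")
STORE = Path("versions")
STORE.mkdir(exist_ok=True)

def snapshot(data_path: str, note: str = "") -> str:
    """Copy the current file into the store under its content hash and log it."""
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()[:12]
    shutil.copy(data_path, STORE / f"{digest}_{Path(data_path).name}")
    log = json.loads(LOG.read_text()) if LOG.exists() else []
    log.append({"hash": digest, "file": data_path, "note": note,
                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")})
    LOG.write_text(json.dumps(log, indent=2))
    return digest

def rollback(data_path: str, digest: str) -> None:
    """Restore a previously snapshotted version over the working file."""
    shutil.copy(STORE / f"{digest}_{Path(data_path).name}", data_path)

# Usage: v1 = snapshot("customers.csv", "raw export"); ... rollback("customers.csv", v1)
```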

Modern data stack implementation use cases

A simplified diagram shows the major components of Airbnb's data infrastructure stack. Source: The Airbnb Tech Blog on Medium


The third generation of Uber's Big Data platform. Source: Uber

How to build a modern data stack that will fit your business needs

  • Define your data needs and goals
  • Build a strong team of data professionals
  • Choose the right components for your organization
  • Ensure data governance and security
  • Establish a data culture within your organization
  • Regularly review and update your data stack

A generative AI reset: Rewiring to turn potential into value in 2024

It’s time for a generative AI (gen AI) reset. The initial enthusiasm and flurry of activity in 2023 are giving way to second thoughts and recalibrations as companies realize that capturing gen AI’s enormous potential value is harder than expected.

With 2024 shaping up to be the year for gen AI to prove its value, companies should keep in mind the hard lessons learned with digital and AI transformations: competitive advantage comes from building organizational and technological capabilities to broadly innovate, deploy, and improve solutions at scale—in effect, rewiring the business  for distributed digital and AI innovation.

About QuantumBlack, AI by McKinsey

QuantumBlack, McKinsey’s AI arm, helps companies transform using the power of technology, technical expertise, and industry experts. With thousands of practitioners at QuantumBlack (data engineers, data scientists, product managers, designers, and software engineers) and McKinsey (industry and domain experts), we are working to solve the world’s most important AI challenges. QuantumBlack Labs is our center of technology development and client innovation, which has been driving cutting-edge advancements and developments in AI through locations across the globe.

Companies looking to score early wins with gen AI should move quickly. But those hoping that gen AI offers a shortcut past the tough—and necessary—organizational surgery are likely to meet with disappointing results. Launching pilots is (relatively) easy; getting pilots to scale and create meaningful value is hard because they require a broad set of changes to the way work actually gets done.

Let’s briefly look at what this has meant for one Pacific region telecommunications company. The company hired a chief data and AI officer with a mandate to “enable the organization to create value with data and AI.” The chief data and AI officer worked with the business to develop the strategic vision and implement the road map for the use cases. After a scan of domains (that is, customer journeys or functions) and use case opportunities across the enterprise, leadership prioritized the home-servicing/maintenance domain to pilot and then scale as part of a larger sequencing of initiatives. They targeted, in particular, the development of a gen AI tool to help dispatchers and service operators better predict the types of calls and parts needed when servicing homes.

Leadership put in place cross-functional product teams with shared objectives and incentives to build the gen AI tool. As part of an effort to upskill the entire enterprise to better work with data and gen AI tools, they also set up a data and AI academy, which the dispatchers and service operators enrolled in as part of their training. To provide the technology and data underpinnings for gen AI, the chief data and AI officer also selected a large language model (LLM) and cloud provider that could meet the needs of the domain as well as serve other parts of the enterprise. The chief data and AI officer also oversaw the implementation of a data architecture so that the clean and reliable data (including service histories and inventory databases) needed to build the gen AI tool could be delivered quickly and responsibly.


Our book Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI (Wiley, June 2023) provides a detailed manual on the six capabilities needed to deliver the kind of broad change that harnesses digital and AI technology. In this article, we will explore how to extend each of those capabilities to implement a successful gen AI program at scale. While recognizing that these are still early days and that there is much more to learn, our experience has shown that breaking open the gen AI opportunity requires companies to rewire how they work in the following ways.

Figure out where gen AI copilots can give you a real competitive advantage

The broad excitement around gen AI and its relative ease of use has led to a burst of experimentation across organizations. Most of these initiatives, however, won’t generate a competitive advantage. One bank, for example, bought tens of thousands of GitHub Copilot licenses, but since it didn’t have a clear sense of how to work with the technology, progress was slow. Another unfocused effort we often see is when companies move to incorporate gen AI into their customer service capabilities. Customer service is a commodity capability, not part of the core business, for most companies. While gen AI might help with productivity in such cases, it won’t create a competitive advantage.

To create competitive advantage, companies should first understand the difference between being a “taker” (a user of available tools, often via APIs and subscription services), a “shaper” (an integrator of available models with proprietary data), and a “maker” (a builder of LLMs). For now, the maker approach is too expensive for most companies, so the sweet spot for businesses is implementing a taker model for productivity improvements while building shaper applications for competitive advantage.

Much of gen AI’s near-term value is closely tied to its ability to help people do their current jobs better. In this way, gen AI tools act as copilots that work side by side with an employee, creating an initial block of code that a developer can adapt, for example, or drafting a requisition order for a new part that a maintenance worker in the field can review and submit (see sidebar “Copilot examples across three generative AI archetypes”). This means companies should be focusing on where copilot technology can have the biggest impact on their priority programs.

Copilot examples across three generative AI archetypes

  • “Taker” copilots help real estate customers sift through property options and find the most promising one, write code for a developer, and summarize investor transcripts.
  • “Shaper” copilots provide recommendations to sales reps for upselling customers by connecting generative AI tools to customer relationship management systems, financial systems, and customer behavior histories; create virtual assistants to personalize treatments for patients; and recommend solutions for maintenance workers based on historical data.
  • “Maker” copilots are foundation models that lab scientists at pharmaceutical companies can use to find and test new and better drugs more quickly.

Some industrial companies, for example, have identified maintenance as a critical domain for their business. Reviewing maintenance reports and spending time with workers on the front lines can help determine where a gen AI copilot could make a big difference, such as in identifying issues with equipment failures quickly and early on. A gen AI copilot can also help identify root causes of truck breakdowns and recommend resolutions much more quickly than usual, as well as act as an ongoing source for best practices or standard operating procedures.

The challenge with copilots is figuring out how to generate revenue from increased productivity. In the case of customer service centers, for example, companies can stop recruiting new agents and use attrition to potentially achieve real financial gains. Defining the plans for how to generate revenue from the increased productivity up front, therefore, is crucial to capturing the value.

Upskill the talent you have but be clear about the gen-AI-specific skills you need

By now, most companies have a decent understanding of the technical gen AI skills they need, such as model fine-tuning, vector database administration, prompt engineering, and context engineering. In many cases, these are skills that you can train your existing workforce to develop. Those with existing AI and machine learning (ML) capabilities have a strong head start. Data engineers, for example, can learn multimodal processing and vector database management, MLOps (ML operations) engineers can extend their skills to LLMOps (LLM operations), and data scientists can develop prompt engineering, bias detection, and fine-tuning skills.

A sample of new generative AI skills needed

The following are examples of new skills needed for the successful deployment of generative AI tools:

  • data scientist:
      • prompt engineering
      • in-context learning
      • bias detection
      • pattern identification
      • reinforcement learning from human feedback
      • hyperparameter/large language model fine-tuning; transfer learning
  • data engineer:
      • data wrangling and data warehousing
      • data pipeline construction
      • multimodal processing
      • vector database management
The learning process can take two to three months to get to a decent level of competence because of the complexities in learning what various LLMs can and can’t do and how best to use them. The coders need to gain experience building software, testing, and validating answers, for example. It took one financial-services company three months to train its best data scientists to a high level of competence. While courses and documentation are available—many LLM providers have boot camps for developers—we have found that the most effective way to build capabilities at scale is through apprenticeship, training people to then train others, and building communities of practitioners. Rotating experts through teams to train others, scheduling regular sessions for people to share learnings, and hosting biweekly documentation review sessions are practices that have proven successful in building communities of practitioners (see sidebar “A sample of new generative AI skills needed”).

It’s important to bear in mind that successful gen AI skills are about more than coding proficiency. Our experience in developing our own gen AI platform, Lilli , showed us that the best gen AI technical talent has design skills to uncover where to focus solutions, contextual understanding to ensure the most relevant and high-quality answers are generated, collaboration skills to work well with knowledge experts (to test and validate answers and develop an appropriate curation approach), strong forensic skills to figure out causes of breakdowns (is the issue the data, the interpretation of the user’s intent, the quality of metadata on embeddings, or something else?), and anticipation skills to conceive of and plan for possible outcomes and to put the right kind of tracking into their code. A pure coder who doesn’t intrinsically have these skills may not be as useful a team member.

While current upskilling is largely based on a “learn on the job” approach, we see a rapid market emerging for people who have learned these skills over the past year. That skill growth is moving quickly. GitHub reported that developers were working on gen AI projects “in big numbers,” and that 65,000 public gen AI projects were created on its platform in 2023—a jump of almost 250 percent over the previous year. If your company is just starting its gen AI journey, you could consider hiring two or three senior engineers who have built a gen AI shaper product for their companies. This could greatly accelerate your efforts.

Form a centralized team to establish standards that enable responsible scaling

To ensure that all parts of the business can scale gen AI capabilities, centralizing competencies is a natural first move. The critical focus for this central team will be to develop and put in place protocols and standards to support scale, ensuring that teams can access models while also minimizing risk and containing costs. The team’s work could include, for example, procuring models and prescribing ways to access them, developing standards for data readiness, setting up approved prompt libraries, and allocating resources.

While developing Lilli, our team had its mind on scale when it created an open plug-in architecture and set standards for how APIs should function and be built. They developed standardized tooling and infrastructure where teams could securely experiment and access a GPT LLM, a gateway with preapproved APIs that teams could access, and a self-serve developer portal. Our goal is that this approach, over time, can help shift “Lilli as a product” (that a handful of teams use to build specific solutions) toward “Lilli as a platform” (that teams across the enterprise can access to build other products).

For teams developing gen AI solutions, squad composition will be similar to AI teams but with data engineers and data scientists with gen AI experience and more contributors from risk management, compliance, and legal functions. The general idea of staffing squads with resources that are federated from the different expertise areas will not change, but the skill composition of a gen-AI-intensive squad will.

Set up the technology architecture to scale

Building a gen AI model is often relatively straightforward, but making it fully operational at scale is a different matter entirely. We’ve seen engineers build a basic chatbot in a week, but releasing a stable, accurate, and compliant version that scales can take four months. That’s why, our experience shows, the actual model costs may be less than 10 to 15 percent of the total costs of the solution.

Building for scale doesn’t mean building a new technology architecture. But it does mean focusing on a few core decisions that simplify and speed up processes without breaking the bank. Three such decisions stand out (a minimal caching sketch follows the list):

  • Focus on reusing your technology. Reusing code can increase the development speed of gen AI use cases by 30 to 50 percent. One good approach is simply creating a source for approved tools, code, and components. A financial-services company, for example, created a library of production-grade tools, which had been approved by both the security and legal teams, and made them available in a library for teams to use. More important is taking the time to identify and build those capabilities that are common across the highest-priority use cases. The same financial-services company, for example, identified three components that could be reused for more than 100 identified use cases. By building those first, they were able to generate a significant portion of the code base for all the identified use cases—essentially giving every application a big head start.
  • Focus the architecture on enabling efficient connections between gen AI models and internal systems. For gen AI models to work effectively in the shaper archetype, they need access to a business’s data and applications. Advances in integration and orchestration frameworks have significantly reduced the effort required to make those connections. But laying out what those integrations are and how to enable them is critical to ensure these models work efficiently and to avoid the complexity that creates technical debt  (the “tax” a company pays in terms of time and resources needed to redress existing technology issues). Chief information officers and chief technology officers can define reference architectures and integration standards for their organizations. Key elements should include a model hub, which contains trained and approved models that can be provisioned on demand; standard APIs that act as bridges connecting gen AI models to applications or data; and context management and caching, which speed up processing by providing models with relevant information from enterprise data sources.
  • Build up your testing and quality assurance capabilities. Our own experience building Lilli taught us to prioritize testing over development. Our team invested in not only developing testing protocols for each stage of development but also aligning the entire team so that, for example, it was clear who specifically needed to sign off on each stage of the process. This slowed down initial development but sped up the overall delivery pace and quality by cutting back on errors and the time needed to fix mistakes.
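To make the “context management and caching” idea slightly more tangible, here is a minimal, hypothetical sketch of a gateway that caches model responses keyed by model, prompt, and context. The call_model function is a stand-in for whatever approved model API an organization exposes; it is not a real library call.

```python
# Hypothetical gateway sketch: cache gen AI responses keyed by (model, prompt, context)
# so repeated requests skip the expensive model call. call_model is a stand-in.
import hashlib
import json

_cache: dict[str, str] = {}

def call_model(model: str, prompt: str, context: str) -> str:
    # Placeholder for the organization's approved model API (assumption, not a real API).
    return f"[{model}] answer grounded in {len(context)} chars of context"

def cached_generate(model: str, prompt: str, context: str) -> str:
    key = hashlib.sha256(json.dumps([model, prompt, context]).encode()).hexdigest()
    if key not in _cache:                # cache miss: pay for one model call
        _cache[key] = call_model(model, prompt, context)
    return _cache[key]                   # cache hit: served from memory

print(cached_generate("approved-llm", "Summarize the outage report", "report text"))
print(cached_generate("approved-llm", "Summarize the outage report", "report text"))  # cached
```

In a production gateway the cache would typically live in a shared store with expiry and access controls rather than in process memory, but the design trade-off is the same: spend compute once, reuse the result wherever the same question recurs.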

Ensure data quality and focus on unstructured data to fuel your models

The ability of a business to generate and scale value from gen AI models will depend on how well it takes advantage of its own data. As with technology, targeted upgrades to the existing data architecture are needed to maximize the future strategic benefits of gen AI (a small metadata-tagging sketch follows the list):

  • Be targeted in ramping up your data quality and data augmentation efforts. While data quality has always been an important issue, the scale and scope of data that gen AI models can use—especially unstructured data—has made this issue much more consequential. For this reason, it’s critical to get the data foundations right, from clarifying decision rights to defining clear data processes to establishing taxonomies so models can access the data they need. The companies that do this well tie their data quality and augmentation efforts to the specific AI/gen AI application and use case—you don’t need this data foundation to extend to every corner of the enterprise. This could mean, for example, developing a new data repository for all equipment specifications and reported issues to better support maintenance copilot applications.
  • Understand what value is locked into your unstructured data. Most organizations have traditionally focused their data efforts on structured data (values that can be organized in tables, such as prices and features). But the real value from LLMs comes from their ability to work with unstructured data (for example, PowerPoint slides, videos, and text). Companies can map out which unstructured data sources are most valuable and establish metadata tagging standards so models can process the data and teams can find what they need (tagging is particularly important to help companies remove data from models as well, if necessary). Be creative in thinking about data opportunities. Some companies, for example, are interviewing senior employees as they retire and feeding that captured institutional knowledge into an LLM to help improve their copilot performance.
  • Optimize to lower costs at scale. There is often as much as a tenfold difference between what companies pay for data and what they could be paying if they optimized their data infrastructure and underlying costs. This issue often stems from companies scaling their proofs of concept without optimizing their data approach. Two costs generally stand out. One is storage costs arising from companies uploading terabytes of data into the cloud and wanting that data available 24/7. In practice, companies rarely need more than 10 percent of their data to have that level of availability, and accessing the rest over a 24- or 48-hour period is a much cheaper option. The other costs relate to computation with models that require on-call access to thousands of processors to run. This is especially the case when companies are building their own models (the maker archetype) but also when they are using pretrained models and running them with their own data and use cases (the shaper archetype). Companies could take a close look at how they can optimize computation costs on cloud platforms—for instance, putting some models in a queue to run when processors aren’t being used (such as when Americans go to bed and consumption of computing services like Netflix decreases) is a much cheaper option.
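As a small illustration of metadata tagging for unstructured sources, the sketch below attaches a few descriptive tags to documents so they can be found, routed, and removed later if needed. The tag fields and the keyword-based rules are illustrative assumptions, not a standard.

```python
# Hypothetical metadata-tagging sketch for unstructured documents.
from dataclasses import dataclass, field

@dataclass
class TaggedDocument:
    doc_id: str
    source: str                 # e.g. "sharepoint", "maintenance-reports"
    content: str
    tags: dict = field(default_factory=dict)

def tag_document(doc: TaggedDocument) -> TaggedDocument:
    """Attach simple, rule-based tags; a real pipeline might use classifiers instead."""
    text = doc.content.lower()
    doc.tags["domain"] = "maintenance" if "equipment" in text else "general"
    doc.tags["contains_pii"] = any(w in text for w in ("ssn", "date of birth"))
    doc.tags["retention"] = "remove-on-request" if doc.tags["contains_pii"] else "standard"
    return doc

doc = tag_document(TaggedDocument("D-1", "maintenance-reports",
                                  "Equipment failure on line 3; technician notes attached."))
print(doc.tags)  # {'domain': 'maintenance', 'contains_pii': False, 'retention': 'standard'}
```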

Build trust and reusability to drive adoption and scale

Because many people have concerns about gen AI, the bar on explaining how these tools work is much higher than for most solutions. People who use the tools want to know how they work, not just what they do. So it’s important to invest extra time and money to build trust by ensuring model accuracy and making it easy to check answers.

One insurance company, for example, created a gen AI tool to help manage claims. As part of the tool, it listed all the guardrails that had been put in place, and for each answer provided a link to the sentence or page of the relevant policy documents. The company also used an LLM to generate many variations of the same question to ensure answer consistency. These steps, among others, were critical to helping end users build trust in the tool.

Part of the training for maintenance teams using a gen AI tool should be to help them understand the limitations of models and how best to get the right answers. That includes teaching workers strategies to get to the best answer as fast as possible by starting with broad questions then narrowing them down. This provides the model with more context, and it also helps remove any bias of the people who might think they know the answer already. Having model interfaces that look and feel the same as existing tools also helps users feel less pressured to learn something new each time a new application is introduced.

Getting to scale means that businesses will need to stop building one-off solutions that are hard to use for other similar use cases. One global energy and materials company, for example, has established ease of reuse as a key requirement for all gen AI models, and has found in early iterations that 50 to 60 percent of its components can be reused. This means setting standards for developing gen AI assets (for example, prompts and context) that can be easily reused for other cases.

While many of the risk issues relating to gen AI are evolutions of discussions that were already brewing—for instance, data privacy, security, bias risk, job displacement, and intellectual property protection—gen AI has greatly expanded that risk landscape. Just 21 percent of companies reporting AI adoption say they have established policies governing employees’ use of gen AI technologies.

Similarly, a set of tests for AI/gen AI solutions should be established to demonstrate that data privacy, debiasing, and intellectual property protection are respected. Some organizations, in fact, are proposing to release models accompanied with documentation that details their performance characteristics. Documenting your decisions and rationales can be particularly helpful in conversations with regulators.

In some ways, this article is premature—so much is changing that we’ll likely have a profoundly different understanding of gen AI and its capabilities in a year’s time. But the core truths of finding value and driving change will still apply. How well companies have learned those lessons may largely determine how successful they’ll be in capturing that value.

Eric Lamarre

The authors wish to thank Michael Chui, Juan Couto, Ben Ellencweig, Josh Gartner, Bryce Hall, Holger Harreis, Phil Hudelson, Suzana Iacob, Sid Kamath, Neerav Kingsland, Kitti Lakner, Robert Levin, Matej Macak, Lapo Mori, Alex Peluffo, Aldo Rosales, Erik Roth, Abdul Wahab Shaikh, and Stephen Xu for their contributions to this article.

This article was edited by Barr Seitz, an editorial director in the New York office.



Biden signs executive order on advancing study of women’s health while chiding GOP ideas he calls ‘backward’


WASHINGTON (AP) — President Joe Biden signed an executive order Monday aimed at advancing the study of women’s health by strengthening data collection and providing better funding opportunities for biomedical research while chiding Republicans for having “no clue about the power of women” but saying they’re “about to find out” come November’s election.

Women’s health has long been underfunded and understudied. It wasn’t until the 1990s that the federal government mandated women be included in federally funded medical research; for most of medical history, though, scientific study was based almost entirely on men.

“We still know too little about how to effectively prevent, diagnose and treat a wide array of health conditions in women,” said Dr. Carolyn Mazure, the head of the White House initiative on women’s health.

Today, research often fails to properly track differences between women and men, and does not represent women equally, particularly for illnesses more common to them — which Biden suggested his order would help change.

“To state the obvious, women are half the population and underrepresented across the board. But not in my administration,” the president said, drawing raucous applause at a White House reception marking Women’s History Month.

Biden said he’s long been a believer in the “power of research” to help save lives and get high-quality health care to the people who need it. But the executive order also checks off a political box during an election year when women will be crucial to his reelection efforts. First lady Jill Biden is leading both the effort to organize and mobilize female voters and the  White House Initiative on Women’s Health Research .

The announcement comes as the ripple effects spread from the Supreme Court’s decision that overturned federal abortion rights, touching on medical issues for women who never intended to end their pregnancies. In  Alabama, for example, the future of IVF was thrown into question statewide  after a judge’s ruling.

In his comments at the reception, Biden didn’t mention by name former President Donald Trump, who is now running to reclaim the White House. Instead, he referred to “my predecessor” who had been “bragging about overturning” the Roe v. Wade decision that had guaranteed the constitutional right to abortion.

The president suggested that would hurt Trump and the GOP during this fall’s election, saying, “You can’t lead America with old ideas and take us backward.”

Further leaning into politics, Biden said his administration has “turned around the economy because we focused on women,” noting that female unemployment had fallen and the number of women-owned small businesses had increased.

He said his administration has ensured that “women can access jobs in sectors where they’ve been historically underrepresented” and said he’d told leaders from some of the nation’s top labor unions that he wants to see more women and minorities in their ranks.

Women were a critical part of the coalition that elected Biden in 2020, giving him 55% of their vote, according to AP VoteCast. Black women and suburban women were pillars of Biden’s coalition while Trump had a modest advantage among white women and a much wider share of white women without college degrees, according to the AP survey of more than 110,000 voters in that year’s election.

Vice President Kamala Harris, women’s health advocate Maria Shriver and the first lady also addressed the reception.

“Finally women will get the health care we deserve,” Jill Biden said, saying the order signed Monday was “without precedent.”

Harris drew strong applause for noting that she “stood before you as the first woman vice president of the United States” and talked about  visiting an abortion clinic  in Minnesota last week.

“There are those who are intent on dragging us backward,” the vice president said of Republican states that have limited access to abortion.

“We all face a question: What kind of country do we want to live in?” Harris said. “A country of liberty, freedom and rule of law? Or a country of disorder, fear and hate?”

Shriver joked that this is probably the first time a president has signed an executive order that mentions menopause and said the action could only be taken “by a president who respects women.”

The National Institutes of Health is also launching a new effort around menopause and the treatment of menopausal symptoms that will identify research gaps and work to close them, said White House adviser Jennifer Klein. NIH funds a huge amount of biomedical research, which is imperative for understanding how medications affect the human body and for eventually deciding how to dose medicines.

Some conditions have different symptoms for women and men, such as heart disease. Others are more common in women, like Alzheimer’s disease, and some are unique to women — such as endometriosis, uterine cancers and fibroids found in the uterus. It’s all ripe for study, Mazure said.

And uneven research can have profound effects; a 2020 study by researchers at the University of Chicago and University of California, Berkeley, found that women were being overmedicated and suffering side effects from common medications because most of the dosage trials were done only on men.

The first lady announced $100 million in funding last month for women’s health.

Associated Press writer Gary Fields contributed to this report.


Key findings


Source: IEA (2024), Global Methane Tracker 2024, IEA, Paris, https://www.iea.org/reports/global-methane-tracker-2024, Licence: CC BY 4.0


Methane emissions from the energy sector remained near a record high in 2023

We estimate that the production and use of fossil fuels resulted in close to 120 million tonnes (Mt) of methane emissions in 2023, while a further 10 Mt came from bioenergy – largely stemming from the traditional use of biomass. Emissions have remained around this level since 2019, when they reached a record high. Fossil fuel supply has continued to expand over that period, which indicates that the average methane intensity of production globally has declined marginally.

The latest IEA Global Methane Tracker is based on the most recently available data on methane emissions from the energy sector and incorporates new scientific studies, measurement campaigns, and information collected from satellites.

Analysis of this data reveals both signs of progress and some worrying trends. On one hand, more governments and fossil fuel companies have committed to take action on methane. Global efforts to report emissions estimates consistently and transparently are strengthening, and studies suggest emissions are falling in some regions. However, overall emissions remain far too high to meet the world’s climate goals. Large methane emissions events detected by satellites also rose by more than 50% in 2023 compared with 2022, with more than 5 Mt of methane emissions detected from major fossil fuel leaks around the world – including a major well blowout in Kazakhstan that went on for more than 200 days. 

Methane emissions from energy, 2000-2023

Close to 70% of methane emissions from fossil fuels come from the top 10 emitting countries.

Of the nearly 120 Mt of emissions we estimate were tied to fossil fuels in 2023, around 80 Mt came from countries that are among the top 10 emitters of methane globally. The United States is the largest emitter of methane from oil and gas operations, closely followed by the Russian Federation (hereafter “Russia”). The People’s Republic of China (hereafter “China”) is by far the highest emitter in the coal sector. The amount of methane lost in fossil fuel operations globally in 2023 was 170 billion cubic metres, more than Qatar’s natural gas production.
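
For readers who want to reconcile the mass and volume figures above, here is a minimal back-of-the-envelope sketch (not an IEA calculation) assuming a methane density of roughly 0.7 kg per cubic metre at standard conditions; it shows that ~120 Mt of methane corresponds to roughly 170 billion cubic metres of gas.

```python
# Back-of-the-envelope cross-check (not an IEA calculation): convert the ~120 Mt
# of methane emitted from fossil fuel operations in 2023 into a gas volume,
# assuming a methane density of roughly 0.7 kg per cubic metre.

METHANE_DENSITY_KG_PER_M3 = 0.7   # assumed; the exact value depends on reference conditions
emissions_mt = 120                # Mt of methane from fossil fuel operations, 2023

emissions_kg = emissions_mt * 1e9                             # 1 Mt = 1e9 kg
volume_bcm = emissions_kg / METHANE_DENSITY_KG_PER_M3 / 1e9   # billion cubic metres

print(f"{emissions_mt} Mt of methane is roughly {volume_bcm:.0f} bcm of gas")
# -> about 171 bcm, in line with the ~170 bcm figure quoted above
```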

The methane emissions intensity of oil and gas production varies widely. The best-performing countries score more than 100 times better than the worst. Norway and the Netherlands have the lowest emissions intensities. Countries in the Middle East, such as Saudi Arabia and the United Arab Emirates, also have relatively low emissions intensities. Turkmenistan and Venezuela have the highest. High emissions intensities are not inevitable; they can be addressed cost-effectively through a combination of high operational standards, policy action and technology deployment. On all these fronts, best practices are well established.

Methane emissions from oil and gas production and methane intensity for selected producers, 2023

Cutting methane emissions from fossil fuels by 75% by 2030 is vital to limit warming to 1.5 °C.

The energy sector accounts for more than one third of total methane emissions attributable to human activity, and cutting emissions from fossil fuel operations has the most potential for major reductions in the near term. We estimate that around 80 Mt of annual methane emissions from fossil fuels can be avoided through the deployment of known and existing technologies, often at low – or even negative – cost.

In our Net Zero Emissions by 2050 (NZE) Scenario – which sees the global energy sector achieving net zero emissions by mid-century, limiting the temperature rise to 1.5 °C – methane emissions from fossil fuel operations fall by around 75% by 2030. By that year, all fossil fuel producers have an emissions intensity similar to the world’s best operators today. Targeted measures to reduce methane emissions are necessary even as fossil fuel use begins to decline; cutting fossil fuel demand alone is not enough to achieve the deep and sustained reductions needed.

Methane abatement potential to 2030

Main sources of methane emissions

Full implementation of COP28 and other pledges would cut fossil fuel methane emissions by 50%.

The COP28 climate summit in Dubai produced a host of new pledges to accelerate action on methane. Importantly, the outcome of the first Global Stocktake called for countries to substantially reduce methane emissions by 2030. Additionally, more than 50 oil and gas companies launched the Oil and Gas Decarbonization Charter (OGDC) to speed up emissions reductions within the industry, new countries joined the Global Methane Pledge, and new finance was mobilised to support the reduction of methane and greenhouse gases (GHGs) other than carbon dioxide (CO2).

Substantial new policies and regulations on methane were also established or announced in 2023, including by the United States, Canada and the European Union, while China published an action plan dedicated to methane emission control. A series of supportive initiatives have been launched to accompany these efforts, such as the Methane Alert and Response System and the Oil and Gas Climate Initiative’s Satellite Monitoring Campaign.

Taken together, we estimate that if all methane policies and pledges made by countries and companies to date are implemented and achieved in full and on time, methane emissions from fossil fuels would decline by around 50% by 2030. However, in most cases, these pledges are not yet backed up by detailed plans, policies and regulations. The detailed methane policies and regulations that currently exist would cut emissions from fossil fuel operations by around 20% from 2023 levels by 2030. The upcoming round of updated Nationally Determined Contributions (NDCs) under the Paris Agreement, which will see countries set climate goals through 2035, presents a major opportunity for governments to set bolder targets on energy-related methane and lay out plans to achieve them.

Reductions in methane emissions from fossil fuel operations from existing policies and pledges, 2020-2030

Around 40% of today’s methane emissions from fossil fuels could be avoided at no net cost.

Methane abatement in the fossil fuel industry is one of the most pragmatic and lowest cost options to reduce greenhouse gas emissions. The technologies and measures to prevent emissions are well known and have already been deployed successfully around the world. Around 40% of the 120 Mt of methane emissions from fossil fuels could be avoided at no net cost, based on average energy prices in 2023. This is because the required outlays for abatement measures are less than the market value of the additional methane gas captured and sold or used. The share is higher for oil and natural gas (50%) than for coal (15%).

There are many possible reasons why companies are not deploying these measures even though they pay for themselves. For example, the return on investment for methane abatement projects may be longer than for other investment opportunities. There may also be a lack of awareness regarding the scale of methane emissions and the cost-effectiveness of abatement. Sometimes infrastructure or institutional arrangements are inadequate, making it difficult for companies to receive the income from avoided emissions.

Regardless of the value of captured gas, we estimate that it would be cost-effective to deploy nearly all fossil fuel methane abatement measures if emissions are priced at about USD 20/tonne CO2-equivalent. Tapping into this potential will require new regulatory frameworks, financing mechanisms and improved emissions tracking.
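
To make the pricing argument concrete, the following is a minimal sketch under stated assumptions; it does not reproduce the IEA's own accounting, which may use a different warming-potential convention. At an assumed 100-year global warming potential of around 28-30 for methane, a price of USD 20 per tonne of CO2-equivalent is worth roughly USD 560-600 per tonne of methane avoided, which is why nearly all abatement measures clear the cost-effectiveness bar at that price.

```python
# Illustrative sketch (assumed figures, not the IEA's own accounting): translate a
# price of USD 20 per tonne of CO2-equivalent into a value per tonne of methane,
# using an assumed 100-year global warming potential (GWP100) of about 30.

co2e_price_usd_per_tonne = 20
gwp100_methane = 30            # assumed; commonly cited GWP100 values are ~28-30

value_per_tonne_ch4 = co2e_price_usd_per_tonne * gwp100_methane
print(f"USD {co2e_price_usd_per_tonne}/t CO2-eq ~ USD {value_per_tonne_ch4}/t of methane avoided")
# -> roughly USD 600 per tonne of methane; abatement measures costing less than
#    this per tonne of avoided emissions would be cost-effective at such a price.
```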

Marginal abatement cost curve for methane from coal, 2023

Marginal abatement cost curve for methane from oil and natural gas operations, 2023

Delivering the 75% cut in methane emissions requires USD 170 billion in spending to 2030.

We estimate that around USD 170 billion in spending is needed to deliver the methane abatement measures deployed by the fossil fuel industry in the NZE Scenario. This includes around USD 100 billion of spending in the oil and gas sector and USD 70 billion in the coal industry. Through 2030, roughly USD 135 billion goes towards capital expenditures, while USD 35 billion is for operational expenditures.

Fossil fuel companies should carry the primary responsibility for financing these abatement measures, given that the amount of spending needed represents less than 5% of the income the industry generated in 2023. Nonetheless, we estimate that about USD 45 billion of spending in low- and middle-income countries requires particular attention, as sources of finance are likely to be more limited. To date, we estimate that external sources of finance targeted at reducing methane in the fossil fuel industry total less than USD 1 billion, although this should catalyse a far greater level of spending.

Spending for methane abatement in coal operations in the Net Zero Scenario, 2024-2030

Spending for methane abatement in oil and gas operations in the Net Zero Scenario, 2024-2030

New tools to track emissions will bring a step change in transparency.

Better and more transparent data based on measurements of methane emissions is becoming increasingly accessible and will support more effective mitigation. In 2023, Kayrros, an analytics firm, released a tool based on satellite imagery that quantifies large methane emissions on a daily basis and provides country-level oil and gas methane intensities. GHGSat, another technology company, increased its constellation of satellites in orbit to 12 and started to offer targeted monitoring of offshore methane emissions, while the United Nations Environment Programme (UNEP) Methane Alert and Response System (MARS) ramped up usage of satellites to detect major methane emission events and alert government authorities and involved operators.

Despite this progress, little or no measurement-based data is used to report emissions in most parts of the world – which is an issue since measured emissions tend to be higher than reported emissions. For example, if companies that report emissions to UNEP’s Oil & Gas Methane Partnership 2.0 were to be fully representative of the industry globally, this would imply that global oil and gas methane emissions in 2023 were around 5 Mt, 95% lower than our estimate. Total oil and gas emissions levels reported by countries to the UN Framework Convention on Climate Change are close to 40 Mt, about 50% lower than our 2023 estimate. There are many possible reasons for these major discrepancies, but they will only be resolved through more systematic and transparent use of measured data.

Regardless, all assessments make clear that methane emissions from fossil fuel operations are a major issue and that renewed action – by governments, companies, and financial actors – is essential.

Methane emissions from global oil and gas supply


Why Honigman LLP Chose BigHand Resource Management

Abby Stover, Chief Talent Officer at Honigman LLP, discusses why the firm chose BigHand Resource Management.

Abby: Not only were lawyers unable to walk down the hall and find staffing, they also were busier than they had been in recent memory. So that was one real driver: staffing matters, getting folks to work on them, and understanding what they were working on and why they may not be available. Another driver is the clients themselves. As I mentioned, we love to hear from our clients; we involve them in our Innovation Day festivities.

The clients have really grown very sophisticated, as everyone knows, and they're looking for appropriate staffing - staffing that provides the value that they want. It's not necessarily a cost issue, but they want the work done by the best and most efficient biller possible. So that was another issue.

As I mentioned, we have an entire client service team that provides these alternative models, and we have a full legal project management cohort that works with our clients on staffing. This was another piece of it: plugging in the data - who's busy, who's not busy, who can take on more work, and who is the most efficient, cost-effective and experienced person to put on the file. And finally, as you touched on in the introduction, there are the diversity, equity and inclusion metrics, which this tool provides. Many of our clients are large multinational corporations that have their own goals and who look for that data when they are engaging counsel.

How did Honigman LLP approach the change management process?

Abby: The pilot process was an important component of that - to build some buzz and get some folks fluent in it, and then to have the leaders in our system on board. Generally, the practice group leaders are the resource managers, so having the resource manager from the private equity group and the resource manager from our professional attorney practices get bought into it and be able to start talking about it before it even rolled out - that was one piece of the process.

We also put together a drip campaign - we're still in that, actually - where we roll out an initial training and explain the features of the tool and why they're important. What's really key is the "Why does this help you? How does this help you?"

In Q1, we're rolling out the skills portion. I've been updating all the skills, which in our system are relatively old. The BigHand tool has a much nicer interface: rather than a yes/no checklist, which we've seen in other products, it has a spectrum of experience, which makes it easier and also makes it searchable, which is great. So we'll then be going back out to our practice group leaders and attorneys, hitting them again with "Oh, and now we have this new feature. Let's check it out together again."

BigHand Resource Management surfaces actionable data to allocate work more efficiently.

Abby: The dashboard that is available to both the resource managers and the "resources" is super cool - even if you didn't use forecasting at all, it provides some really useful data. So for the attorneys who are the resources - the associates, the staff attorneys - I like to show them the KPI card that shows their progress toward meeting their hours goal. They find that really helpful. Their six-month historic utilization, they find very helpful.

And the thing that always generates the most conversation is the comparison KPI, which allows attorneys to see how they are performing in terms of productivity versus other attorneys in their department with the same title, or other attorneys firm-wide with the same title. It creates a little bit of healthy competition, and they can see where they fit. Once I remind them that those things are there, they're like, "Yeah, I'd much rather look at that than just the time entry system and try to figure out what my progress toward my goal is from there."

Then the resource managers began to use the main dashboard, where they can see all the folks in their practice group and what their one-week or four-week averages are. The data is available in other ways - and we always found it in other ways - but they were very clunky and didn't display well; you'd have to put together different pieces using different reports. This is sleek, it streamlines, and it gives the information immediately. We just show them that and say: if you do nothing else besides look at these things, this will be useful to you.

About BigHand Resource Management

BigHand Resource Management is a legal work allocation tool that allows law firms to identify resources, forecast utilization, manage workloads and add structure to career development for lawyers. The solution delivers real-time visibility of team availability, improves profitability on matters, and supports DEI goals and the equitable allocation of work.

