Notes on Data Analytics Online Courses 1-3

Notes on the Google Data Analytics Professional Certificate - Module 1

Understanding the data ecosystem

Data and gut instinct

Data analysts are just like detectives. They both:

  • depend on facts and clues to make decisions.
  • collect and look at the evidence. 
  • talk to people who know part of the story.
  • might even follow some footprints to see where they lead.

Analysts use data-driven decision-making and follow a step-by-step process. You have learned that there are six steps to this process:

  1. Ask questions and define the problem.
  2. Prepare data by collecting and storing the information.
  3. Process data by cleaning and checking the information.
  4. Analyze data to find patterns, relationships, and trends.
  5. Share data with your audience.
  6. Act on the data and use the analysis results.
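The six steps above can be sketched as a tiny pipeline. This is only an illustrative toy, not part of the course material: the sales records, the function names, and the target of 14 units are all hypothetical, chosen to show how each step hands its result to the next.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_sales = [
    {"day": "Mon", "units": 12},
    {"day": "Tue", "units": None},
    {"day": "Wed", "units": 18},
    {"day": "Thu", "units": 15},
]

def ask():
    # 1. Ask: define the problem as a concrete question.
    return "What is the average number of units sold per day?"

def prepare():
    # 2. Prepare: collect and store the information (here, an in-memory copy).
    return list(raw_sales)

def process(records):
    # 3. Process: clean the data by dropping records with missing values.
    return [r for r in records if r["units"] is not None]

def analyze(records):
    # 4. Analyze: summarize the cleaned data with a simple average.
    return mean(r["units"] for r in records)

def share(question, result):
    # 5. Share: present the finding in a form the audience can read.
    return f"{question} Answer: {result:.1f}"

def act(result, target=14):
    # 6. Act: use the result to choose between two hypothetical actions.
    return "keep current plan" if result >= target else "review pricing"

question = ask()
clean = process(prepare())
average = analyze(clean)
print(share(question, average))
print(act(average))
```

In a real project each step would be far larger, but the order and the hand-offs between steps stay the same.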

But there are other factors that influence the decision-making process. Gut instinct is one of them. Gut instinct is an intuitive understanding of something with little or no explanation. This isn't always something conscious; we often pick up on signals without even realizing it. You just have a "feeling" it's right.

Why gut instinct can be a problem

At the heart of data-driven decision-making is data. Therefore, it's essential that data analysts focus on the data to ensure they make informed decisions. If you ignore data by preferring to make decisions based on your own experience, your decisions may be biased. Even worse, decisions based on gut instinct without any data to back them up can cause mistakes.

The more you understand the data related to a project, the easier it will be to figure out what is required. This understanding will also help you identify errors and gaps in your data so you can communicate your findings more effectively. Sometimes past experience helps you make a connection that no one else would notice. For example, a detective might be able to crack open a case because they remember an old case just like the one they're solving today. That connection comes from experience, not just gut instinct.

Data + business knowledge = mystery solved

Blending data with business knowledge, plus maybe a touch of gut instinct, will be a common part of your process as a junior data analyst. The key is to figure out the exact mix for each particular project. A lot of times, it will depend on the goals of your analysis. That is why analysts often ask, "How do I define success for this project?"

Data Analysis Life Cycle (by Google)

  1. Ask: Business challenge / objective / question
  2. Prepare: Data generation, collection, storage, and data management
  3. Process: Data cleaning / data integrity
  4. Analyze: Data exploration, visualization, and analysis
  5. Share: Communicating and interpreting results
  6. Act: Putting your insights to work to solve the problem

EMC's data analysis life cycle

  1. Discovery
  2. Pre-processing data
  3. Model planning
  4. Model building
  5. Communicate results
  6. Operationalize

EMC Corporation is now Dell EMC. Each step connects and leads to the next, and the cycle eventually repeats. It is a little different from the data analysis life cycle this program is based on, but it has some core ideas in common: the first phase is about discovering and asking questions; data has to be prepared before it can be analyzed and used; and then findings should be shared and acted on.

SAS's iterative life cycle

  1. Ask
  2. Prepare
  3. Explore
  4. Model
  5. Implement
  6. Act
  7. Evaluate

SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol. This life cycle is also a little different; it includes a step after the Act phase designed to help analysts evaluate their solutions and potentially return to the Ask phase again.

Project-based data analytics life cycle

  1. Identifying the problem
  2. Designing data requirements
  3. Pre-processing data
  4. Performing data analysis
  5. Visualizing data

This life cycle doesn't include a sixth phase, or what we have been referring to as the Act phase. However, it still covers a lot of the same steps as the life cycles we have already described. It begins with identifying the problem, continues with preparing and processing data before analysis, and ends with data visualization.

Big data analytics life cycle

Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle suggests phases divided into nine steps:

  1. Business case evaluation
  2. Data identification
  3. Data acquisition and filtering
  4. Data extraction
  5. Data validation and cleaning 
  6. Data aggregation and representation
  7. Data analysis
  8. Data visualization
  9. Utilization of analysis results

This life cycle appears to have three or four more steps than the previous life cycle models. But in reality, the authors have just broken down what we have been referring to as Prepare and Process into smaller steps. Their model emphasizes the individual tasks required for gathering, preparing, and cleaning data before the analysis phase.
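One way to see this is to map each of the nine steps onto the six phases of the Google life cycle. The mapping below is my own reading, not one given by the book or the course, so treat the assignments (especially where Prepare ends and Process begins) as assumptions.

```python
from collections import Counter

GOOGLE_PHASES = ["Ask", "Prepare", "Process", "Analyze", "Share", "Act"]

# Assumed mapping from each big data life cycle step to a Google phase.
BIG_DATA_TO_GOOGLE = {
    "Business case evaluation": "Ask",
    "Data identification": "Prepare",
    "Data acquisition and filtering": "Prepare",
    "Data extraction": "Prepare",
    "Data validation and cleaning": "Process",
    "Data aggregation and representation": "Process",
    "Data analysis": "Analyze",
    "Data visualization": "Share",
    "Utilization of analysis results": "Act",
}

# Count how many of the nine steps fall under each Google phase.
steps_per_phase = Counter(BIG_DATA_TO_GOOGLE.values())
for phase in GOOGLE_PHASES:
    print(f"{phase}: {steps_per_phase[phase]} step(s)")
```

Under this assumed mapping, Prepare and Process together account for five of the nine steps, while every other phase keeps a single step.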

Key takeaway

The data analysis process is like real-life architecture: there are different ways to do things, but the same core ideas still appear in each model of the process.
