Before an analyst begins collecting data, they must answer three questions:
• What is the goal or purpose of this research or project?
• What kinds of data do they plan to gather?
• What methods and procedures will be used to collect, store, and process the information?

Additionally, data can be divided into qualitative and quantitative types.
• Qualitative data covers descriptions such as color, size, quality, and appearance.
• Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers, and percentages.
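
As a rough illustration of the distinction above, the following sketch sorts the fields of a small dataset into qualitative and quantitative groups. The records and field names are made up for the example, and treating "numeric" as "quantitative" is a simplifying assumption (a numeric ID column, for instance, would not really be quantitative).

```python
# Hypothetical survey records; the field names are illustrative only.
records = [
    {"color": "red", "quality": "good", "price": 12.5, "units_sold": 40},
    {"color": "blue", "quality": "fair", "price": 9.0, "units_sold": 55},
]


def split_by_type(rows):
    """Group field names into qualitative (non-numeric) and quantitative (numeric)."""
    qualitative, quantitative = [], []
    for field in rows[0]:
        values = [row[field] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (quantitative if is_numeric else qualitative).append(field)
    return qualitative, quantitative


qualitative, quantitative = split_by_type(records)
print("Qualitative:", qualitative)
print("Quantitative:", quantitative)
```

Here `color` and `quality` end up in the qualitative group, while `price` and `units_sold` end up in the quantitative group.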

Data Collection Methods

The two methods are:
• Primary
  • As the name implies, this is original, first-hand data collected by the researchers themselves. It is the initial information-gathering step, performed before anyone carries out any further or related research.
  • Primary data tends to be highly accurate, provided the researcher collects the information personally. The downside is that first-hand research can be time-consuming and expensive.

  There are several methods for collecting primary data:
  • Interviews
  • Projective Technique
  • Observation
  • Focus Groups
  • Questionnaires
  • Delphi Technique

• Secondary
  • Secondary data is second-hand information.
  • It is either information that the researcher has tasked other people to collect or information the researcher has looked up.
  • Although it is easier and cheaper to obtain than primary data, secondary data raises concerns regarding accuracy and authenticity.
  • Quantitative data makes up a majority of secondary data.

Since the information has already been collected, the researcher consults various data sources, such as:
  • Financial Statements
  • Sales Reports
  • Retailer/Distributor/Dealer Feedback
  • Customer Personal Information (e.g., name, address, age, contact info)
  • Business Journals
  • Government Records (e.g., census, tax records, Social Security info)
  • Trade/Business Magazines
  • The internet

Data Collection and Data Pre-Processing

What is Data Pre-processing? 
• Data pre-processing is the step in which data is transformed, or encoded, into a state that a machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm.
• Data pre-processing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors. Data pre-processing is a proven method of resolving such issues.
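
To make "transformed, or encoded" concrete, here is a minimal sketch of one common encoding, one-hot encoding, written in plain Python. The text above does not name a specific encoding, so this choice is an assumption for illustration, and the function name `one_hot_encode` and the sample values are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, encoded_rows) for a list of categorical values.

    Each value becomes a list of 0s with a single 1 marking its category.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical qualitative feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

After encoding, every row is a fixed-length list of numbers, which is a form an algorithm can work with directly. In practice a library routine would usually be used instead of a hand-written function.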

Real-world data is generally:
• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.
• Noisy: containing errors or outliers.
• Inconsistent: containing discrepancies in codes or names.
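
The three problems above can be checked for with a few simple rules. The sketch below is only illustrative: the customer records are invented, and the set of valid country codes and the plausible age range are assumptions that a real project would define for itself.

```python
# Hypothetical customer records.
rows = [
    {"name": "Ann", "country": "US", "age": 34},
    {"name": "Bob", "country": "usa", "age": None},  # missing age, odd country code
    {"name": "Cy", "country": "US", "age": 29},
    {"name": "Di", "country": "U.S.", "age": 31},    # odd country code
    {"name": "Ed", "country": "US", "age": 290},     # likely a data-entry error
]

VALID_COUNTRY_CODES = {"US"}  # assumed canonical codes
MIN_AGE, MAX_AGE = 0, 120     # assumed plausible range

# Incomplete: at least one attribute value is missing.
incomplete = [r["name"] for r in rows if any(v is None for v in r.values())]

# Noisy: a value is present but falls outside the plausible range.
noisy = [
    r["name"]
    for r in rows
    if r["age"] is not None and not MIN_AGE <= r["age"] <= MAX_AGE
]

# Inconsistent: the same thing is written with different codes or names.
inconsistent = [r["name"] for r in rows if r["country"] not in VALID_COUNTRY_CODES]

print("Incomplete:", incomplete)
print("Noisy:", noisy)
print("Inconsistent:", inconsistent)
```

A record can have more than one problem at once: "Bob" is both incomplete and inconsistent in this sample.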

• When we talk about data, we usually think of some large datasets with a huge number of rows and columns. While that is a likely scenario, it is not always the case.

• Data can take many different forms: structured tables, images, audio files, videos, and so on.
• Machines don’t understand free text, image, or video data as it is; they understand 1s and 0s.
• So it probably won’t be good enough to put on a slideshow of all our images and expect our machine learning model to get trained just by that!
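
As a small illustration of the point about 1s and 0s, the sketch below shows how a short piece of text and a tiny image might be turned into numbers. The 2x2 "image" is a made-up grid of grayscale values, and scaling pixels to the range 0 to 1 is one common convention, assumed here for the example.

```python
# Text: each character is stored as one or more bytes, and each byte as bits.
text = "Hi"
byte_values = list(text.encode("utf-8"))
bits = [format(b, "08b") for b in byte_values]
print(byte_values)  # [72, 105]
print(bits)         # ['01001000', '01101001']

# Image: a hypothetical 2x2 grayscale image, one intensity (0-255) per pixel.
image = [
    [0, 255],
    [128, 64],
]

# Flatten the grid and scale each pixel to the range 0-1,
# giving a plain list of numbers that a model could take as input.
features = [pixel / 255 for row in image for pixel in row]
print(len(features))
```

Real images, audio, and video go through the same kind of conversion, just with far more numbers per item.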


