Data Science: Data Collection and Data Pre-Processing

Before an analyst begins collecting data, they must answer three questions:

• What’s the goal or purpose of this research/project?

• What kinds of data are they planning on gathering?

• What methods and procedures will be used to collect, store, and process the information?

Additionally, we can break up data into qualitative and quantitative types.

• Qualitative data covers descriptions such as color, size, quality, and appearance.

• Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers, percentages, etc.
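To make the distinction concrete, here is a minimal sketch that sorts the fields of a small dataset into qualitative and quantitative groups. The records, field names, and the `split_by_type` helper are all invented for illustration. It assumes that any field holding only numbers is quantitative, which is a simplification: numeric codes such as ZIP codes would be misclassified.

```python
from numbers import Number

# Hypothetical survey records; field names and values are made up for this example.
records = [
    {"color": "red", "quality": "good", "price": 12.5, "rating_pct": 80},
    {"color": "blue", "quality": "poor", "price": 7.0, "rating_pct": 45},
]

def split_by_type(rows):
    """Return (qualitative, quantitative) field-name lists based on the values seen."""
    qualitative, quantitative = [], []
    for field in rows[0]:
        values = [row[field] for row in rows]
        # bool is a subclass of int in Python, so exclude it from the numeric check.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (quantitative if is_numeric else qualitative).append(field)
    return qualitative, quantitative

qualitative_fields, quantitative_fields = split_by_type(records)
print(qualitative_fields)   # ['color', 'quality']
print(quantitative_fields)  # ['price', 'rating_pct']
```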

Data Collection Methods

• The two methods are:

• Primary

• As the name implies, this is original, first-hand data collected by the data researchers. This process is the initial information gathering step, performed before anyone carries out any further or related research.

• Primary data results are highly accurate provided the researcher collects the information. However, there’s a downside, as first-hand research is potentially time-consuming and expensive.

There are different methods to collect primary data:

Interviews
Projective Technique
Observation
Focus Groups
Questionnaires
Delphi Technique

• Secondary

• It’s second-hand information.

• This data is either information that the researcher has tasked other people to collect or information the researcher has looked up.

• Although it’s easier and cheaper to obtain than primary information, secondary information raises concerns regarding accuracy and authenticity.

Quantitative data makes up a majority of secondary data.

Since the information has already been collected, the researcher consults various data sources, such as:

Financial Statements
Sales Reports
Retailer/Distributor/Dealer Feedback
Customer Personal Information (e.g., name, address, age, contact info)
Business Journals
Government Records (e.g., census, tax records, Social Security info)
Trade/Business Magazines
The internet

What is Data Pre-processing?

• Data pre-processing is the step in which data is transformed, or encoded, into a state that a machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm.

• Data pre-processing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors. Data pre-processing is a proven method of resolving such issues.
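As one possible illustration of "encoding", the sketch below applies one-hot encoding to a qualitative feature using plain Python. One-hot encoding is only one of several encoding schemes and is chosen here as an example, not because the post prescribes it. The `one_hot_encode` helper and the sample colors are hypothetical.

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 vector containing a single 1."""
    categories = sorted(set(values))  # sorted for a reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded

colors = ["red", "green", "red", "blue"]  # made-up sample feature
categories, encoded = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each row now contains only numbers, so an algorithm can work with the color feature without treating "red" as larger or smaller than "blue".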

• In the real world, data are generally:

• Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data.

• Noisy: containing errors or outliers.

• Inconsistent: containing discrepancies in codes or names.
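The three problems above (incomplete, noisy, and inconsistent data) can be sketched on a toy dataset as follows. The records, the plausible age range of 0 to 120, the alias table, and the choice of median imputation are all assumptions made for this example. Real projects would pick strategies that fit their own data.

```python
from statistics import median

# Hypothetical raw records: None marks a missing value, 430 is a likely typo,
# and the country codes are spelled inconsistently.
raw = [
    {"age": 34, "country": "US"},
    {"age": None, "country": "usa"},
    {"age": 29, "country": "U.S."},
    {"age": 430, "country": "IN"},
    {"age": 41, "country": "india"},
]

# Assumed lookup table for normalizing inconsistent codes.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US", "in": "IN", "india": "IN"}

def clean(rows, min_age=0, max_age=120):
    # Noisy: treat ages outside the plausible range as missing.
    ages = [
        r["age"] if r["age"] is not None and min_age <= r["age"] <= max_age else None
        for r in rows
    ]
    # Incomplete: fill missing ages with the median of the remaining values.
    fill = median(a for a in ages if a is not None)
    cleaned_rows = []
    for row, age in zip(rows, ages):
        cleaned_rows.append({
            "age": age if age is not None else fill,
            # Inconsistent: map every known spelling to a single code.
            "country": COUNTRY_ALIASES.get(row["country"].lower(), row["country"]),
        })
    return cleaned_rows

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

With this sample, the missing age and the out-of-range age are both replaced by 34, the median of the valid ages, and every country ends up as either "US" or "IN".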

• When we talk about data, we usually think of some large datasets with a huge number of rows and columns. While that is a likely scenario, it is not always the case.

Data can come in many different forms: structured tables, images, audio files, videos, and so on.

• Machines don’t understand free text, image, or video data as it is; they understand 1s and 0s.

• So it probably won’t be good enough if we put on a slideshow of all our images and expect our machine learning model to get trained just by that!
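For free text, one common way to get from words to numbers is a bag-of-words representation, sketched below with the standard library only. The sample sentences and the `tokenize` and `bag_of_words` helpers are hypothetical, and this is just one simple scheme among many. Images, audio, and video need their own conversions, which are not shown here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(documents):
    """Return the vocabulary and one word-count vector per document."""
    vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

docs = ["Data is raw", "Raw data needs cleaning", "Cleaning is data work"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cleaning', 'data', 'is', 'needs', 'raw', 'work']
for vector in vectors:
    print(vector)
# [0, 1, 1, 0, 1, 0]
# [1, 1, 0, 1, 1, 0]
# [1, 1, 1, 0, 0, 1]
```

Every sentence becomes a fixed-length list of counts, which is a form a machine learning model can take as input.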

