Data and Big Data
• Data is a set of qualitative or quantitative values. It can be structured or unstructured, machine-readable or not, digital or analogue, personal or not.

• Ultimately, it is a specific set or sets of individual data points that can be used to generate insights, or be combined and abstracted to create information, knowledge and wisdom.

• Traditional analysis tools and software can be used to analyse and “crunch” data.

• Big data is data that contains greater variety, arriving in increasing volumes and with more velocity. These three properties are known as the three Vs.

• Big data consists of larger, more complex data sets, especially from new data sources.

• These data sets are so voluminous that traditional data processing software just can’t manage them.

• But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.
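
One common way to cope with data that does not fit in memory is to process it as a stream, one record at a time. The sketch below is only an illustration of that idea, not a description of any particular big data product: the function names `read_values` and `running_mean` are hypothetical, and a short list of strings stands in for a very large file.

```python
from typing import Iterable, Iterator


def read_values(lines: Iterable[str]) -> Iterator[float]:
    """Yield one numeric value per non-empty line without storing them all."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values: Iterable[float]) -> float:
    """Compute the mean in a single pass using constant memory."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental update of the mean
    if count == 0:
        raise ValueError("no values supplied")
    return mean


if __name__ == "__main__":
    sample_lines = ["10", "20", "", "30"]  # stand-in for a very large file
    print(running_mean(read_values(sample_lines)))
```

Because `read_values` is a generator, the same code would work on an open file object, reading it line by line instead of loading it whole.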

Introduction to Data Science

The three Vs of big data

• Volume: The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data.

• This can be data of unknown value, such as Twitter data feeds, clickstreams on a web page or a mobile app, or readings from sensor-enabled equipment.

• For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

• Velocity: Velocity is the fast rate at which data is received.

• Variety: Variety refers to the many types of data that are available.

• Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types.

• Unstructured and semi-structured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.

• There are “dimensions” that distinguish data from BIG DATA, summarised as the “3 Vs” of data: Volume, Variety, Velocity.

• Hence, BIG DATA is not just “more” data. It is so much data, so mixed and unstructured, and accumulating so rapidly, that traditional techniques and methodologies, including “normal” software such as Excel or Crystal Reports, do not really work.
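
To make the Variety point concrete, here is a minimal sketch of the kind of preprocessing that unstructured text needs before it can be analysed. It is an assumption-laden illustration: the helpers `tokenize` and `word_counts` are hypothetical names, and real pipelines usually do far more (language detection, stemming, metadata extraction).

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(documents: Iterable[str]) -> Counter:
    """Turn free-form documents into a structured word-frequency table."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


if __name__ == "__main__":
    docs = ["Big data is big.", "Data arrives fast!"]  # made-up sample text
    for word, count in word_counts(docs).most_common(3):
        print(word, count)
```

The output of `word_counts` is structured (word, count) data, which could then be stored in a relational table like any traditional data type.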

Examples Of Big Data
The New York Stock Exchange is an example of a big data source: it generates about one terabyte of new trade data per day.

Social Media
• Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day.

• This data is mainly generated through photo and video uploads, message exchanges, and comments.
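
To get a feel for these daily figures, the sketch below scales them up to a year. It assumes decimal units (1 petabyte = 1000 terabytes), a 365-day year, and a constant daily rate, none of which the figures above state; the function name `yearly_volume_tb` is hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def yearly_volume_tb(tb_per_day: float, days: int = 365) -> float:
    """Scale a constant daily data volume (in TB) to a yearly volume (in TB)."""
    return tb_per_day * days


if __name__ == "__main__":
    # Daily rates quoted in the examples above.
    sources = {"New York Stock Exchange": 1, "Facebook": 500}
    for name, tb_per_day in sources.items():
        yearly_tb = yearly_volume_tb(tb_per_day)
        print(f"{name}: {yearly_tb} TB/year = {yearly_tb / TB_PER_PB} PB/year")
```

Under these assumptions, one terabyte per day adds up to 365 terabytes per year, and 500 terabytes per day adds up to 182.5 petabytes per year.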
------------------------------------------------------------------------------------------------------------

What is Data Science?

• Data science is the in-depth study of massive amounts of data. It involves extracting meaningful insights from raw, structured, and unstructured data using the scientific method, different technologies, and algorithms.

• It is a multidisciplinary field that uses tools and techniques to manipulate the data so that you can find something new and meaningful.

• Data is the oil of today’s world. With the right tools, technologies, and algorithms, we can use data and convert it into a distinctive business advantage.

• Data science can help you detect fraud using advanced machine learning algorithms, which helps prevent significant monetary losses.
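
As a rough illustration of the fraud detection idea, the sketch below flags unusually large or small transaction amounts. Note that this is not a machine learning algorithm: it swaps in a simple statistical rule (the modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5). The function name `flag_outliers` and the transaction amounts are made up for the example.

```python
from statistics import median
from typing import List, Sequence


def flag_outliers(amounts: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of amounts whose modified z-score exceeds threshold."""
    if not amounts:
        return []
    centre = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - centre) for a in amounts)
    if mad == 0:
        return []  # all values (nearly) identical, nothing to flag
    return [
        i
        for i, a in enumerate(amounts)
        if abs(0.6745 * (a - centre) / mad) > threshold
    ]


if __name__ == "__main__":
    transactions = [20, 25, 22, 19, 24, 21, 23, 500]  # hypothetical amounts
    print(flag_outliers(transactions))
```

A production fraud system would typically combine many features (merchant, location, time of day) in a trained model; a rule like this is at most a first screening step.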

Data Science Process

• Asking the correct questions and analyzing the raw data.

• Modelling the data using various complex and efficient algorithms.

• Visualizing the data to get a better perspective.

• Understanding the data to make better decisions and finding the final result.
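
The four steps above can be sketched end to end on a toy problem. Everything in the example is an assumption made for illustration: the question ("how do sales change with advertising spend?"), the invented numbers, the choice of a least-squares line as the model, and the helper names `fit_line` and `text_chart`.

```python
from typing import Sequence, Tuple

# Step 1: ask a question and collect raw data (hypothetical numbers).
AD_SPEND = [1.0, 2.0, 3.0, 4.0, 5.0]
SALES = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Step 2: model the data with an ordinary least-squares line."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def text_chart(xs: Sequence[float], ys: Sequence[float]) -> str:
    """Step 3: visualize the data as a crude text bar chart."""
    return "\n".join(f"{x:>4} | {'#' * round(y)}" for x, y in zip(xs, ys))


if __name__ == "__main__":
    slope, intercept = fit_line(AD_SPEND, SALES)
    print(text_chart(AD_SPEND, SALES))
    # Step 4: use the fitted model to support a decision.
    print("Predicted sales at spend 6:", slope * 6 + intercept)
```

In practice each step is far larger (data cleaning, model selection, proper plots), but the order of the steps is the same.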

Types of Data Science Job

• Data Scientist
• Data Analyst
• Machine Learning Expert
• Data Engineer
• Data Architect
• Data Administrator
• Business Analyst
• Business Intelligence Manager

Data Science Components


Tools for Data Science

Data Analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner.

Data Warehousing tools: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift.

Data Visualization tools: R, Jupyter, Tableau, Cognos.

Machine learning tools: Spark, Mahout, Azure ML Studio.

