## Why should you take Data Science?

- The number of Data Science and Analytics job listings is projected to grow by nearly 364,000 listings by 2020 (Forbes).
- The average salary for a Data Scientist is $120k, as per Glassdoor.
- Businesses analysing data will see $430 billion in productivity benefits over their rivals not analysing data by 2020.

**What is Data Science?**

Data science is the in-depth study of massive amounts of data. It involves extracting meaningful insights from raw, structured, and unstructured data using the scientific method, a range of technologies, and algorithms.

It is a multidisciplinary field that uses tools and techniques to manipulate the data so that you can find something new and meaningful.

Data science uses powerful hardware, programming systems, and efficient algorithms to solve data-related problems, and it is closely tied to the future of artificial intelligence.

In short, data science is all about:

- Asking the right questions and analyzing the raw data.
- Modeling the data using various complex and efficient algorithms.
- Visualizing the data to get a better perspective.
- Understanding the data to make better decisions and reach the final result.

**Example:**

Suppose we want to travel from station A to station B by car. We need to make some decisions: which route will get us there fastest, which route will have no traffic jams, and which will be the most cost-effective. All these decision factors act as input data, and analyzing them gives us an appropriate answer. This analysis of data is called data analysis, and it is a part of data science.
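Here is a minimal Python sketch of that decision. It combines travel time, expected traffic delay, and cost into a single score per route and picks the lowest. The route names, numbers, and the weights that trade minutes against dollars are all hypothetical assumptions. A real analysis would estimate them from traffic and pricing data.

```python
# Hypothetical route data: travel time (minutes), expected traffic delay
# (minutes), and fuel/toll cost (dollars) for each candidate route.
routes = {
    "Highway": {"time": 40, "delay": 15, "cost": 8.0},
    "City roads": {"time": 55, "delay": 5, "cost": 4.0},
    "Ring road": {"time": 45, "delay": 5, "cost": 6.0},
}

# Assumed weights: how much one minute and one dollar count in the decision.
MINUTE_WEIGHT = 1.0
DOLLAR_WEIGHT = 2.0


def route_score(info):
    """Lower is better: total minutes plus cost converted into 'minutes'."""
    total_minutes = info["time"] + info["delay"]
    return MINUTE_WEIGHT * total_minutes + DOLLAR_WEIGHT * info["cost"]


def best_route(options):
    """Return the name of the route with the lowest combined score."""
    return min(options, key=lambda name: route_score(options[name]))


for name, info in routes.items():
    print(f"{name}: score {route_score(info):.1f}")
print("Best route:", best_route(routes))  # Best route: Ring road
```

Changing the weights changes the answer. Choosing those weights from what actually matters to the traveller is itself part of the analysis.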

**Need for Data Science:**

A few years ago, there was less data, and it was mostly available in a structured form that could easily be stored in Excel sheets and processed using BI tools.

But in today’s world, data is becoming so vast that approximately **2.5 quintillion bytes** of data are generated every day, which has led to a data explosion. According to some estimates, by 2020, 1.7 MB of data would be created every second for every person on earth. Every company requires data to work, grow, and improve its business.

Handling such a huge amount of data is a challenging task for every organization. To handle, process, and analyze it, we need complex, powerful, and efficient algorithms and technology, and that technology came into existence as data science. Following are some main reasons for using data science technology:

- With the help of data science technology, we can convert massive amounts of raw and unstructured data into meaningful insights.
- Data science technology is being adopted by all kinds of companies, from big brands to startups. Google, Amazon, Netflix, etc., which handle huge amounts of data, use data science algorithms to improve the customer experience.
- Data science is helping to automate transportation, for example through self-driving cars, which are seen as the future of transportation.
- Data science can help make various predictions, such as survey and election outcomes, flight ticket confirmation, etc.

**Data Science Jobs:**

According to various surveys, data scientist is becoming the most in-demand job of the 21st century due to the increasing demand for data science. Some people have even called it “the **hottest job title of the 21st century**”. Data scientists are experts who use various statistical tools and machine learning algorithms to understand and analyze data.

The average salary for a data scientist is approximately **$95,000 to $165,000 per annum**, and according to various studies, about **11.5 million** jobs will be created by the year **2026**.

**Types of Data Science Jobs**

If you learn data science, you get the opportunity to pursue various exciting job roles in this domain. The main job roles are given below:

- Data Scientist
- Data Analyst
- Machine learning expert
- Data engineer
- Data Architect
- Data Administrator
- Business Analyst
- Business Intelligence Manager

Below is an explanation of some critical data science job titles.

**1. Data Analyst:**

A data analyst is an individual who mines huge amounts of data, models the data, and looks for patterns, relationships, trends, and so on. At the end of the day, they come up with visualizations and reports that support decision making and problem solving.

**Skill required:** To become a data analyst, you need a good background in **mathematics, business intelligence, data mining**, and basic knowledge of **statistics**. You should also be familiar with some computer languages and tools such as **MATLAB, Python, SQL, Hive, Pig, Excel, SAS, R, JS, Spark**, etc.

**2. Machine Learning Expert:**

A machine learning expert works with the various machine learning algorithms used in data science, such as **regression, clustering, classification, decision tree, random forest**, etc.

**Skill required:** Programming languages such as Python, C++, R, and Java, as well as familiarity with Hadoop. You should also have an understanding of various algorithms, problem-solving and analytical skills, probability, and statistics.

**3. Data Engineer:**

A data engineer works with massive amounts of data and is responsible for building and maintaining the data architecture of a data science project. Data engineers also create the data set processes used in modeling, mining, acquisition, and verification.

**Skill required:** A data engineer must have in-depth knowledge of **SQL, MongoDB, Cassandra, HBase, Apache Spark, Hive, MapReduce**, along with knowledge of languages such as **Python, C/C++, Java, Perl**, etc.

**4. Data Scientist:**

A data scientist is a professional who works with an enormous amount of data to come up with compelling business insights through the deployment of various tools, techniques, methodologies, algorithms, etc.

**Skill required:** To become a data scientist, one should have technical skills in languages and tools such as **R, SAS, SQL, Python, Hive, Pig, Apache Spark, MATLAB**. Data scientists must also understand statistics, mathematics, and visualization, and have good communication skills.

**Prerequisite for Data Science**

**Non-Technical Prerequisite:**

- **Curiosity:** To learn data science, you must be curious. When you are curious and ask various questions, you can understand the business problem more easily.
- **Critical thinking:** A data scientist also needs critical thinking to find new ways to solve a problem efficiently.
- **Communication skills:** Communication skills are among the most important for a data scientist, because after solving a business problem, you need to communicate the results to the team.

**Technical Prerequisite:**

- **Machine learning:** To understand data science, you need to understand the concepts of machine learning. Data science uses machine learning algorithms to solve various problems.
- **Mathematical modeling:** Mathematical modeling is required to make fast mathematical calculations and predictions from the available data.
- **Statistics:** A basic understanding of statistics, such as the mean, median, and standard deviation, is required to extract knowledge and obtain better results from the data.
- **Computer programming:** Data science requires knowledge of at least one programming language. R and Python are commonly used, along with frameworks such as Spark.
- **Databases:** An in-depth understanding of databases and SQL is essential for retrieving and working with data.
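Python's standard `statistics` module computes the basic statistics mentioned above directly. The visit counts below are made-up sample data used only for illustration.

```python
import statistics

# Hypothetical sample: daily website visits over one week.
visits = [120, 135, 128, 150, 142, 98, 110]

mean_visits = statistics.mean(visits)      # arithmetic average
median_visits = statistics.median(visits)  # middle value after sorting
# stdev() is the sample standard deviation (n - 1 in the denominator);
# statistics.pstdev() gives the population standard deviation instead.
stdev_visits = statistics.stdev(visits)

print(f"mean={mean_visits:.2f} median={median_visits} stdev={stdev_visits:.2f}")
```

The median (128) is less affected by the unusually low day (98) than the mean is. This is one reason both are usually reported.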

**Difference between BI and Data Science**

BI stands for business intelligence, which is also used to analyze business information. Below are some differences between BI and data science:

Criterion | Business intelligence | Data science |
---|---|---|
Data source | Business intelligence deals with structured data, e.g., a data warehouse. | Data science deals with structured and unstructured data, e.g., weblogs, feedback, etc. |
Method | Analytical (historical data) | Scientific (goes deeper to find the reasons behind the data) |
Skills | Statistics and visualization are the two skills required for business intelligence. | Statistics, visualization, and machine learning are the required skills for data science. |
Focus | Business intelligence focuses on both past and present data. | Data science focuses on past data, present data, and also future predictions. |

**Data Science Components:**

The main components of Data Science are given below:

**1. Statistics:** Statistics is one of the most important components of data science. It is a way to collect and analyze large amounts of numerical data and find meaningful insights in it.

**2. Domain Expertise:** Domain expertise binds data science together. It means specialized knowledge or skills in a particular area, and data science projects in various areas need domain experts.

**3. Data engineering:** Data engineering is a part of data science that involves acquiring, storing, retrieving, and transforming data. It also involves adding metadata (data about data) to the data.
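The data engineering steps above can be sketched with Python's standard library alone. The sketch parses raw CSV text (acquire), cleans it (transform), and loads it into an in-memory SQLite database (store). It then queries the data back (retrieve) and records a small metadata row. The data, the table names `orders` and `metadata`, and the cleaning rules are hypothetical. Production pipelines typically rely on dedicated ETL tools and real data stores.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw input; in practice this might come from a file, an API, or logs.
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,5.50,DE
3,,fr
"""


def acquire(text):
    """Acquire: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, convert types, normalize country codes,
    and drop rows with a missing amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete rows
        clean.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return clean


def store(conn, rows):
    """Store: load cleaned rows into a table, plus one metadata record."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Metadata is data about the data: its source, row count, and load time.
    conn.execute("CREATE TABLE metadata (source TEXT, row_count INTEGER, loaded_at TEXT)")
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        ("RAW_CSV", len(rows), datetime.now(timezone.utc).isoformat()),
    )


conn = sqlite3.connect(":memory:")
store(conn, transform(acquire(RAW_CSV)))

# Retrieve: query the stored data and its metadata.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
row_count = conn.execute("SELECT row_count FROM metadata").fetchone()[0]
print(f"rows loaded: {row_count}, total amount: {total:.2f}")  # rows loaded: 2, total amount: 25.49
```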

**4. Visualization:** Data visualization means representing data in a visual context so that people can easily understand its significance. Visualization makes huge amounts of data easier to grasp at a glance.
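Real projects would use a plotting library or a tool such as Tableau. To keep this example dependency-free, it draws a plain-text horizontal bar chart that scales each bar to the largest value, so differences are visible at a glance. The sales figures are hypothetical.

```python
def text_bar_chart(data, width=30):
    """Render a horizontal bar chart as text, scaling bars so the largest
    value gets `width` characters."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical monthly sales figures.
sales = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 60}
print(text_bar_chart(sales))
```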

**5. Advanced computing:** Advanced computing does the heavy lifting of data science. It involves designing, writing, debugging, and maintaining the source code of computer programs.

**6. Mathematics:** Mathematics is a critical part of data science. It involves the study of quantity, structure, space, and change. For a data scientist, a good knowledge of mathematics is essential.

**7. Machine learning:** Machine learning is the backbone of data science. It is about training a machine to learn from data so that it can make decisions and predictions, somewhat like a human brain. In data science, we use various machine learning algorithms to solve problems.
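The simplest example of "training" is probably fitting a straight line to data by ordinary least squares. The sketch below learns an intercept `a` and slope `b` from example points and then uses them to predict a new value. The advertising-versus-sales data is made up for illustration.

```python
# Hypothetical training data: advertising spend (x) vs. sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(x, y):
    """'Train' a linear model y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of co-deviations of x and y divided by sum of squared x deviations.
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(xs, ys)
print(f"model: y = {a:.2f} + {b:.2f} * x")      # model: y = 1.09 + 1.97 * x
print("prediction for x = 6:", round(a + b * 6, 2))
```

Libraries such as scikit-learn or R's `lm` perform the same fit for many variables at once. The hand-written version only shows what "learning from data" means in this simplest case.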

**Tools for Data Science**

Following are some tools required for data science:

- **Data analysis tools:** R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner.
- **Data warehousing:** ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift.
- **Data visualization tools:** R, Jupyter, Tableau, Cognos.
- **Machine learning tools:** Spark, Mahout, Azure ML Studio.

**Machine learning in Data Science**

To become a data scientist, you should also be familiar with machine learning and its algorithms, since many machine learning algorithms are widely used in data science. Following are the names of some machine learning algorithms used in data science:

- Regression
- Decision tree
- Clustering
- Principal component analysis
- Support vector machines
- Naive Bayes
- Artificial neural network
- Apriori
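To show one of these algorithms in action, here is a minimal pure-Python sketch of k-means clustering. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Initializing with the first `k` points and using a fixed iteration count are simplifying assumptions. Library implementations use smarter initialization and a convergence check. The customer data is hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move each
    centroid to the mean of its assigned points; repeat."""
    # Simple deterministic initialization: the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


# Hypothetical customer data: (visits per month, average basket size).
customers = [(1, 2), (10, 11), (1.5, 1.8), (2, 2.2), (9, 10), (11, 12)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(v, 2) for v in centroid], members)
```

Here the algorithm separates occasional, small-basket customers from frequent, large-basket ones. A business might then target the two groups differently.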

**Applications of Data Science:**

**Image recognition and speech recognition:**

Data science is currently used for image and speech recognition. When you upload an image on Facebook, you start getting suggestions to tag your friends. This automatic tagging suggestion uses an image recognition algorithm, which is part of data science.

When you say something to “Ok Google”, Siri, Cortana, etc., these devices respond to your voice commands. This is possible thanks to speech recognition algorithms.

**Gaming world:**

In the gaming world, the use of machine learning algorithms is increasing day by day. EA Sports, Sony, and Nintendo are widely using data science to enhance the user experience.

**Internet search:**

When we want to search for something on the internet, we use search engines such as Google, Yahoo, Bing, Ask, etc. All these search engines use data science technology to improve the search experience, so you can get search results in a fraction of a second.

**Transport:**

The transport industry is also using data science technology to create self-driving cars, which could help reduce the number of road accidents.

**Healthcare:**

In the healthcare sector, data science provides many benefits. It is being used for tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.

**Recommendation systems:**

Many companies, such as Amazon, Netflix, and Google Play, use data science technology to improve the user experience with personalized recommendations. For example, when you search for something on Amazon and start getting suggestions for similar products, this is because of data science technology.

**Risk detection:**

The finance industry has always had problems with fraud and the risk of losses, but with the help of data science, these can be reduced.

Most finance companies are looking for data scientists to help them avoid risk and losses while increasing customer satisfaction.
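A very simple form of risk detection flags transactions that sit unusually far from a customer's typical spending. The minimal sketch below uses a z-score rule: it flags any amount more than `threshold` standard deviations from the mean. The threshold of 2.0 and the transaction history are hypothetical assumptions. This is a crude illustration, not a production fraud model.

```python
import statistics


def flag_unusual(amounts, threshold=2.0):
    """Return indices of amounts whose z-score (distance from the mean,
    measured in standard deviations) exceeds the threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all amounts identical: nothing stands out
    return [i for i, x in enumerate(amounts) if abs(x - mean) / stdev > threshold]


# Hypothetical card transactions for one customer (dollars).
history = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 980.0, 44.1]
print(flag_unusual(history))  # [6]
```

The large transaction itself inflates the mean and standard deviation, which can hide outliers in small samples. This is one reason real systems tend to use robust statistics or trained models instead.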