
Data Science - Statistics Standard Deviation

Standard Deviation

Standard deviation is a number that describes how spread out the observations are.

A mathematical function will have difficulty predicting precise values if the observations are spread out. In this sense, standard deviation is a measure of uncertainty.

A low standard deviation means that most of the numbers are close to the mean (average) value.

A high standard deviation means that the values are spread out over a wider range.
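To make the low/high distinction concrete, here is a small sketch with two made-up lists that share the same mean (10) but differ in spread. The names close_values and spread_values are illustrative only and are not part of the health dataset.

```python
import numpy as np

# Two hypothetical sets of observations with the same mean (10)
close_values = np.array([9, 10, 10, 11])   # clustered near the mean
spread_values = np.array([1, 5, 15, 19])   # spread over a wider range

# Print the mean and the standard deviation of each set
print(np.mean(close_values), np.std(close_values))
print(np.mean(spread_values), np.std(spread_values))
```

Both means are 10.0, but the first standard deviation is about 0.71 while the second is about 7.28.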

Tip

Standard Deviation is often represented by the symbol Sigma: σ
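For reference, the standard deviation of N observations x_1 to x_N with mean μ can be written as below. This is the population form, which is what NumPy's std() computes by default (ddof=0). The sample form divides by N − 1 instead of N and corresponds to passing ddof=1.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```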

We can use the std() function from NumPy to find the standard deviation of a variable:

Example

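import numpy as np

# full_health_data: the health dataset loaded earlier (assumed pandas DataFrame)
# axis=0 gives one standard deviation per column (variable)
std = np.std(full_health_data, axis=0)
print(std)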

The output

What do these numbers mean?

Coefficient of Variation

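The coefficient of variation is used to get an idea of how large the standard deviation is relative to the mean. Because it is a ratio with no units, it also lets us compare the spread of variables measured on different scales.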

Mathematically, the coefficient of variation is defined as:

Coefficient of Variation = Standard Deviation / Mean
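A small worked illustration of the formula, using made-up values for two hypothetical variables measured on different scales. The names minutes and kilometres are illustrative and are not columns of the health dataset.

```python
import numpy as np

# Hypothetical observations of two variables on very different scales
minutes = np.array([50, 60, 70])        # mean 60
kilometres = np.array([1.0, 2.0, 3.0])  # mean 2

for name, values in [("minutes", minutes), ("kilometres", kilometres)]:
    sd = np.std(values)          # population standard deviation (ddof=0)
    cv = sd / np.mean(values)    # coefficient of variation = std / mean
    print(name, round(sd, 3), round(cv, 3))
```

Here minutes has the larger standard deviation (about 8.2 versus about 0.8), but kilometres has the larger coefficient of variation (about 0.41 versus about 0.14), because its spread is large relative to its own mean.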

To compute it for every variable in the dataset, we can use the following code:

Example

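import numpy as np

# Coefficient of variation per column: standard deviation divided by mean
# (full_health_data is the health dataset loaded earlier; axis=0 works per column)
cv = np.std(full_health_data, axis=0) / np.mean(full_health_data, axis=0)

print(cv)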

The output

We see that the variables Duration, Calorie_Burnage and Hours_Work have a high standard deviation relative to their mean (a high coefficient of variation) compared to Max_Pulse, Average_Pulse and Hours_Sleep.
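The comparison above is made column by column. A minimal NumPy sketch of that per-column computation is shown here, using a small made-up array in place of the health dataset (the values and the two-column layout are hypothetical). Passing axis=0 reduces over the rows, giving one standard deviation and one coefficient of variation per variable.

```python
import numpy as np

# Hypothetical table: each row is one observation, each column one variable
data = np.array([
    [30.0, 80.0],
    [45.0, 85.0],
    [60.0, 90.0],
    [75.0, 95.0],
])

std_per_column = np.std(data, axis=0)    # one standard deviation per column
mean_per_column = np.mean(data, axis=0)  # one mean per column
cv_per_column = std_per_column / mean_per_column

print(std_per_column)
print(cv_per_column)
```

With these made-up values, the first column has both the larger standard deviation and the larger coefficient of variation. Without axis=0, np.std and np.mean on a 2D array would return a single number for the whole table instead of one value per variable.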
