Create a DataFrame with Pandas
A data frame is a structured representation of data.
Let's define a data frame with 3 columns and 5 rows with fictional numbers:
Example
import pandas as pd
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)
print(df)
Example Explained
- Import the Pandas library as pd
- Define the data, with its columns and rows, in a variable named d
- Create a data frame by calling pd.DataFrame()
- The data frame contains 3 columns and 5 rows
- Print the data frame output with the print() function
We write pd. in front of DataFrame() to tell Python that we want to use DataFrame() from the Pandas library.
Be aware of the capital D and F in DataFrame!
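A minimal sketch of why the capitalization matters, assuming only a standard Pandas installation: the lowercase name dataframe does not exist in the Pandas module, so Python raises an AttributeError. The variable lowercase_worked is only illustrative.

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}

# Correct spelling: capital D and capital F
df = pd.DataFrame(data=d)

# Wrong spelling: the Pandas module has no attribute called "dataframe"
try:
    pd.dataframe(data=d)
    lowercase_worked = True
except AttributeError:
    lowercase_worked = False

print(lowercase_worked)
```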
Interpreting the Output
This is the output:
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
We see that "col1", "col2" and "col3" are the names of the columns.
Do not be confused by the vertical numbers ranging from 0 to 4. They are the index and tell us the position of each row.
By default, Pandas numbers the rows starting from zero.
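To make the zero-based row numbering concrete, here is a short sketch that rebuilds the data frame from the first example and looks up rows by position with df.iloc. The names row_labels, first_row and last_row are only illustrative.

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)

# The row labels that Pandas creates by default
row_labels = list(df.index)
print(row_labels)

# Position 0 is the first row, position 4 is the fifth (last) row
first_row = df.iloc[0]
last_row = df.iloc[4]
print(first_row['col1'])
print(last_row['col1'])
```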
Now, we can use Python to count the columns and rows.
We can use df.shape[1] to find the number of columns:
Example
import pandas as pd
df = pd.DataFrame({
"city": ["Vancouver", "Calgary", "Toronto"],
"visits": [1200, 860, 2100],
"signups": [156, 95, 252],
})
count_column = df.shape[1]
print(count_column)
We can use df.shape[0] to find the number of rows:
Example
import pandas as pd
df = pd.DataFrame({
"city": ["Vancouver", "Calgary", "Toronto"],
"visits": [1200, 860, 2100],
"signups": [156, 95, 252],
})
count_row = df.shape[0]
print(count_row)
Why Can We Not Just Count the Rows and Columns Ourselves?
If we work with larger data sets that have many columns and rows, counting them by hand is confusing and error-prone. If we read the counts from df.shape instead, we can be sure that they are correct.
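As a sketch of this point, df.shape holds both counts at once as a (rows, columns) tuple, so it can be unpacked in a single step. The data set below is made up for illustration, and the name big_df is an assumption that does not appear in the lesson.

```python
import pandas as pd

# Hypothetical larger data set: 1,000 rows and 3 columns of made-up numbers
big_df = pd.DataFrame({
    "id": list(range(1000)),
    "visits": [i * 2 for i in range(1000)],
    "signups": [i % 7 for i in range(1000)],
})

# df.shape is a tuple: (number of rows, number of columns)
count_row, count_column = big_df.shape
print(count_row)
print(count_column)

# len() on a data frame also gives the number of rows
print(len(big_df))
```

Unpacking the tuple avoids having to remember which index, 0 or 1, refers to rows and which to columns.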