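Pandas DataFrame Analysis

Pandas DataFrame objects come with built-in methods such as head(), tail(), and info() that let us view and analyze DataFrames.

View Data in a Pandas DataFrame

A Pandas DataFrame can be displayed like any other Python variable using the print() function.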

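However, when a DataFrame has a very large number of rows and columns, print() does not show all of it. By default, pandas truncates the output and prints only part of the DataFrame.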
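To see the truncation in practice, here is a minimal sketch. The 1000-row DataFrame big and its column names are made up for illustration, and the exact number of rows shown depends on pandas display options such as display.max_rows.

```python
import pandas as pd

# Hypothetical 1000-row DataFrame, built only to trigger truncation
big = pd.DataFrame({'id': range(1000),
                    'square': [i * i for i in range(1000)]})

# repr(big) is the same text that print(big) writes to the screen
truncated = repr(big)
print(truncated)
print(len(truncated.splitlines()), 'lines printed with default options')

# Temporarily lift the row limit so that every row is printed
with pd.option_context('display.max_rows', None):
    full = repr(big)
print(len(full.splitlines()), 'lines printed with display.max_rows=None')
```

Printing every row is rarely useful for real datasets, which is why a short preview is usually the better choice.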

For large DataFrames, we can use the head(), tail(), and info() methods to get an overview of the DataFrame.

Pandas head()

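The head() method gives a quick preview of a DataFrame. It returns the column headers and a specified number of rows from the beginning. For example,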

import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
        'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
        'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}
df = pd.DataFrame(data)

# display the first three rows
print('First Three Rows:')
print(df.head(3))
print()

# display the first five rows
print('First Five Rows:')
print(df.head())

Output

First Three Rows:
    Name  Age      City
0   John   25  New York
1  Alice   30     Paris
2    Bob   35    London

First Five Rows:
    Name  Age      City
0   John   25  New York
1  Alice   30     Paris
2    Bob   35    London
3   Emma   28    Sydney
4   Mike   32     Tokyo

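In this example, we used head() to display selected rows of the df DataFrame, starting from the top.

Notice that when no argument is passed to head(), the first five rows are selected by default.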
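The argument to head() does not have to be a small positive number. The sketch below, which uses a shortened five-row version of the sample data as an assumption, shows two edge cases: a negative n returns every row except the last |n| rows, and an n larger than the DataFrame simply returns all rows.

```python
import pandas as pd

# Shortened sample data, used here only for illustration
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
                   'Age': [25, 30, 35, 28, 32]})

# Negative n: keep everything except the last 2 rows
all_but_last_two = df.head(-2)
print(all_but_last_two)

# n larger than the number of rows: all 5 rows are returned, no error
more_than_available = df.head(100)
print(len(more_than_available))
```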

Pandas tail()

The tail() method is similar to head(), but it returns rows from the end of the DataFrame. For example,

import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
        'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
        'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}

df = pd.DataFrame(data)

# display the last three rows
print('Last Three Rows:')
print(df.tail(3))
print()

# display the last five rows
print('Last Five Rows:')
print(df.tail())

Output

Last Three Rows:
    Name  Age     City
7  Linda   33   Madrid
8    Tom   29  Toronto
9  Emily   31   Moscow

Last Five Rows:
    Name  Age     City
5  Sarah   27   Berlin
6  David   40     Rome
7  Linda   33   Madrid
8    Tom   29  Toronto
9  Emily   31   Moscow

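In this example, we used tail() to display selected rows of the df DataFrame, starting from the bottom.

Notice that when no argument is passed to tail(), the last five rows are selected by default.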
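Like head(), tail() also accepts a negative n, in which case it returns every row except the first |n| rows. The following sketch, again on a shortened five-row sample chosen for illustration, also checks that tail(2) selects the same rows as the positional slice df.iloc[-2:].

```python
import pandas as pd

# Shortened sample data, used here only for illustration
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
                   'Age': [25, 30, 35, 28, 32]})

# Negative n: keep everything except the first 2 rows
all_but_first_two = df.tail(-2)
print(all_but_first_two)

# tail(2) and iloc[-2:] select the same last two rows
last_two = df.tail(2)
same_as_slice = last_two.equals(df.iloc[-2:])
print(same_as_slice)
```

Note that the original index labels are preserved in the result, so the rows returned by tail() keep their positions from the full DataFrame.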

Get DataFrame Information

The info() method provides overall information about a DataFrame, such as its class, data types, and size. For example,

import pandas as pd

# create dataframe
data = {'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
        'Age': [25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
        'City': ['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow']}
df = pd.DataFrame(data)

# get info about dataframe
df.info()

Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    10 non-null     object
 1   Age     10 non-null     int64 
 2   City    10 non-null     object
dtypes: int64(1), object(2)
memory usage: 372.0+ bytes

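As you can see, the info() method provides the following information about a Pandas DataFrame:

Class: the class of the object, which shows that it is a pandas DataFrame
RangeIndex: the index range of the DataFrame, showing the starting and ending index values
Data columns: the total number of columns in the DataFrame
Column names: the names of the columns in the DataFrame
Non-Null Count: the number of non-null values in each column
Dtype: the data type of each column
Memory usage: the memory used by the DataFrame, in bytes (a trailing + means the figure is a lower bound, because the contents of object columns are not measured by default)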

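This information helps us understand a dataset's structure, dimensions, and missing values. These insights are essential for data exploration, cleaning, manipulation, and analysis.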
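The individual pieces of the info() report can also be obtained separately. The sketch below is an illustration under stated assumptions: the four-row sample and its two missing values are invented here so that the non-null counts differ between columns, and the report is captured into a string through the buf parameter instead of being printed directly.

```python
import io
import pandas as pd

# Small sample with two missing values, added only for illustration
data = {'Name': ['John', 'Alice', 'Bob', 'Emma'],
        'Age': [25, None, 35, 28],
        'City': ['New York', 'Paris', None, 'Sydney']}
df = pd.DataFrame(data)

# Dimensions as a (rows, columns) tuple
print(df.shape)

# Data type of each column
print(df.dtypes)

# Number of missing values per column
missing = df.isna().sum()
print(missing)

# Capture the info() report as text; memory_usage='deep' asks pandas
# to measure the actual contents of object columns
buf = io.StringIO()
df.info(buf=buf, memory_usage='deep')
report = buf.getvalue()
print(report)
```

With memory_usage='deep', the memory figure no longer carries the trailing +, at the cost of a slower computation on large DataFrames.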
