Constructor
new GroupedData()
Note: Do not use directly (see above).
- Since:
- 1.3.0
Methods
agg(expr)
Compute aggregates by specifying a series of aggregate columns. By default, the output retains the grouping columns; to drop them, set spark.sql.retainGroupColumns to false.
The available aggregate functions are defined in Functions.
Parameters:
Name | Type | Description |
---|---|---|
expr | | Array of aggregate columns to compute. |
- Since:
- 1.3.0
Example
// Select the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(F.max("age"), F.sum("expense"));
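To make the semantics concrete, here is a minimal plain-JavaScript sketch of what the example above computes, run on an in-memory array rather than a DataFrame. The helper `aggregateByKey`, its `retainGroupColumns` argument, the output names `maxAge` and `sumExpense`, and the sample rows are all hypothetical and not part of this library; the flag only mimics the effect described for spark.sql.retainGroupColumns.

```javascript
// Hypothetical sample data standing in for a DataFrame.
const employees = [
  { department: "eng", age: 41, expense: 120 },
  { department: "eng", age: 29, expense: 80 },
  { department: "ops", age: 35, expense: 50 },
];

// Group rows by `key`, then apply every named reducer to each group.
// `retainGroupColumns` is an illustrative stand-in for the Spark setting.
function aggregateByKey(rows, key, aggregates, retainGroupColumns = true) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  const result = [];
  for (const [groupValue, members] of groups) {
    // Keep the grouping column in the output unless told otherwise.
    const out = retainGroupColumns ? { [key]: groupValue } : {};
    for (const [name, reduce] of Object.entries(aggregates)) {
      out[name] = reduce(members);
    }
    result.push(out);
  }
  return result;
}

// Oldest employee and total expense per department.
const oldestAndTotal = aggregateByKey(employees, "department", {
  maxAge: (members) => Math.max(...members.map((m) => m.age)),
  sumExpense: (members) => members.reduce((total, m) => total + m.expense, 0),
});
```

Passing `false` as the last argument yields rows without the `department` field, which is roughly the shape one would expect when grouping columns are not retained.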
avg(colNames)
Compute the average value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their averages are computed.
Parameters:
Name | Type | Description |
---|---|---|
colNames | | Array of columns to compute the mean over. |
- Since:
- 1.3.0
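The difference between averaging every numeric column and averaging only the specified ones can be sketched in plain JavaScript. This is an illustration of the behavior described above, not the library's implementation; `averageByKey` and the sample rows are hypothetical, and the sketch assumes the first row is representative when detecting numeric columns.

```javascript
// Hypothetical sample data; `name` is non-numeric and is therefore skipped.
const rows = [
  { department: "eng", name: "Ada", age: 40, expense: 100 },
  { department: "eng", name: "Bob", age: 30, expense: 50 },
  { department: "ops", name: "Cy", age: 36, expense: 20 },
];

function averageByKey(data, key, colNames = []) {
  // With no columns given, fall back to every numeric column except the key.
  const columns = colNames.length > 0
    ? colNames
    : Object.keys(data[0]).filter(
        (c) => c !== key && typeof data[0][c] === "number"
      );
  const groups = new Map();
  for (const row of data) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return [...groups].map(([groupValue, members]) => {
    const out = { [key]: groupValue }; // the grouping column is kept
    for (const c of columns) {
      out[c] = members.reduce((total, m) => total + m[c], 0) / members.length;
    }
    return out;
  });
}

const allNumeric = averageByKey(rows, "department");
const ageOnly = averageByKey(rows, "department", ["age"]);
```

Here `allNumeric` carries averages for both `age` and `expense`, while `ageOnly` carries only `age`; both keep `department`.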
count()
Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.
- Since:
- 1.3.0
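As a rough illustration of the result shape, the following plain-JavaScript sketch counts rows per group on an in-memory array. The helper `countByKey`, the output field name `count`, and the sample data are assumptions made for this example and are not taken from the library.

```javascript
// Hypothetical sample data standing in for a DataFrame.
const visits = [
  { department: "eng", user: "a" },
  { department: "ops", user: "b" },
  { department: "eng", user: "c" },
];

function countByKey(data, key) {
  const counts = new Map();
  for (const row of data) {
    counts.set(row[key], (counts.get(row[key]) || 0) + 1);
  }
  // One output row per group: the grouping column plus the row count.
  return [...counts].map(([groupValue, count]) => ({ [key]: groupValue, count }));
}

const perDepartment = countByKey(visits, "department");
```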
max(colNames)
Compute the max value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their max values are computed.
Parameters:
Name | Type | Description |
---|---|---|
colNames | | Array of columns to compute the max over. |
- Since:
- 1.3.0
mean(colNames)
Alias for GroupedData#avg.
Parameters:
Name | Type | Description |
---|---|---|
colNames | | Array of columns to compute the mean over. |
- Since:
- 1.3.0
min(colNames)
Compute the min value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their min values are computed.
Parameters:
Name | Type | Description |
---|---|---|
colNames | | Array of columns to compute the min over. |
- Since:
- 1.3.0
sum(colNames)
Compute the sum of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their sums are computed.
Parameters:
Name | Type | Description |
---|---|---|
colNames | | Array of columns to compute the sum over. |
- Since:
- 1.3.0