GroupBy.apply(func, *args, **kwargs)
Apply function func group-wise and combine the results together.
The function passed to apply must take a dataframe as its first argument and return a dataframe, a series or a scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.
While apply is very flexible, it can be considerably slower than more specific methods. Pandas offers a wide range of methods that are much faster than apply for their specific purposes, so try them before reaching for apply.
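As a small illustration of this trade-off, the sketch below computes per-group sums twice: once through apply and once through the dedicated sum method. The frame mirrors the one used in the examples on this page. Selecting the value columns with `[['B', 'C']]` is an assumption made here so that func never receives the grouping column.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Assumption: select the value columns explicitly so that func only
# ever sees B and C, never the grouping column A.
g = df.groupby('A')[['B', 'C']]

via_apply = g.apply(lambda x: x.sum())  # general-purpose path
via_builtin = g.sum()                   # dedicated method

# Both routes should produce the same per-group column sums.
same_values = bool((via_apply.values == via_builtin.values).all())
print(via_apply)
print(same_values)
```

On a frame this small the difference in speed is invisible; the dedicated method tends to pay off on large inputs.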
| Parameters: | func : function. A callable that takes a dataframe as its first argument and returns a dataframe, a series or a scalar. In addition, the callable may take positional and keyword arguments. args, kwargs : tuple and dict. Optional positional and keyword arguments to pass to func. |
|---|---|
| Returns: | applied : Series or DataFrame |
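The forwarding of args and kwargs can be sketched as follows. The helper `scaled_range` and its parameters `factor` and `offset` are hypothetical names introduced only for this illustration, and the explicit column selection is likewise an assumption.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
g = df.groupby('A')[['B', 'C']]  # assumption: operate on B and C only


def scaled_range(group, factor, offset=0):
    """Hypothetical helper: per-column (max - min) * factor + offset."""
    return (group.max() - group.min()) * factor + offset


# 10 is forwarded positionally as `factor`; offset=1 is forwarded as a keyword.
result = g.apply(scaled_range, 10, offset=1)
print(result)
```

Each group is passed as the first argument, and everything after func in the apply call is handed on to func unchanged.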
See also
pipe
In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.
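One way to observe this is to give func a visible side effect and count how often it fires. The sketch below is an illustration under assumptions: the helper `record_and_sum` and the `calls` list are hypothetical, and because the number of calls depends on the pandas version in use, the code prints the count instead of assuming a value.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
g = df.groupby('A')[['B', 'C']]  # assumption: operate on B and C only

calls = []  # one entry is appended every time func runs


def record_and_sum(group):
    """Hypothetical func with a side effect: it logs the size of each group."""
    calls.append(len(group))
    return group.sum()


result = g.apply(record_and_sum)

# If func ran twice on the first group, there are more calls than groups.
print(len(calls), "calls for", g.ngroups, "groups")
```

If the side effect matters, for example writing to a file or a counter, a plain loop such as `for name, group in g:` runs the body exactly once per group.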
>>> df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
>>> g = df.groupby('A')
From df above we can see that g has two groups, a and b. Calling apply in various ways gives different grouping results:
Example 1: The function passed to apply takes a dataframe as its argument and returns a dataframe. apply combines the result for each group together into a new dataframe:
>>> g.apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0
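Example 1 is a case where a more specific method can stand in for apply. The sketch below is an illustration under assumptions, not part of the original example: it passes `group_keys=False` and selects the value columns explicitly, so that the result is expected to keep the original row index, and then reproduces the same shares with `transform('sum')`.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Assumption: group_keys=False and an explicit column selection, neither of
# which appears in the example above.
g = df.groupby('A', group_keys=False)[['B', 'C']]

# Each value as a share of its group's column total, via apply.
shares = g.apply(lambda x: x / x.sum())

# The same shares via transform, which broadcasts each group's sum
# back onto the rows of that group.
via_transform = df[['B', 'C']] / g.transform('sum')

print(shares)
print(via_transform)
```

The transform route avoids calling a Python function once per group, which is the kind of saving the note on performance refers to.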
Example 2: The function passed to apply takes a dataframe as its argument and returns a series. apply combines the result for each group together into a new dataframe:
>>> g.apply(lambda x: x.max() - x.min())
   B  C
A
a  1  2
b  0  0
Example 3: The function passed to apply takes a dataframe as its argument and returns a scalar. apply combines the result for each group together into a series, including setting the index as appropriate:
>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
http://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.core.groupby.GroupBy.apply.html