Series.factorize(sort=False, na_sentinel=-1)
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available both as a top-level function, pandas.factorize(), and as the methods Series.factorize() and Index.factorize().
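Parameters:
    sort : boolean, default False
        Sort uniques and shuffle labels to maintain the relationship.
    na_sentinel : int, default -1
        Value to mark “not found”.
Returns:
    labels : ndarray
        An integer ndarray that’s an indexer into uniques.
    uniques : ndarray, Index, or Categorical
        The unique valid values. When the input is a Categorical, uniques is a Categorical; when it is another pandas object, an Index is returned; otherwise, an ndarray is returned.
        Note: Even if there’s a missing value in the input, uniques will not contain an entry for it.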
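To illustrate what "an indexer into uniques" means, here is a minimal sketch that rebuilds the input from the two return values. The variable names are illustrative, and an explicit NumPy object array is used as input on the assumption that it is accepted across pandas versions. The round trip only holds when the input has no missing values, since a label of -1 would otherwise be read as "last element".

```python
import numpy as np
import pandas as pd

# Illustrative input with no missing values.
values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)
labels, uniques = pd.factorize(values)

# Each label is the position in uniques of the corresponding input element,
# so taking uniques at those positions reproduces the input.
reconstructed = uniques.take(labels)
```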
See also
pandas.cut
pandas.unique
Examples
These examples all show factorize as a top-level function, like pd.factorize(values). The results are identical for methods like Series.factorize().
>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> labels
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With sort=True, the uniques will be sorted, and labels will be shuffled so that the relationship is maintained.
>>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> labels
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)
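The "relationship" preserved by sort=True can be checked directly: decoding the labels through uniques gives back the same values whether or not sorting was requested. The sketch below assumes a NumPy object array as input, and its variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)

labels_unsorted, uniques_unsorted = pd.factorize(values)
labels_sorted, uniques_sorted = pd.factorize(values, sort=True)

# The integer codes differ between the two calls, but each pair
# (labels, uniques) decodes to the original values.
decoded_unsorted = [uniques_unsorted[i] for i in labels_unsorted]
decoded_sorted = [uniques_sorted[i] for i in labels_sorted]
```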
Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values are never included in uniques.
>>> labels, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> labels
array([ 0, -1, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
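Because the sentinel is not a valid position in uniques, decoding labels that contain it needs a mask. The following is a rough sketch that relies on the default sentinel of -1 and uses illustrative variable names. It is one possible way to decode, not the library's own.

```python
import numpy as np
import pandas as pd

values = np.array(['b', None, 'a', 'c', 'b'], dtype=object)
labels, uniques = pd.factorize(values)

# Assumes the default sentinel (-1) marks missing entries.
missing = labels == -1

# Start from an all-None array and fill in only the non-missing positions,
# so the sentinel is never used as an index into uniques.
decoded = np.full(len(labels), None, dtype=object)
decoded[~missing] = uniques[labels[~missing]]
```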
Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
[a, c]
Categories (3, object): [a, b, c]
Notice that 'b' is in uniques.categories, despite not being present in cat.values.
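The distinction between the values in uniques and its categories can be inspected as follows. This sketch also shows Categorical.remove_unused_categories() as one way to drop the unused 'b' if that is wanted; whether to do so depends on the use case, and the variable names are illustrative.

```python
import pandas as pd

cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
labels, uniques = pd.factorize(cat)

# The values actually observed in cat, in order of first appearance.
present_values = list(uniques)

# The declared categories are carried over unchanged, including unused ones.
all_categories = list(uniques.categories)

# Optional: drop categories that no value in uniques refers to.
trimmed_categories = list(uniques.remove_unused_categories().categories)
```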
For all other pandas objects, an Index of the appropriate type is returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> labels, uniques = pd.factorize(cat)
>>> labels
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='object')
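The examples above call the top-level function. As a brief sketch of the method forms mentioned earlier, the same data can be factorized through Series.factorize() and Index.factorize(). The names ser and idx are illustrative.

```python
import pandas as pd

ser = pd.Series(['a', 'a', 'c'])
idx = pd.Index(['a', 'a', 'c'])

# Top-level function and the two method forms on equivalent data.
func_labels, func_uniques = pd.factorize(ser)
ser_labels, ser_uniques = ser.factorize()
idx_labels, idx_uniques = idx.factorize()
```

All three calls are expected to yield the same labels and the same unique values, with uniques returned as an Index for these pandas inputs.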
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
http://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.factorize.html