Series.str.split(pat=None, n=-1, expand=False)
Split strings around given separator/delimiter.
Split each string in the caller’s values by given pattern, propagating NaN values. Equivalent to str.split().
Parameters:
pat : str, optional
    String or regular expression to split on. If not specified, split on whitespace.
n : int, default -1 (all)
    Limit number of splits in output.
expand : bool, default False
    Expand the split strings into separate columns.

Returns:
Series, Index, DataFrame or MultiIndex
    Type matches caller unless expand=True.
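As a minimal sketch of the pat parameter, the example below splits on a regular expression rather than a fixed string. The sample data and the pattern are illustrative assumptions, not part of the original reference.

```python
import pandas as pd

# Hypothetical sample data: fields separated by a comma with optional spaces.
s = pd.Series(["a, b,c", "d ,e"])

# A multi-character pattern is interpreted as a regular expression,
# so surrounding whitespace is consumed together with the comma.
parts = s.str.split(r"\s*,\s*")

print(parts.tolist())
```

Later pandas releases also accept a regex keyword to make this choice explicit; it is omitted here because it is not part of the signature documented above.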
See also
str.split
Series.str.get_dummies
Series.str.partition
The handling of the n keyword depends on the number of found splits:
If found splits > n, make first n splits only.
If found splits <= n, make all splits.
If for a certain row the number of found splits < n, append None for padding up to n if expand=True.
If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.
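The rules for n can be illustrated with a short sketch. The two-row Series below is an assumed example chosen so that one row has more possible splits than n and the other has fewer.

```python
import pandas as pd

# Hypothetical data: the first row allows 3 splits, the second only 1.
s = pd.Series(["a b c d", "a b"])

# n=1: rows with more possible splits are cut off after the first split.
limited = s.str.split(" ", n=1)

# n=2 with expand=True: the short row is padded with a missing value
# so that every row has n + 1 columns.
wide = s.str.split(" ", n=2, expand=True)

print(limited.tolist())
print(wide.shape)
```

Checking the padded cell with pd.isna(wide.iloc[1, 2]) avoids depending on whether a given pandas version represents the padding as None or NaN.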
>>> s = pd.Series(["this is good text", "but this is even better"])
By default, split returns an object of the same size as the caller, in which each element is a list of the split substrings.
>>> s.str.split()
0 [this, is, good, text]
1 [but, this, is, even, better]
dtype: object
>>> s.str.split("random")
0 [this is good text]
1 [but this is even better]
dtype: object
When using expand=True, the split elements are expanded into separate columns.
For a Series object, the return type is DataFrame.
>>> s.str.split(expand=True)
      0     1     2     3       4
0  this    is  good  text    None
1   but  this    is  even  better
>>> s.str.split(" is ", expand=True)
          0            1
0      this    good text
1  but this  even better
For an Index object, the return type is MultiIndex.
>>> i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
>>> i.str.split(expand=True)
MultiIndex(levels=[['ba'], ['100', '101', '102'], ['001', '002', '003']],
           labels=[[0, 0, 0], [0, 1, 2], [0, 1, 2]])
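The printed representation of a MultiIndex can differ between pandas versions (for example, later releases show codes where this output shows labels). A hedged sketch of inspecting the result programmatically instead of relying on the repr:

```python
import pandas as pd

i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
mi = i.str.split(expand=True)

# One level per split position; each entry is a tuple of the pieces.
print(isinstance(mi, pd.MultiIndex))
print(mi.nlevels)
print(mi.tolist())

# Pull out a single level as a plain list of strings.
middle = mi.get_level_values(1).tolist()
print(middle)
```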
Parameter n can be used to limit the number of splits in the output.
>>> s.str.split("is", n=1)
0 [th, is good text]
1 [but th, is even better]
dtype: object
>>> s.str.split("is", n=1, expand=True)
        0                1
0      th     is good text
1  but th   is even better
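One common use of n together with expand=True is to write the pieces back into named columns. The following is a sketch only; the DataFrame and the column names "name", "first" and "rest" are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a single free-text column.
df = pd.DataFrame({"name": ["Ada Lovelace", "Grace Brewster Hopper"]})

# n=1 yields at most two pieces per row, so two target columns suffice
# no matter how many spaces a value contains.
df[["first", "rest"]] = df["name"].str.split(" ", n=1, expand=True)

print(df)
```

Fixing n keeps the number of resulting columns predictable, which matters when the targets are assigned by position as they are here.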
If NaN is present, it is propagated throughout the columns during the split.
>>> s = pd.Series(["this is good text", "but this is even better", np.nan])
>>> s.str.split(n=3, expand=True)
      0     1     2            3
0  this    is  good         text
1   but  this    is  even better
2   NaN   NaN   NaN          NaN
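The same propagation applies without expand=True: a missing input stays missing instead of becoming an empty list. A small sketch, using assumed sample data, that checks both forms:

```python
import numpy as np
import pandas as pd

s = pd.Series(["this is good text", np.nan])

# Without expand, the missing entry is carried through as a missing value.
lists = s.str.split()

# With expand, every column of the missing row is missing.
wide = s.str.split(expand=True)

print(pd.isna(lists.iloc[1]))
print(wide.iloc[1].isna().all())
```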
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
http://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.str.split.html