Quantcast
Channel: Robin on Linux
Viewing all articles
Browse latest Browse all 236

To solve the problem about pivot() of Pandas

$
0
0

Below is an example from pandas official document for pivot():

import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='foo', columns='bar', values='baz')
print(result)

the result will be

bar  A  B  C
foo         
one  1  2  3
two  4  5  6

But if there are duplicate rows in the Dataframe, it will report error:

ValueError: Index contains duplicate entries, cannot reshape

To fix this, we can just drop_duplicates() first and then pivot():

result = df.drop_duplicates().pivot(index='foo', columns='bar', values='baz')

As matter of fact, there are situations that drop_duplicates() couldn’t fix:

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'A', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

Now we will need to use groupby() and unstack() to replace pivot():

result = (df.groupby(["foo", "bar"])
    .baz
    .first()
    .unstack()
)

And the result is

bar    A    B    C
foo               
one  1.0  NaN  3.0
two  4.0  5.0  6.0

The post To solve the problem about pivot() of Pandas first appeared on Robin on Linux.

Viewing all articles
Browse latest Browse all 236

Trending Articles