Channel: Robin on Linux

Be careful when you use the “isin()” method in Pandas

import pandas as pd

df_excl = pd.DataFrame({"id": ["12345"]})
df = pd.DataFrame({"id": ["12345", "67890"]})

result = df[~df.id.isin(df_excl[["id"]])]
print(result)

Can you guess the result of the snippet above? Just a DataFrame containing “67890”? No, the result is

      id
0  12345
1  67890

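Why hasn’t “12345” been excluded? The reason is subtle: df_excl[["id"]] is a DataFrame, but what isin() needs here is a Series (or another list-like of values)! Iterating over a DataFrame yields its column labels rather than its cell values, so isin() appears to end up comparing each id against the label "id", and nothing matches. So we shouldn’t use [[]] here, but [].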
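To make the difference concrete, here is a small sketch that checks the type of each expression and what iterating over it yields. The names as_frame and as_series are made up for this illustration, and the link between DataFrame iteration and the isin() result is an inference from the observed behaviour, not something the isin() documentation spells out.

```python
import pandas as pd

df_excl = pd.DataFrame({"id": ["12345"]})

as_frame = df_excl[["id"]]   # double brackets: a one-column DataFrame
as_series = df_excl["id"]    # single brackets: a Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series

# Iterating over a DataFrame yields its column labels, not its cell values,
# while iterating over a Series yields the values themselves.
print(list(as_frame))    # ['id']
print(list(as_series))   # ['12345']
```

A quick type() check like this on whatever is passed to isin() is usually enough to catch the mistake.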

The correct code should use df_excl["id"], as below:

...
result = df[~df.id.isin(df_excl["id"])] 
print(result)
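If the exclusion list already arrives as a one-column DataFrame, it can be converted to a Series before calling isin(). This is only a minimal sketch, assuming the frame has exactly one column; the names excl_frame and excl_values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"id": ["12345", "67890"]})
excl_frame = pd.DataFrame({"id": ["12345"]})[["id"]]  # one-column DataFrame

# Take the first (and only) column as a Series.
excl_values = excl_frame.iloc[:, 0]

result = df[~df["id"].isin(excl_values)]
print(result["id"].tolist())  # ['67890']
```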

