Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
Documentation problem
When you use df.apply with raw=True, you can get an error if the applied function returns None for some elements, because the underlying NumPy code infers the dtype of the result array from the first returned value.
Example:
import pandas as pd
from typing import Optional

def func(a: int) -> Optional[int]:
    if a % 3 == 0: return 1
    if a % 3 == 1: return 0
    else: return None

# func(1) returns 0, so the first result is an int
df = pd.DataFrame([[1], [2], [3], [4], [5], [6]])
print(df.apply(lambda row: func(row[0]), axis=1, raw=True))
This raises an error:
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
On the other hand, if the first returned value is None, NumPy creates an array of dtype object, which can hold either int or None:
df = pd.DataFrame([[2], [3], [4], [5], [6]])
print(df.apply(lambda row: func(row[0]), axis=1, raw=True))
will return
0    None
1       1
2       0
3    None
4       1
dtype: object
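The same behaviour can be reproduced with NumPy alone. The sketch below assumes that df.apply with raw=True and axis=1 delegates to np.apply_along_axis; that is an assumption based on the error above and may not hold for every pandas version. The names apply_rows, starts_with_int and starts_with_none are only for illustration.

```python
import numpy as np
from typing import Optional


def func(a: int) -> Optional[int]:
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


def apply_rows(values: np.ndarray) -> np.ndarray:
    # np.apply_along_axis allocates its output buffer using the dtype of the
    # first result, then writes every later result into that buffer.
    return np.apply_along_axis(lambda row: func(row[0]), 1, values)


starts_with_int = np.array([[1], [2], [3]])   # first result is 0 (an int)
starts_with_none = np.array([[2], [3], [4]])  # first result is None

try:
    apply_rows(starts_with_int)
    first_case = "no error"
except TypeError as exc:
    # The integer buffer cannot store the None returned for the second row.
    first_case = f"TypeError: {exc}"

# Here the buffer has dtype object, so ints and None can both be stored.
second_case = apply_rows(starts_with_none)

print(first_case)
print(second_case.dtype, second_case.tolist())
```

If the assumption holds, the only difference between the two cases is which value the function happens to return first.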
Suggested fix for documentation
Explain that the function must not return None when raw=True, or treat this as a bug and fix it (i.e. allow the dtype of the result ndarray to be specified explicitly).
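To illustrate what specifying the result type explicitly could look like, here is a rough sketch in plain NumPy. The helper apply_rows_with_dtype and its dtype parameter are hypothetical; they are not part of pandas or NumPy and only show the idea of preallocating the output instead of inferring its dtype from the first value.

```python
import numpy as np
from typing import Callable, Optional


def func(a: int) -> Optional[int]:
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


def apply_rows_with_dtype(
    f: Callable, values: np.ndarray, dtype=object
) -> np.ndarray:
    # Hypothetical helper: the caller chooses the result dtype up front,
    # so the first returned value no longer decides it.
    out = np.empty(values.shape[0], dtype=dtype)
    for i, row in enumerate(values):
        out[i] = f(row)
    return out


values = np.array([[1], [2], [3], [4], [5], [6]])
result = apply_rows_with_dtype(lambda row: func(row[0]), values, dtype=object)
print(result.dtype, result.tolist())
```

With dtype=object the result can hold both int and None regardless of the order in which they are returned.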