
DOC: warn about apply with raw=True, if function returns Optional[int] #61632

Open
@wrschneider

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

Documentation problem

When you use df.apply with raw=True, you can get an error if the applied function returns None for some elements, because the underlying NumPy code infers the dtype of the result array from the first returned value.

Example:

import pandas as pd
from typing import Optional

def func(a: int) -> Optional[int]:
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None

df = pd.DataFrame([[1], [2], [3], [4], [5], [6]])

print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

This raises an error:

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

On the other hand, if the first returned value is None, NumPy creates an array of dtype object, which can hold either int or None:

df = pd.DataFrame([[2], [3], [4], [5], [6]])
print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

This returns:

0    None
1       1
2       0
3    None
4       1
dtype: object
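
The same order dependence can be reproduced with NumPy alone. The sketch below assumes that the raw=True path hands the function to numpy.apply_along_axis (my reading of the behaviour, not something confirmed in the docs). The array names and the untyped copy of func are only for illustration.

```python
import numpy as np


def func(a):
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


# The first row yields 0, so the output buffer is allocated with an integer dtype.
int_first = np.array([[1], [2], [3]])
# The first row yields None, so the output buffer is allocated with dtype object.
none_first = np.array([[2], [3], [4]])

try:
    np.apply_along_axis(lambda row: func(row[0]), 1, int_first)
    raised = False
except TypeError:
    # None cannot be stored in the integer buffer.
    raised = True

out = np.apply_along_axis(lambda row: func(row[0]), 1, none_first)
print(raised, out.dtype)
```

If that assumption holds, the dtype is fixed by the first result, and every later result has to fit into it.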

Suggested fix for documentation

Explain that the function must not return None when raw=True,

or treat this as a bug and fix it (i.e. allow the dtype of the result ndarray to be specified explicitly).
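
Until either of those happens, one possible workaround is to make the function return a single type for every element. The sketch below uses floats with NaN as the missing marker. func_float is a hypothetical variant of the function above, not a proposed pandas API.

```python
import numpy as np
import pandas as pd


def func_float(a: int) -> float:
    # Every branch returns a float, so the dtype inferred from the
    # first result can also hold the missing-value marker (NaN).
    if a % 3 == 0:
        return 1.0
    if a % 3 == 1:
        return 0.0
    return np.nan


df = pd.DataFrame([[1], [2], [3], [4], [5], [6]])
result = df.apply(lambda row: func_float(row[0]), axis=1, raw=True)
print(result)
```

Returning np.nan while leaving the other branches as int would not be enough: the buffer would still be inferred as integer from the first result, and NaN cannot be stored in an integer array.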

Labels: Docs, Needs Triage
