QST: Subject: User Experience Issue - NumPy Types in DataFrame Results Breaking Readability #61607

Open
@COderHop

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

None

Question about pandas

Issue Description
TL;DR: Since pandas 2.0, .tolist() and similar methods return NumPy scalar types instead of native Python types, which severely hurts user experience and data readability.
Problem Example
Before (pandas 1.x):
df.index.tolist()

Returns: [0, 1, 2, 3, 4] # Clean, readable

Now (pandas 2.x):
df.index.tolist()

Returns: [np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4)] # Verbose, confusing
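
One way to confirm what a given environment actually returns is to count the element types instead of relying on the printed form. The sketch below uses only the standard library; the helper name `element_types` is illustrative and not part of pandas, and the DataFrame `df` in the comment is assumed to exist.

```python
from collections import Counter


def element_types(values):
    """Count the fully qualified type names of the items in `values`."""
    return Counter(
        f"{type(v).__module__}.{type(v).__qualname__}" for v in values
    )


# With plain Python integers, every element reports as builtins.int.
print(element_types([0, 1, 2, 3, 4]))

# Hypothetical usage with an existing DataFrame `df` (not run here):
#     print(element_types(df.index.tolist()))
# Any key beginning with "numpy." would confirm NumPy scalars in the list.
```

Running this against the same call in both environments should show whether the element types differ, or only the way they are displayed.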

Impact on User Experience

  • Poor Readability: Results are cluttered with np.int64(), np.float64() wrappers

  • Debugging Nightmare: Harder to quickly scan and understand data

  • Display Issues: When printing or logging, output is unnecessarily verbose

  • User Confusion: Many users don't understand why they're seeing NumPy types

  • Breaking Change: Existing code expectations broken without clear migration path
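
For the display and logging case specifically, it may help to note that printing a list shows the repr() of each element. A minimal sketch of a display-only alternative follows. It assumes the elements have a plain str() form, as NumPy scalars do. `format_values` and `FakeScalar` are hypothetical names, and `FakeScalar` only imitates the verbose repr so that the example runs without NumPy.

```python
def format_values(values):
    """Render a list using str() for each element instead of repr()."""
    return "[" + ", ".join(str(v) for v in values) + "]"


class FakeScalar:
    """Stand-in for a NumPy scalar: verbose repr(), plain str()."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"np.int64({self.value})"

    def __str__(self):
        return str(self.value)


values = [FakeScalar(0), FakeScalar(1), FakeScalar(2)]
print(values)                 # [np.int64(0), np.int64(1), np.int64(2)]
print(format_values(values))  # [0, 1, 2]
```

This changes only how the values are shown, not their types, and strings in the list would lose their quotes.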

Current Workarounds Are Painful
Users now need to write additional code for basic operations:
# Instead of simply:
indices = df.index.tolist()

# We need:
indices = [int(x) for x in df.index.tolist()]
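
The int(x) comprehension only covers integers. A slightly more general sketch, assuming the unwanted elements are NumPy scalars (which expose an .item() method returning the native Python value), could look like the following. `to_native` is a hypothetical helper, and `StubScalar` is a stand-in used so the example runs without NumPy.

```python
def to_native(values):
    """Return a new list with NumPy-style scalars converted via .item().

    Duck-typed assumption: anything with an .item() method is treated as a
    scalar wrapper; everything else is passed through unchanged.
    """
    return [v.item() if hasattr(v, "item") else v for v in values]


class StubScalar:
    """Stand-in for a NumPy scalar; real ones expose the same .item() method."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


mixed = [StubScalar(0), 1, StubScalar(2.5), "x"]
print(to_native(mixed))  # [0, 1, 2.5, 'x']
```

With pandas this would be called as to_native(df.index.tolist()). Because the check is duck-typed, any other object that happens to define .item() would be converted as well.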
The Core Problem
DataFrames are meant for data analysis and exploration. The primary use case is human-readable data inspection, not performance-critical numerical computation at the .tolist() level.
Suggested Solutions

  • Add a parameter: .tolist(native_types=True) (default True for user-facing methods)

  • Separate methods: Keep .tolist() for NumPy types, add .tolist_clean() for Python types

  • Configuration option: Allow users to set pandas behavior globally

  • Revert the change: Prioritize user experience over marginal performance gains

Why This Matters
Pandas' strength has always been its ease of use and intuitive behavior. This change sacrifices user experience for performance gains that most users don't need when calling .tolist().
The goal of data analysis is insight, not fighting with data types.
Request
Please consider reverting this behavior or providing a simple, built-in solution. The current situation forces every pandas user to write boilerplate code for basic data inspection.
Thank you for maintaining this incredible library. I hope we can find a solution that balances performance with the user-friendly experience that makes pandas great.

Environment:

  • pandas: 2.2.3

  • numpy: 1.26.4

  • Impact: All DataFrame operations returning lists

Labels: Needs Triage, Usage Question
