Commit 320316e

[3.13] gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130127)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.

The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Author: Greg Price <[email protected]>
Co-authored-by: Greg Price <[email protected]>
(cherry picked from commit 3402e13)
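The claim that there are exactly five "Other" and three "Separator" categories can be checked directly rather than looked up. The following is a minimal sketch, not part of the commit: the helper name `categories_in_groups` is hypothetical, and the result reflects whatever Unicode database version the running interpreter's `unicodedata` module ships with.

```python
import sys
import unicodedata


def categories_in_groups(groups=("C", "Z")):
    """Collect every general category whose first letter is in `groups`.

    Hypothetical helper for illustration; it scans all code points once.
    """
    found = set()
    for codepoint in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(codepoint))
        if category[0] in groups:
            found.add(category)
    return sorted(found)


found = categories_in_groups()
print(found)  # ['Cc', 'Cf', 'Cn', 'Co', 'Cs', 'Zl', 'Zp', 'Zs']
```

The eight two-letter categories listed in the old C comment are the same set that the "Other or Separator" wording in the docs describes, which is why the two definitions agree once the inverted comment is corrected.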
1 parent fefd2c5 commit 320316e

6 files changed: 34 additions and 34 deletions

Doc/c-api/unicode.rst

Lines changed: 2 additions & 7 deletions
@@ -256,13 +256,8 @@ the Python configuration.

 .. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UCS4 ch)

-   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
-   Nonprintable characters are those characters defined in the Unicode character
-   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
-   considered printable. (Note that printable characters in this context are
-   those which should not be escaped when :func:`repr` is invoked on a string.
-   It has no bearing on the handling of strings written to :data:`sys.stdout` or
-   :data:`sys.stderr`.)
+   Return ``1`` or ``0`` depending on whether *ch* is a printable character,
+   in the sense of :meth:`str.isprintable`.


 These APIs can be used for fast direct character conversions:

Doc/library/stdtypes.rst

Lines changed: 13 additions & 7 deletions
@@ -1876,13 +1876,19 @@ expression support in the :mod:`re` module).

 .. method:: str.isprintable()

-   Return ``True`` if all characters in the string are printable or the string is
-   empty, ``False`` otherwise. Nonprintable characters are those characters defined
-   in the Unicode character database as "Other" or "Separator", excepting the
-   ASCII space (0x20) which is considered printable. (Note that printable
-   characters in this context are those which should not be escaped when
-   :func:`repr` is invoked on a string. It has no bearing on the handling of
-   strings written to :data:`sys.stdout` or :data:`sys.stderr`.)
+   Return true if all characters in the string are printable, false if it
+   contains at least one non-printable character.
+
+   Here "printable" means the character is suitable for :func:`repr` to use in
+   its output; "non-printable" means that :func:`repr` on built-in types will
+   hex-escape the character. It has no bearing on the handling of strings
+   written to :data:`sys.stdout` or :data:`sys.stderr`.
+
+   The printable characters are those which in the Unicode character database
+   (see :mod:`unicodedata`) have a general category in group Letter, Mark,
+   Number, Punctuation, or Symbol (L, M, N, P, or S); plus the ASCII space 0x20.
+   Nonprintable characters are those in group Separator or Other (Z or C),
+   except the ASCII space.


 .. method:: str.isspace()
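The link between `str.isprintable` and `repr` described in the new wording can be seen on a few sample characters. This is an illustrative sketch only: the `samples` table and the helper `is_escaped_by_repr` are hypothetical names, and the samples deliberately avoid quote characters and the backslash, which `repr` treats specially even though they are printable.

```python
def is_escaped_by_repr(char):
    """Return True if repr() rewrites `char` instead of emitting it as-is.

    Hypothetical helper: strips the surrounding quotes from repr(char)
    and compares what is left with the original character.
    """
    return repr(char)[1:-1] != char


# Sample characters chosen for illustration (labels are informal).
samples = {
    "letter (Ll)": "a",
    "ASCII space (Zs)": " ",
    "no-break space (Zs)": "\u00a0",
    "line separator (Zl)": "\u2028",
    "newline (Cc)": "\n",
    "private use (Co)": "\ue000",
}

for label, char in samples.items():
    print(f"{label:22} isprintable={char.isprintable()!s:5} repr={repr(char)}")
```

For these samples, a character is reported as printable exactly when `repr` leaves it unescaped. The ASCII space is the one Separator character that stays printable. The backslash is a known exception in the other direction: it is printable, yet `repr` still escapes it.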

Lib/test/test_str.py

Lines changed: 9 additions & 0 deletions
@@ -853,6 +853,15 @@ def test_isprintable(self):
         self.assertTrue('\U0001F46F'.isprintable())
         self.assertFalse('\U000E0020'.isprintable())

+    @support.requires_resource('cpu')
+    def test_isprintable_invariant(self):
+        for codepoint in range(sys.maxunicode + 1):
+            char = chr(codepoint)
+            category = unicodedata.category(char)
+            self.assertEqual(char.isprintable(),
+                             category[0] not in ('C', 'Z')
+                             or char == ' ')
+
     def test_surrogates(self):
         for s in ('a\uD800b\uDFFF', 'a\uDFFFb\uD800',
                   'a\uD800b\uDFFFa', 'a\uDFFFb\uD800a'):
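Outside the test suite, the same invariant can be checked with a standalone script that collects disagreements instead of failing on the first one. This is a sketch under assumptions: `isprintable_by_category` and `find_mismatches` are hypothetical names, and the default range covers only the Basic Multilingual Plane to keep the run short.

```python
import unicodedata


def isprintable_by_category(char):
    """Reference predicate from the docs: any general category outside
    groups C (Other) and Z (Separator), plus the ASCII space."""
    return unicodedata.category(char)[0] not in ("C", "Z") or char == " "


def find_mismatches(start=0, stop=0x10000):
    """Return code points in [start, stop) where str.isprintable()
    disagrees with the category-based reference predicate."""
    return [
        codepoint
        for codepoint in range(start, stop)
        if chr(codepoint).isprintable() != isprintable_by_category(chr(codepoint))
    ]


mismatches = find_mismatches()
print(f"{len(mismatches)} mismatching code points")
```

Passing `stop=sys.maxunicode + 1` (after `import sys`) would cover the same full range as the test in the diff.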

Objects/clinic/unicodeobject.c.h

Lines changed: 3 additions & 4 deletions
Some generated files are not rendered by default.

Objects/unicodectype.c

Lines changed: 4 additions & 12 deletions
@@ -142,18 +142,10 @@ int _PyUnicode_IsNumeric(Py_UCS4 ch)
     return (ctype->flags & NUMERIC_MASK) != 0;
 }

-/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
-   0 otherwise.
-   All characters except those characters defined in the Unicode character
-   database as following categories are considered printable.
-      * Cc (Other, Control)
-      * Cf (Other, Format)
-      * Cs (Other, Surrogate)
-      * Co (Other, Private Use)
-      * Cn (Other, Not Assigned)
-      * Zl Separator, Line ('\u2028', LINE SEPARATOR)
-      * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
-      * Zs (Separator, Space) other than ASCII space('\x20').
+/* Returns 1 for Unicode characters that repr() may use in its output,
+   and 0 for characters to be hex-escaped.
+
+   See documentation of `str.isprintable` for details.
 */
 int _PyUnicode_IsPrintable(Py_UCS4 ch)
 {

Objects/unicodeobject.c

Lines changed: 3 additions & 4 deletions
@@ -12016,15 +12016,14 @@ unicode_isidentifier_impl(PyObject *self)
 /*[clinic input]
 str.isprintable as unicode_isprintable

-Return True if the string is printable, False otherwise.
+Return True if all characters in the string are printable, False otherwise.

-A string is printable if all of its characters are considered printable in
-repr() or if it is empty.
+A character is printable if repr() may use it in its output.
 [clinic start generated code]*/

 static PyObject *
 unicode_isprintable_impl(PyObject *self)
-/*[clinic end generated code: output=3ab9626cd32dd1a0 input=98a0e1c2c1813209]*/
+/*[clinic end generated code: output=3ab9626cd32dd1a0 input=4e56bcc6b06ca18c]*/
 {
     Py_ssize_t i, length;
     int kind;
