README.rst: +9 -139
@@ -124,11 +124,6 @@ Multithreading

 The regex module releases the GIL during matching on instances of the built-in (immutable) string classes, enabling other Python threads to run concurrently. It is also possible to force the regex module to release the GIL during matching by calling the matching methods with the keyword argument ``concurrent=True``. The behaviour is undefined if the string changes during matching, so use it *only* when it is guaranteed that that won't happen.

-Building for 64-bits
---------------------
-
-If the source files are built for a 64-bit target then the string positions will also be 64-bit.
-
 Unicode
 -------

@@ -141,8 +136,6 @@ Additional features

 The issue numbers relate to the Python bug tracker, except where listed as "Hg issue".

-* Fixed support for pickling compiled regexes (`Hg issue 195 <https://bitbucket.org/mrabarnett/mrab-regex/issues/195>`_)
-
 * Added support for lookaround in conditional pattern (`Hg issue 163 <https://bitbucket.org/mrabarnett/mrab-regex/issues/163>`_)

 The test of a conditional pattern can now be a lookaround.
@@ -339,20 +332,6 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i
 >>> regex.sub('(?V1).*?', '|', 'test')
 '|||||||||'

-* re.group() should never return a bytearray (`issue #18468 <https://bugs.python.org/issue18468>`_)
-
-For compatibility with the re module, the regex module returns all matching bytestrings as ``bytes``, starting from Python 3.4.
``capturesdict`` is a combination of ``groupdict`` and ``captures``:
@@ -480,7 +459,7 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 * Detach searched string

-A match object contains a reference to the string that was searched, via its ``string`` attribute. The match object now has a ``detach_string`` method that will 'detach' that string, making it available for garbage collection (this might save valuable memory if that string is very large).
+A match object contains a reference to the string that was searched, via its ``string`` attribute. The ``detach_string`` method will 'detach' that string, making it available for garbage collection, which might save valuable memory if that string is very large.

 Example:

@@ -497,10 +476,6 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i
 >>> print(m.string)
 None

-* Characters in a group name (`issue #14462 <https://bugs.python.org/issue14462>`_)
-
-A group name can now contain the same characters as an identifier. These are different in Python 2 and Python 3.
@@ -526,37 +501,10 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 It's possible to backtrack into a recursed or repeated group.

-You can't call a group if there is more than one group with that group name or group number (``"ambiguous group reference"``). For example, ``(?P<foo>\w+) (?P<foo>\w+) (?&foo)?`` has 2 groups called "foo" (both group 1) and ``(?|([A-Z]+)|([0-9]+)) (?1)?`` has 2 groups with group number 1.
+You can't call a group if there is more than one group with that group name or group number (``"ambiguous group reference"``).

 The alternative forms ``(?P>name)`` and ``(?P&name)`` are also supported.

-* repr(regex) doesn't include actual regex (`issue #13592 <https://bugs.python.org/issue13592>`_)
-
-The repr of a compiled regex is now in the form of a eval-able string. For example:
-
-.. sourcecode:: python
-
->>> r = regex.compile("foo", regex.I)
->>> repr(r)
-"regex.Regex('foo', flags=regex.I | regex.V0)"
->>> r
-regex.Regex('foo', flags=regex.I | regex.V0)
-
-The regex module has Regex as an alias for the 'compile' function.
-
-* Improve the repr for regular expression match objects (`issue #17087 <https://bugs.python.org/issue17087>`_)
-
-The repr of a match object is now a more useful form. For example:
-
-.. sourcecode:: python
-
->>> regex.search(r"\d+", "abc012def")
-<regex.Match object; span=(3, 6), match='012'>
-
-* Python lib re cannot handle Unicode properly due to narrow/wide bug (`issue #12729 <https://bugs.python.org/issue12729>`_)
-
-The source code of the regex module has been updated to support PEP 393 ("Flexible String Representation"), which is new in Python 3.3.
-
 * Full Unicode case-folding is supported.

 In version 1 behaviour, the regex module uses full case-folding when performing case-insensitive matches in Unicode.
@@ -608,16 +556,10 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 In the following examples I'll omit the item and write only the fuzziness:

-* ``{i<=3}`` permit at most 3 insertions, but no other types
-
 * ``{d<=3}`` permit at most 3 deletions, but no other types

-* ``{s<=3}`` permit at most 3 substitutions, but no other types
-
 * ``{i<=1,s<=2}`` permit at most 1 insertion and at most 2 substitutions, but no deletions

-* ``{e<=3}`` permit at most 3 errors
-
 * ``{1<=e<=3}`` permit at least 1 and at most 3 errors

 * ``{i<=2,d<=2,e<=3}`` permit at most 2 insertions, at most 2 deletions, at most 3 errors in total, but no substitutions
@@ -630,25 +572,19 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 * ``{i<=1,d<=1,s<=1,2i+2d+1s<=4}`` at most 1 insertion, at most 1 deletion, at most 1 substitution; each insertion costs 2, each deletion costs 2, each substitution costs 1, the total cost must not exceed 4

-You can also use "<" instead of "<=" if you want an exclusive minimum or maximum:
-
-* ``{e<=3}`` permit up to 3 errors
-
-* ``{e<4}`` permit fewer than 4 errors
-
-* ``{0<e<4}`` permit more than 0 but fewer than 4 errors
+You can also use "<" instead of "<=" if you want an exclusive minimum or maximum.

 By default, fuzzy matching searches for the first match that meets the given constraints. The ``ENHANCEMATCH`` flag will cause it to attempt to improve the fit (i.e. reduce the number of errors) of the match that it has found.

 The ``BESTMATCH`` flag will make it search for the best match instead.

 Further examples to note:

-* ``regex.search("(dog){e}", "cat and dog")[1]`` returns ``"cat"`` because that matches ``"dog"`` with 3 errors, which is within the limit (an unlimited number of errors is permitted).
+* ``regex.search("(dog){e}", "cat and dog")[1]`` returns ``"cat"`` because that matches ``"dog"`` with 3 errors (an unlimited number of errors is permitted).

-* ``regex.search("(dog){e<=1}", "cat and dog")[1]`` returns ``" dog"`` (with a leading space) because that matches ``"dog"`` with 1 error, which is within the limit (1 error is permitted).
+* ``regex.search("(dog){e<=1}", "cat and dog")[1]`` returns ``" dog"`` (with a leading space) because that matches ``"dog"`` with 1 error, which is within the limit.

-* ``regex.search("(?e)(dog){e<=1}", "cat and dog")[1]`` returns ``"dog"`` (without a leading space) because the fuzzy search matches ``" dog"`` with 1 error, which is within the limit (1 error is permitted), and the ``(?e)`` then makes it attempt a better fit.
+* ``regex.search("(?e)(dog){e<=1}", "cat and dog")[1]`` returns ``"dog"`` (without a leading space) because the fuzzy search matches ``" dog"`` with 1 error, which is within the limit, and the ``(?e)`` then it attempts a better fit.

 In the first two examples there are perfect matches later in the string, but in neither case is it the first possible match.

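The weighted-cost constraint ``{i<=1,d<=1,s<=1,2i+2d+1s<=4}`` listed in this hunk can be read as plain arithmetic. The sketch below only restates that arithmetic; it is not how the regex module evaluates the constraint, and the helper name ``within_fuzzy_limits`` is hypothetical.

```python
def within_fuzzy_limits(insertions, deletions, substitutions):
    """Check error counts against {i<=1,d<=1,s<=1,2i+2d+1s<=4}.

    Illustrative helper only; it is not part of the regex module.
    """
    # Per-type limits: at most 1 insertion, 1 deletion and 1 substitution.
    if insertions > 1 or deletions > 1 or substitutions > 1:
        return False
    # Weighted cost: each insertion 2, each deletion 2, each substitution 1.
    cost = 2 * insertions + 2 * deletions + 1 * substitutions
    # The total cost must not exceed 4.
    return cost <= 4


print(within_fuzzy_limits(1, 1, 0))  # cost 4: True
print(within_fuzzy_limits(1, 0, 1))  # cost 3: True
print(within_fuzzy_limits(1, 1, 1))  # cost 5: False
print(within_fuzzy_limits(0, 0, 2))  # breaks s<=1: False
```

Note that one error of each type satisfies every per-type limit but is still rejected, because the combined cost of 5 exceeds 4.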
@@ -716,7 +652,7 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 >>> p = regex.compile(r"first|second|third|fourth|fifth")

-but if the list is large, parsing the resulting regex can take considerable time, and care must also be taken that the strings are properly escaped if they contain any character that has a special meaning in a regex, and that if there is a shorter string that occurs initially in a longer string that the longer string is listed before the shorter one, for example, "cats" before "cat".
+but if the list is large, parsing the resulting regex can take considerable time, and care must also be taken that the strings are properly escaped and properly ordered, for example, "cats" before "cat".

 The new alternative is to use a named list:

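For the plain-alternation approach described in this hunk (the one that named lists are meant to replace), the escaping and ordering care can be sketched as follows. This is a minimal illustration that uses the standard library ``re`` module as a stand-in, and ``build_alternation`` is a hypothetical helper, not part of either module.

```python
import re


def build_alternation(words):
    """Join words into an alternation, escaped and longest-first."""
    # Drop duplicates while keeping the original order deterministic.
    unique = list(dict.fromkeys(words))
    # Longest first, so "cats" is tried before its prefix "cat".
    ordered = sorted(unique, key=len, reverse=True)
    # Escape each word so characters such as "." are matched literally.
    return "|".join(re.escape(word) for word in ordered)


naive = re.compile("cat|cats")
careful = re.compile(build_alternation(["cat", "cats", "a.b"]))

print(naive.match("cats").group())    # cat  (the shorter string wins)
print(careful.match("cats").group())  # cats
print(careful.match("axb"))           # None (the dot was escaped)
```

Sorting by length is one simple way to guarantee that no string is listed after one of its own prefixes; any ordering with that property would do.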
@@ -745,7 +681,7 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 * Unicode line separators

-Normally the only line separator is ``\n`` (``\x0A``), but if the ``WORD`` flag is turned on then the line separators are the pair ``\x0D\x0A``, and ``\x0A``, ``\x0B``, ``\x0C`` and ``\x0D``, plus ``\x85``, ``\u2028`` and ``\u2029`` when working with Unicode.
+Normally the only line separator is ``\n`` (``\x0A``), but if the ``WORD`` flag is turned on then the line separators are ``\x0D\x0A``, ``\x0A``, ``\x0B``, ``\x0C`` and ``\x0D``, plus ``\x85``, ``\u2028`` and ``\u2029`` when working with Unicode.

 This affects the regex dot ``"."``, which, with the ``DOTALL`` flag turned off, matches any character except a line separator. It also affects the line anchors ``^`` and ``$`` (in multiline mode).

@@ -791,8 +727,6 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i

 .. sourcecode:: python

->>> regex.escape("foo!?")
-'foo!\\?'
 >>> regex.escape("foo!?", special_only=False)
 'foo\\!\\?'
 >>> regex.escape("foo!?", special_only=True)
@@ -806,8 +740,6 @@ The issue numbers relate to the Python bug tracker, except where listed as "Hg i
A regex like ``((x|y+)*)*`` will be accepted and will work correctly, but should complete more quickly.
-
 * Definition of 'word' character (`issue #1693050 <https://bugs.python.org/issue1693050>`_)

-The definition of a 'word' character has been expanded for Unicode. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``. This applies to ``\w``, ``\W``, ``\b`` and ``\B``.
-
-* Groups in lookahead and lookbehind (`issue #814253 <https://bugs.python.org/issue814253>`_)
-
-Groups and group references are permitted in both lookahead and lookbehind.
+The definition of a 'word' character has been expanded for Unicode. It now conforms to the Unicode specification at ``http://www.unicode.org/reports/tr29/``.

 * Variable-length lookbehind

 A lookbehind can match a variable-length string.

-* Correct handling of charset with ignore case flag (`issue #3511 <https://bugs.python.org/issue3511>`_)
-
-Ranges within charsets are handled correctly when the ignore-case flag is turned on.
-
-* Unmatched group in replacement (`issue #1519638 <https://bugs.python.org/issue1519638>`_)
-
-An unmatched group is treated as an empty string in a replacement template.