RFC: Roadmap for improving string support in Julia #11558
Comments
For what it's worth, some thoughts from a similar set of questions about UTF-8 handling in another project:
a) errors give the code a chance to handle the problem, but checking them can be forgotten.
b) exceptions require all operations checking encodings to be wrapped in
c) substitutions may give a valid encoding, but the resulting string can then go on to cause problems elsewhere because of the substitutions.
Net result: there is no "one size fits all" answer; the library has to let the programmer choose how to handle it. That makes the library more complex, and strengthens the argument for checking only in the functions that operate at the boundaries, not on every internal operation. |
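The three options (a), (b) and (c) above can be sketched side by side. This is only an illustration in Python, not Julia's API or any proposal from this thread; the function names and the sample bytes are made up for the example, and only the standard `bytes.decode` call and `UnicodeDecodeError` are real.

```python
from typing import Optional, Tuple

# Sample input (hypothetical): 0xFF can never occur in well-formed UTF-8.
BAD = b"abc\xffdef"


def decode_with_error_code(data: bytes) -> Tuple[Optional[str], Optional[int]]:
    """(a) Error return: the caller gets an offset to inspect, and may forget to."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first invalid sequence.
        return None, exc.start


def decode_or_raise(data: bytes) -> str:
    """(b) Exception: every caller that might see bad input has to catch it."""
    return data.decode("utf-8", errors="strict")


def decode_with_substitution(data: bytes) -> str:
    """(c) Substitution: always yields valid text, but U+FFFD may surprise later code."""
    return data.decode("utf-8", errors="replace")
```

A library that takes the handling policy as a parameter, as `bytes.decode` does with `errors=`, leaves the choice to the programmer, at the cost of a larger API surface.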
A faster Julia string search using strstr is about 10 times faster, depending on the input, on my Arch Linux machine. https://github.com/peter1000/faster_julia_str_search |
@tkelman What does the "julep" label mean?
@elextr To me, the
@peter1000 I was just pointed at the whole mess of search/rsearch/rsearch/searchindex/rsearchindex this weekend by @tkelman (I will have to get back at him somehow, maybe at JuliaCon 😀)
Thanks everybody for joining in the (hopefully productive) discussion! |
julep = "julia enhancement proposal" |
Roughly an extremely informal version of a Python PEP, or Rust RFC. Eventual level of formality TBD. |
I had a look at this stringsearch But on all tests: the Arch linux default |
@peter1000 Thanks for the pointer to @gquere's repository... that's precisely the sort of thing I'd like to see, implemented all in Julia, well documented, to show the various tradeoffs of the different algorithms. @pao & @tkelman Thanks for the info... I just kept thinking about having a mint Julep... I know about PEPs, should have figured it out from that! |
@ScottPJones Another place where conversions can clearly be unchecked is output, since the internal string is already correct. That covers writing to the database, but unless this is the only program writing to it, I wouldn't trust it for input. @peter1000 |
@elextr Since I will be writing both input and output to the database, I can definitely be sure of the input... (the database access is totally controlled...) |
On 4 June 2015 at 10:20, Scott P. Jones wrote:
Sure you can in a specific application, but a general Julia library can't
|
@elextr Isn't that precisely where an |
This feels subsumed by #16107 |
I would like comments on the things that I would like to see happen for string support in Julia,
such as constructive criticism or advice on how to achieve the goals, other issues that I hadn't noticed yet, and the relative priority of the goals.
Fix issues I have found related to mapping, upper/lower case and type stability.
uppercase/lowercase on a UTF16String returns a UTF8String #11460
Bugs in Unicode handling with UTF8String #11463
map on an AbstractString, if no more specific map method found, returns UTF8String always #11464
convert function for UTF-16 from an AbstractArray{UInt8} problems #11501
Problems with convert from AbstractArray{UInt8} to UTF32String #11502
Make Julia no longer dependent on utf8proc. See RFC: rewrite functions used from utf8proc.c in Julia #11315, and also 1.3-dev1 fails to build on 64-bit JuliaStrings/utf8proc#42, and the comments in uppercase/lowercase functions are not portable? #11471 to see what a mess this dependence causes, even for one of the top contributors to Julia. This can have a number of benefits beyond improved performance, including being able to rebuild and distribute updated Unicode data files without having to get a new version of Julia.
Improve string conversion performance (just needs Fix #10959 bugs with UTF-16 conversions #11551 merged in)
Test (as a package at first) having always validated String and Char types, to see how that could affect performance (and reliability of code).
... I feel there's a lot more ... feel free to add to the list!
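The "always validated String" item, together with the point made in the comments about checking only at the boundaries, can be sketched as a type that validates once at construction and skips the check on internal operations. This is a minimal sketch in Python, not the package proposed here; the class `ValidatedUTF8` and its `_trusted` flag are hypothetical names invented for the illustration.

```python
class ValidatedUTF8:
    """Hypothetical wrapper: bytes are checked once, when they enter the program."""

    __slots__ = ("_data",)

    def __init__(self, data: bytes, *, _trusted: bool = False):
        if not _trusted:
            # Input boundary: raises UnicodeDecodeError on malformed UTF-8.
            data.decode("utf-8")
        self._data = bytes(data)

    def __add__(self, other: "ValidatedUTF8") -> "ValidatedUTF8":
        # Internal operation: concatenating two well-formed UTF-8 sequences
        # is always well-formed, so the result is not re-validated.
        return ValidatedUTF8(self._data + other._data, _trusted=True)

    def to_bytes(self) -> bytes:
        # Output boundary: the invariant already holds, so no check is needed.
        return self._data
```

Whether paying for validation on every construction is a net win for performance and reliability is exactly what such a package would have to measure.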