Browser fails to load urls containing certain types of characters (colon, %20, and possible more) #384

Closed
cztomczak opened this issue Aug 4, 2017 · 5 comments

@cztomczak
Owner

cztomczak commented Aug 4, 2017

Traceback:

  File "cefpython_py27.pyx", line 859, in cefpython_py27.CreateBrowserSync (cefpython_py27.cpp:101641)
  File "utils.pyx", line 100, in cefpython_py27.GetNavigateUrl (cefpython_py27.cpp:7310)
  File "C:\Python27\lib\nturl2path.py", line 60, in pathname2url
    raise IOError, error
IOError: Bad path: http://127.0.0.1:54008/

It seems that this code in GetNavigateUrl is causing the issue:

        # Need to encode Chinese characters in local file paths,
        # otherwise CEF will try to encode them by itself. But it
        # will fail in doing so. CEF will return the following string:
        # >> %EF%BF%97%EF%BF%80%EF%BF%83%EF%BF%A6
        # But it should be:
        # >> %E6%A1%8C%E9%9D%A2
        url = urllib_pathname2url(url)
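
To make the effect concrete, here is a minimal sketch that uses urllib.parse.quote as a stand-in for the percent-encoding done by pathname2url. That is an assumption: it matches the POSIX code path, while the Windows nturl2path implementation raises instead, as in the traceback above. The sketch shows the encoding the comment expects for the Chinese characters, and what the same encoding does to an http url:

```python
from urllib.parse import quote

# The characters whose UTF-8 percent-encoding matches the expected
# string quoted in the comment above.
encoded_path = quote("桌面")
print(encoded_path)  # %E6%A1%8C%E9%9D%A2

# The same encoding applied to a full url also escapes the colons after
# the scheme and the host, so the result is no longer a loadable url.
encoded_url = quote("http://127.0.0.1:54008/")
print(encoded_url)  # http%3A//127.0.0.1%3A54008/
```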

That urllib function is imported in cefpython.pyx:

if sys.version_info.major == 2:
    # noinspection PyUnresolvedReferences
    from urllib import pathname2url as urllib_pathname2url
else:
    # noinspection PyUnresolvedReferences
    from urllib.request import pathname2url as urllib_pathname2url
@cztomczak
Owner Author

The GetNavigateUrl function already special-cases several characters, decoding them back after pathname2url has encoded them:

        # If it is C:\ then colon was encoded. Decode it back.
        url = re.sub(r"^([a-zA-Z])%3A", r"\1:", url)

        # Allow hash when loading urls. The pathname2url function
        # replaced hashes with "%23" (Issue #114).
        url = url.replace("%23", "#")

        # Allow more special characters when loading urls. The pathname2url
        # function encodes them, so they need to be decoded back here.
        # Characters: ? & = (Issue #273).
        url = url.replace("%3F", "?")
        url = url.replace("%26", "&")
        url = url.replace("%3D", "=")
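
The decode-back steps above can be collected into one small function to see what they do and do not cover. This is only a sketch: decode_back and DECODE_BACK are hypothetical names, not part of cefpython. The inputs are hand-written strings in the form a percent-encoder would produce.

```python
import re

# Escapes that the snippet above turns back into literal characters.
DECODE_BACK = {"%23": "#", "%3F": "?", "%26": "&", "%3D": "="}


def decode_back(url):
    """Undo selected escapes (hypothetical helper, not cefpython API)."""
    # Only a colon right after a single leading drive letter is restored.
    url = re.sub(r"^([a-zA-Z])%3A", r"\1:", url)
    for escaped, literal in DECODE_BACK.items():
        url = url.replace(escaped, literal)
    return url


restored = decode_back("C%3A/app/index.html%3Fid%3D1%26lang%3Dpl%23top")
print(restored)  # C:/app/index.html?id=1&lang=pl#top

# The colons of an encoded http url are not covered by any rule.
still_broken = decode_back("http%3A//127.0.0.1%3A54008/")
print(still_broken)  # http%3A//127.0.0.1%3A54008/
```

The second call shows why the http url from the traceback is not rescued: the drive-letter rule only matches a single letter at the start of the string.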

It seems that this list keeps growing over time. The colon (:) needs to be supported as well, and another issue was just reported on the Forum for a url like file:///E:/PySideFactory/web%20-%20Kopie/html/main.html.
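
The Forum url is a case of double encoding: it already contains %20 escapes, and encoding it again escapes the percent signs themselves. A short sketch, again assuming urllib.parse.quote as a stand-in for the encoding step of pathname2url:

```python
from urllib.parse import quote

forum_url = "file:///E:/PySideFactory/web%20-%20Kopie/html/main.html"

# Each "%" of an existing escape becomes "%25", so "%20" turns into "%2520".
double_encoded = quote(forum_url)
print(double_encoded)
# file%3A///E%3A/PySideFactory/web%2520-%2520Kopie/html/main.html
```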

The call to GetNavigateUrl was added to resolve the issue with Chinese characters, but maybe that should instead be resolved by the user, by calling pathname2url in their app code and passing the fixed url to CreateBrowserSync. That way all the issues we have here would be resolved. The documentation for CreateBrowserSync should then state that a url containing Chinese characters must be encoded. This would require further testing.
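
One way such app-side handling could look is sketched below. The helper name to_browser_url and the scheme check are assumptions made for illustration, not cefpython API: anything that already has a scheme is passed through untouched, and only bare local paths go through pathname2url. The "file:" + pathname2url(path) form follows the usage shown in the Python documentation.

```python
import re
from urllib.request import pathname2url

# A scheme needs at least two characters here so that a Windows drive
# letter such as "C:" is not mistaken for one (an assumption of this sketch).
_HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]+:")


def to_browser_url(target):
    """Return urls unchanged; convert bare local paths to file: urls.

    Hypothetical app-side helper, not part of the cefpython API.
    """
    if _HAS_SCHEME.match(target):
        return target
    return "file:" + pathname2url(target)


print(to_browser_url("http://127.0.0.1:54008/"))
print(to_browser_url("/tmp/桌面/index.html"))
```

The returned string would then be passed as the url to CreateBrowserSync. The exact number of slashes after "file:" depends on the platform and the Python version.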

@cztomczak cztomczak changed the title Browser fails to load url like http://127.0.0.1:54008/ Browser fails to load for urls with certain types of characters (colon, %20, and possible more) Oct 8, 2017
@cztomczak cztomczak changed the title Browser fails to load for urls with certain types of characters (colon, %20, and possible more) Browser fails to load urls containing certain types of characters (colon, %20, and possible more) Oct 8, 2017
@Berserker66

Berserker66 commented Oct 8, 2017

I'd recommend the use of pathlib.Path.as_uri() in the CreateBrowserSync documentation and doing the conversion in app code. There are plenty of URI schemes, and too many wrong assumptions could be made about them.

https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.as_uri

The problem, of course, is that pathlib was only added in Python 3.4. Earlier Python versions would need to use pathname2url from urllib.
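
A minimal sketch of the pathlib route on Python 3.4 or later. The relative path is made up for illustration:

```python
from pathlib import Path

# absolute() is needed because as_uri() raises ValueError for relative paths.
html_file = Path("桌面/index.html").absolute()

# as_uri() adds the file:// prefix and percent-encodes non-ASCII characters.
uri = html_file.as_uri()
print(uri)
```

The printed URI depends on the current working directory, but the Chinese directory name should always appear as %E6%A1%8C%E9%9D%A2.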

@cztomczak
Owner Author

I'm leaning towards making this work with unicode characters out of the box. Otherwise there will be issues with apps that are installed to the %appdata% directory on Windows, since the user name may contain unicode characters. Maybe write a function that replaces any %xx escapes before passing the path to urllib.pathname2url (or the Py3 function you mentioned) and then restores these escapes afterwards.
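
A rough sketch of that idea follows. It is not the code from the eventual fix. Instead of swapping the escapes out and back in, it splits the path at existing %xx escapes and encodes only the text between them, which has the same effect. The function name is hypothetical, and urllib.parse.quote stands in for pathname2url so that the sketch behaves the same on every platform:

```python
import re
from urllib.parse import quote

# The capturing group makes split() keep the escapes in the result list.
_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")


def quote_preserving_escapes(path, safe="/:"):
    """Percent-encode a path but leave existing %xx escapes alone.

    Hypothetical helper, not part of the cefpython API.
    """
    pieces = _ESCAPE.split(path)
    # Odd indexes hold the escapes matched by the group; even ones hold text.
    return "".join(
        piece if index % 2 else quote(piece, safe=safe)
        for index, piece in enumerate(pieces)
    )


result = quote_preserving_escapes("E:/PySideFactory/web%20-%20Kopie/桌面/main.html")
print(result)  # E:/PySideFactory/web%20-%20Kopie/%E6%A1%8C%E9%9D%A2/main.html
```

A percent sign that is not followed by two hex digits is still encoded as %25, so a file name like 100%.html would become 100%25.html.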

Maybe modifying the path is not even required in the latest CEF. I haven't tested how it behaves with unicode characters, especially the Chinese ones that were reported initially. There have been many changes upstream since v31, and the GetNavigateUrl function was written long before v31.

@cztomczak
Owner Author

Related upstream CEF issue: Issue #2407 ("Cef Unable to open file with # symbol in file path name").

cztomczak added a commit that referenced this issue Jan 20, 2020
Updated Migration-Guide.md document and other documentation.
@cztomczak
Owner Author

Fixed in commit 6cc7f9c. This breaks backwards compatibility, so I've added the required information to the Migration Guide document.

@Berserker66 Thanks for the info. I've added info on both pathlib.PurePath.as_uri and urllib.pathname2url in the docs.
