Proxy server over https doesn't work properly #10676

Open
alex-ber opened this issue Apr 1, 2025 · 3 comments
Labels
bug needs-info Issue is lacking sufficient information and will be closed if not provided

Comments

@alex-ber

alex-ber commented Apr 1, 2025

Describe the bug

I'm using Fiddler as a local proxy server.

This code lives in a third-party library, so I cannot change it:

TAVILY_API_URL = "https://api.tavily.com"

async with aiohttp.ClientSession() as session:
    async with session.post(f"{TAVILY_API_URL}/search", json=params) as res:
        if res.status == 200:
            data = await res.text()
            return data
        else:
            raise Exception(f"Error {res.status}: {res.reason}")

.env

http_proxy=http://172.30.176.1:8888
https_proxy=http://172.30.176.1:8888
all_proxy=http://172.30.176.1:8888
SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

I'm using FastAPI on Python 3.11, which by default installs uvloop (rather than asyncio's event loop). uvloop's underlying transport doesn't have a `_start_tls_compatible` attribute, but it does have `start_tls()`.
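To illustrate the distinction, here is a minimal sketch of the kind of capability check described above: an explicit compatibility flag is consulted first, with a fallback to probing for `start_tls()` itself. `DummyUvloopTransport` and `supports_tls_in_tls` are hypothetical names for illustration, not aiohttp's actual internals.

```python
class DummyUvloopTransport:
    """Stands in for uvloop's transport: no _start_tls_compatible flag,
    but start_tls() is present."""

    def start_tls(self, *args, **kwargs):
        pass  # a real transport would upgrade the connection to TLS here


def supports_tls_in_tls(transport) -> bool:
    # Prefer an explicit flag if the transport exposes one; otherwise
    # fall back to checking for start_tls() (assumption, for illustration).
    return getattr(transport, "_start_tls_compatible",
                   hasattr(transport, "start_tls"))


print(supports_tls_in_tls(DummyUvloopTransport()))  # True
print(supports_tls_in_tls(object()))                # False
```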

To make this code work with the proxy, I have to apply a monkey-patch:

import aiohttp

# monkey-patch
# Save the original ClientSession class
OriginalClientSession = aiohttp.ClientSession


# Define a patched ClientSession that forces the proxy
class PatchedClientSession(OriginalClientSession):
    def __init__(self, *args, **kwargs):
        # Always set trust_env=True to honor proxy environment variables
        kwargs['trust_env'] = True
        super().__init__(*args, **kwargs)


# Apply the patch
aiohttp.ClientSession = PatchedClientSession

Again, I can't change the code above; it is not under my control. Why does `trust_env` default to `False` in the first place? httpx and requests default to `True`. I have created a separate bug, #10682, for it.

The problematic code is in the `aiohttp.connector.TCPConnector` class, mostly in the `_create_proxy_connection()` function.

In my case req.is_ssl() is True (and runtime_has_start_tls is also True), so:

  1. `self._warn_about_tls_in_tls(transport, req)` is called and a misleading warning is printed.

Workaround:

warnings.filterwarnings(
    'ignore',
    category=RuntimeWarning,
    message=r".*HTTPS request.*through an HTTPS proxy.*TLS\s+in\s+TLS.*"
)
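A quick sanity check that the filter pattern above actually suppresses a warning of this shape (the exact warning text here is an assumption modeled on the regex, not aiohttp's verbatim message):

```python
import warnings

# Text approximating the TLS-in-TLS warning (illustrative, not verbatim).
msg = ("An HTTPS request is being sent through an HTTPS proxy. "
       "Support for TLS in TLS may be disabled in this event loop.")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a clean slate
    # Re-apply the workaround's filter; it is prepended, so it wins.
    warnings.filterwarnings(
        'ignore',
        category=RuntimeWarning,
        message=r".*HTTPS request.*through an HTTPS proxy.*TLS\s+in\s+TLS.*",
    )
    warnings.warn(msg, RuntimeWarning)

print(len(caught))  # 0: the warning was filtered out
```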

I've opened #10683 as a separate bug.

  2. In the same `_create_proxy_connection()` function:

proxy_resp = await proxy_req.send(conn)  # proxy_resp is an instance of `ClientResponse`
...
resp = await proxy_resp.start(conn)

Between these lines, if you put a breakpoint inside `ClientResponse.__init__()` after the `self._cache: Dict[str, Any] = {}` line, you will see, surprise, surprise, that `self._cache` actually has some key-value pairs in it. In particular, it has `'headers': None`. This leads to the following unpredictable behavior inside `start()`.

...
self._headers = message.headers  # assignment succeeds
...
if cookie_hdrs := self.headers.getall(hdrs.SET_COOKIE, ()):
    # fails: `self.headers` ignores `self._headers` and uses the cached
    # value, which is None, so the `None.getall()` call raises an exception

`headers` uses your custom method descriptor, which has some subtle bug that I wasn't able to identify. So I'm disabling the caching behavior in the descriptor altogether.
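The failure mode can be reproduced with a minimal stand-in for a cache-first descriptor (a sketch, not propcache's real implementation): a stale entry in `_cache` shadows any later assignment to the backing attribute.

```python
class under_cached_property:
    """Simplified cache-first descriptor (illustrative stand-in)."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, inst, owner=None):
        if inst is None:
            return self
        # Cache-first: an existing entry is returned without recomputing.
        if self.name not in inst._cache:
            inst._cache[self.name] = self.func(inst)
        return inst._cache[self.name]


class Response:
    def __init__(self):
        self._cache = {"headers": None}  # stale entry, as observed in the debugger
        self._headers = None

    @under_cached_property
    def headers(self):
        return self._headers


r = Response()
r._headers = {"Set-Cookie": "a=b"}  # assignment succeeds...
print(r.headers)                    # ...but the stale cached None wins: None
```

Evicting the stale key (`r._cache.pop("headers")`) makes the next access recompute and return the live value, which is why disabling the cache sidesteps the bug.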

Workaround:

# monkey-patch
import propcache
import propcache.api as _propcache_api

# Replace the caching descriptor with a plain property so values are
# always read live instead of from the (buggy) cache.
propcache.under_cached_property = property
_propcache_api.under_cached_property = property
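Why substituting plain `property` helps: `property` recomputes on every access, while a caching descriptor keeps returning the first value it saw. A self-contained sketch using stdlib `functools.cached_property` in place of propcache's descriptor:

```python
from functools import cached_property


class Cached:
    def __init__(self):
        self._headers = None

    @cached_property
    def headers(self):
        return self._headers


class Plain:
    def __init__(self):
        self._headers = None

    @property
    def headers(self):
        return self._headers


c, p = Cached(), Plain()
c.headers, p.headers                  # first access caches None in `Cached`
c._headers = p._headers = {"X": "1"}  # backing attribute updated on both
print(c.headers)                      # None (stale cached value wins)
print(p.headers)                      # {'X': '1'} (property reads live)
```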

Python Version

$ python --version
Python 3.11.11

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.11.9
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache-2.0
Location: /usr/local/lib/python3.11/site-packages
Requires: aiohappyeyeballs, aiosignal, attrs, frozenlist, multidict, propcache, yarl
Required-by: aiobotocore, langchain, langchain-community

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/local/lib/python3.11/site-packages
Requires:
Required-by: aiohttp, yarl

propcache Version

$ python -m pip show propcache
Name: propcache
Version: 0.2.1
Summary: Accelerated property cache
Home-page: https://github.com/aio-libs/propcache
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: /usr/local/lib/python3.11/site-packages
Requires:
Required-by: aiohttp, yarl

yarl Version

$ python -m pip show yarl
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: /usr/local/lib/python3.11/site-packages
Requires: idna, multidict, propcache
Required-by: aiohttp

OS

python:3.11-slim

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@alex-ber alex-ber added the bug label Apr 1, 2025
@bdraco
Member

bdraco commented Apr 1, 2025

Can you try patching the `message.headers` call to raise when it's set, so you can see where it's being set to None?

@alex-ber
Author

alex-ber commented Apr 1, 2025

`message.headers` is fine. Moreover, `self._headers` is updated.

BUT `headers` is an `under_cached_property`, aka reify.

So, while `_headers = message.headers`, `headers` returns None, because for some weird reason `_cache` is `{'headers': None}` (it also has `URL`; `URL` also uses reify, so maybe they share the same cache...).
Moreover, a monkey-patch to manually remove the `headers` key from `_cache` failed. I have no idea why.

@bdraco
Member

bdraco commented Apr 1, 2025

It's likely `headers` is being accessed by something sooner than expected, so it gets cached as None. We need to figure out where that's happening to prevent it.
