
Commit 05f3948

Merge pull request #27 from ScrapingAnt/feature/add-async-client-support
feature/add-async-client-support: done
2 parents 10694a2 + c8644f7 commit 05f3948

File tree: 7 files changed, +258 -41 lines

Makefile (+1 -1)

@@ -1,5 +1,5 @@
 init:
-	pip3 install -e .[dev]
+	pip3 install -e .[dev,async]
 
 test:
 	pytest -p no:cacheprovider

README.md (+53 -14)

@@ -1,9 +1,10 @@
 # ScrapingAnt API client for Python
+
 [![PyPI version](https://badge.fury.io/py/scrapingant-client.svg)](https://badge.fury.io/py/scrapingant-client)
 
-`scrapingant-client` is the official library to access [ScrapingAnt API](https://docs.scrapingant.com) from your
-Python applications. It provides useful features like parameters encoding to improve the ScrapingAnt usage experience.
-Requires python 3.6+.
+`scrapingant-client` is the official library to access [ScrapingAnt API](https://docs.scrapingant.com) from your Python
+applications. It provides useful features like parameters encoding to improve the ScrapingAnt usage experience. Requires
+python 3.6+.
 
 <!-- toc -->
 
@@ -17,6 +18,7 @@ Requires python 3.6+.
 <!-- tocstop -->
 
 ## Quick Start
+
 ```python3
 from scrapingant_client import ScrapingAntClient
 
@@ -26,23 +28,37 @@ result = client.general_request('https://example.com')
 print(result.content)
 ```
 
+## Install
+
+```shell
+pip install scrapingant-client
+```
+
+If you need async support:
+
+```shell
+pip install scrapingant-client[async]
+```
+
 ## API token
+
 In order to get API token you'll need to register at [ScrapingAnt Service](https://app.scrapingant.com)
 
 ## API Reference
+
 All public classes, methods and their parameters can be inspected in this API reference.
 
 #### ScrapingAntClient(token)
 
-Main class of this library. 
+Main class of this library.
 
 | Param | Type |
 | --- | --- |
 | token | <code>string</code> |
 
 * * *
 
-#### ScrapingAntClient.general_request
+#### ScrapingAntClient.general_request and ScrapingAntClient.general_request_async
 
 https://docs.scrapingant.com/request-response-format#available-parameters
 
@@ -63,6 +79,7 @@ https://docs.scrapingant.com/request-response-format#available-parameters
 * * *
 
 #### Cookie
+
 Class defining cookie. Currently it supports only name and value
 
 | Param | Type |
@@ -73,7 +90,8 @@ Class defining cookie. Currently it supports only name and value
 * * *
 
 #### Response
-Class defining response from API.
+
+Class defining response from API.
 
 | Param | Type |
 | --- | --- |
@@ -83,11 +101,11 @@ Class defining response from API.
 
 ## Exceptions
 
-`ScrapingantClientException` is base Exception class, used for all errors. 
+`ScrapingantClientException` is base Exception class, used for all errors.
 
 | Exception | Reason |
 | --- | --- |
-| ScrapingantInvalidTokenException | The API token is wrong or you have exceeded the API calls request limit 
+| ScrapingantInvalidTokenException | The API token is wrong or you have exceeded the API calls request limit
 | ScrapingantInvalidInputException | Invalid value provided. Please, look into error message for more info |
 | ScrapingantInternalException | Something went wrong with the server side code. Try again later or contact ScrapingAnt support |
 | ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally |
@@ -106,7 +124,7 @@ from scrapingant_client import Cookie
 client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
 
 result = client.general_request(
-    'https://httpbin.org/cookies',
+    'https://httpbin.org/cookies',
     cookies=[
         Cookie(name='cookieName1', value='cookieVal1'),
         Cookie(name='cookieName2', value='cookieVal2'),
@@ -122,6 +140,7 @@ response_cookies = result.cookies
 
 ```python
 from scrapingant_client import ScrapingAntClient
+
 client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
 
 customJsSnippet = """
@@ -130,7 +149,7 @@ var htmlElement = document.getElementsByTagName('html')[0];
 htmlElement.innerHTML = str;
 """
 result = client.general_request(
-    'https://example.com',
+    'https://example.com',
     js_snippet=customJsSnippet,
 )
 print(result.content)
@@ -145,14 +164,16 @@ client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
 
 RETRIES_COUNT = 3
 
+
 def parse_html(html: str):
     ...  # Implement your data extraction here
 
+
 parsed_data = None
 for retry_number in range(RETRIES_COUNT):
     try:
         scrapingant_response = client.general_request(
-            'https://example.com',
+            'https://example.com',
         )
     except ScrapingantInvalidInputException as e:
         print(f'Got invalid input exception: {repr(e)}')
@@ -167,7 +188,6 @@ for retry_number in range(RETRIES_COUNT):
             break  # Data is parsed successfully, so we don't need to retry
     except Exception as e:
         print(f'Got exception while parsing data {repr(e)}')
-
 
 if parsed_data is None:
     print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
@@ -184,7 +204,7 @@ from scrapingant_client import ScrapingAntClient
 client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
 
 result = client.general_request(
-    'https://httpbin.org/headers',
+    'https://httpbin.org/headers',
     headers={
         'test-header': 'test-value'
     }
@@ -193,13 +213,32 @@ print(result.content)
 
 # Http basic auth example
 result = client.general_request(
-    'https://jigsaw.w3.org/HTTP/Basic/',
+    'https://jigsaw.w3.org/HTTP/Basic/',
     headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
 )
 print(result.content)
 ```
 
+### Simple async example
+
+```python3
+import asyncio
+
+from scrapingant_client import ScrapingAntClient
+
+client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')
+
+
+async def main():
+    # Scrape the example.com site.
+    result = await client.general_request_async('https://example.com')
+    print(result.content)
+
+
+asyncio.run(main())
+```
 
 ## Useful links
+
 - [Scrapingant API documentation](https://docs.scrapingant.com)
 - [Scrapingant JS Client](https://github.com/scrapingant/scrapingant-client-js)

scrapingant_client/__init__.py (+1 -1)

@@ -1,4 +1,4 @@
-__version__ = "0.3.9"
+__version__ = "1.0.0"
 
 from scrapingant_client.client import ScrapingAntClient
 from scrapingant_client.cookie import Cookie

scrapingant_client/client.py (+87 -20)

@@ -25,24 +25,23 @@ def __init__(self, token: str):
         self.token = token
         self.requests_session = requests.Session()
         version = scrapingant_client.__version__
-        user_agent = f'ScrapingAnt Client/{version} ({sys.platform}; Python/{platform.python_version()});'
+        self.user_agent = f'ScrapingAnt Client/{version} ({sys.platform}; Python/{platform.python_version()});'
         self.requests_session.headers.update({
             'x-api-key': self.token,
-            'User-Agent': user_agent,
+            'User-Agent': self.user_agent,
         })
 
-    def general_request(
+    def _form_payload(
             self,
             url: str,
             cookies: Optional[List[Cookie]] = None,
-            headers: Optional[Dict[str, str]] = None,
             js_snippet: Optional[str] = None,
             proxy_type: ProxyType = ProxyType.datacenter,
             proxy_country: Optional[str] = None,
             return_text: bool = False,
             wait_for_selector: Optional[str] = None,
             browser: bool = True,
-    ) -> Response:
+    ) -> Dict:
         request_data = {'url': url}
         if cookies is not None:
             request_data['cookies'] = cookies_list_to_string(cookies)
@@ -56,29 +55,97 @@ def general_request(
             request_data['wait_for_selector'] = wait_for_selector
         request_data['return_text'] = return_text
         request_data['browser'] = browser
+        return request_data
 
-        response = self.requests_session.post(
-            SCRAPINGANT_API_BASE_URL + '/general',
-            json=request_data,
-            headers=convert_headers(headers),
-        )
-        if response.status_code == 403:
+    def _parse_response(self, response_status_code: int, response_data: Dict, url: str) -> Response:
+        if response_status_code == 403:
             raise ScrapingantInvalidTokenException()
-        elif response.status_code == 404:
+        elif response_status_code == 404:
             raise ScrapingantSiteNotReachableException(url)
-        elif response.status_code == 422:
-            raise ScrapingantInvalidInputException(response.text)
-        elif response.status_code == 423:
+        elif response_status_code == 422:
+            raise ScrapingantInvalidInputException(response_data)
+        elif response_status_code == 423:
             raise ScrapingantDetectedException()
-        elif response.status_code == 500:
+        elif response_status_code == 500:
             raise ScrapingantInternalException()
-        json_response = response.json()
-        content = json_response['content']
-        cookies_string = json_response['cookies']
-        status_code = json_response['status_code']
+        content = response_data['content']
+        cookies_string = response_data['cookies']
+        status_code = response_data['status_code']
         cookies_list = cookies_list_from_string(cookies_string)
         return Response(
             content=content,
             cookies=cookies_list,
             status_code=status_code
         )
+
+    def general_request(
+            self,
+            url: str,
+            cookies: Optional[List[Cookie]] = None,
+            headers: Optional[Dict[str, str]] = None,
+            js_snippet: Optional[str] = None,
+            proxy_type: ProxyType = ProxyType.datacenter,
+            proxy_country: Optional[str] = None,
+            return_text: bool = False,
+            wait_for_selector: Optional[str] = None,
+            browser: bool = True,
+    ) -> Response:
+        request_data = self._form_payload(
+            url=url,
+            cookies=cookies,
+            js_snippet=js_snippet,
+            proxy_type=proxy_type,
+            proxy_country=proxy_country,
+            return_text=return_text,
+            wait_for_selector=wait_for_selector,
+            browser=browser,
+        )
+        response = self.requests_session.post(
+            SCRAPINGANT_API_BASE_URL + '/general',
+            json=request_data,
+            headers=convert_headers(headers),
+        )
+        response_status_code = response.status_code
+        response_data = response.json()
+        parsed_response: Response = self._parse_response(response_status_code, response_data, url)
+        return parsed_response
+
+    async def general_request_async(
+            self,
+            url: str,
+            cookies: Optional[List[Cookie]] = None,
+            headers: Optional[Dict[str, str]] = None,
+            js_snippet: Optional[str] = None,
+            proxy_type: ProxyType = ProxyType.datacenter,
+            proxy_country: Optional[str] = None,
+            return_text: bool = False,
+            wait_for_selector: Optional[str] = None,
+            browser: bool = True,
+    ) -> Response:
+        import httpx
+
+        request_data = self._form_payload(
+            url=url,
+            cookies=cookies,
+            js_snippet=js_snippet,
+            proxy_type=proxy_type,
+            proxy_country=proxy_country,
+            return_text=return_text,
+            wait_for_selector=wait_for_selector,
+            browser=browser,
+        )
+        async with httpx.AsyncClient(
+                headers={
+                    'x-api-key': self.token,
+                    'User-Agent': self.user_agent,
+                }
+        ) as client:
+            response = await client.post(
+                SCRAPINGANT_API_BASE_URL + '/general',
+                json=request_data,
+                headers=convert_headers(headers),
+            )
+        response_status_code = response.status_code
+        response_data = response.json()
+        parsed_response: Response = self._parse_response(response_status_code, response_data, url)
+        return parsed_response
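
The refactor above is the core of the commit: `_form_payload` builds the request dict and `_parse_response` maps status codes to exceptions, so the sync path (the shared `requests.Session`) and the new async path (an `httpx.AsyncClient` opened per call, with `httpx` imported lazily so sync-only installs work without it) reuse the same request building and error handling. A minimal usage sketch of both entry points, assuming a valid API token:

```python3
import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Sync path: POSTs through the requests.Session configured in __init__.
sync_result = client.general_request('https://example.com')
print(sync_result.status_code)


# Async path: same payload and response handling, sent via httpx.
async def main():
    async_result = await client.general_request_async('https://example.com')
    print(async_result.status_code)


asyncio.run(main())
```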

setup.py (+9 -4)

@@ -38,9 +38,14 @@
     install_requires=['requests>=2,<3'],
     extras_require={
         'dev': [
-            'pytest>=6,<7',
-            'flake8>=3,<4',
-            'responses>=0,<1'
-        ]
+            'pytest>=7,<8',
+            'flake8>=4,<5',
+            'responses>=0,<1',
+            'pytest-httpx>=0,<1',
+            'pytest-asyncio>=0,<1',
+        ],
+        'async': [
+            'httpx<1',
+        ],
     },
 )
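
Note the split: `httpx` lives in the optional `async` extra, and `client.py` defers `import httpx` to the body of `general_request_async`, so a plain `pip install scrapingant-client` keeps working for sync users. A sketch of the failure mode without the extra, assuming the import error is not wrapped anywhere:

```python3
import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='<YOUR-SCRAPINGANT-API-TOKEN>')

# Without the [async] extra, the lazy `import httpx` inside
# general_request_async raises ImportError on the first call.
try:
    asyncio.run(client.general_request_async('https://example.com'))
except ImportError:
    print('Install with: pip install scrapingant-client[async]')
```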

tests/test_exceptions.py (+1 -1)

@@ -28,7 +28,7 @@ def test_invalid_input():
     client = ScrapingAntClient(token='some_token')
     with pytest.raises(ScrapingantInvalidInputException) as e:
         client.general_request('bad_url')
-    assert '{"detail": "wrong url"}' in str(e)
+    assert 'wrong url' in str(e)
 
 
 @responses.activate
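
The dev extras now include `pytest-httpx` and `pytest-asyncio`, which suggests the async path is exercised by tests not shown in this view. A hypothetical sketch of such a test, assuming pytest-asyncio's `asyncio` marker and pytest-httpx's `httpx_mock` fixture:

```python3
import pytest

from scrapingant_client import ScrapingAntClient


@pytest.mark.asyncio
async def test_general_request_async(httpx_mock):
    # pytest-httpx intercepts the httpx POST and returns this canned body,
    # matching the JSON shape that _parse_response expects.
    httpx_mock.add_response(
        json={'content': 'hello', 'cookies': '', 'status_code': 200},
    )
    client = ScrapingAntClient(token='some_token')
    result = await client.general_request_async('https://example.com')
    assert result.content == 'hello'
    assert result.status_code == 200
```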
