
Commit 8929cce

Url web scraper in python
1 parent 4eb0370 commit 8929cce

File tree: 2 files changed (+47, −0 lines)

url-web-scraper/ReadMe.md

Lines changed: 32 additions & 0 deletions
## URL Web Scraper

Web scraping is an essential skill for pulling data out of any website.
### Modules Needed:

- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python and must be installed via pip from the terminal.
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built-in with Python and must be installed via pip from the terminal.
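The install step mentioned above is the standard pip invocation (assuming `pip` points at the Python environment you intend to use):

```shell
# install both third-party modules from PyPI
pip install bs4
pip install requests
```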
### Code:

```python
import requests
from bs4 import BeautifulSoup


url = 'https://www.python.org/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')

urls = []
for link in soup.find_all('a'):
    urls.append(link.get('href'))
    print(link.get('href'))
```
### Working:

Here we import BeautifulSoup from bs4 to convert the document to Unicode; HTML entities are further converted to Unicode characters. The `reqs` object is of response type, i.e. we fetch it as the response to the HTTP request for our URL. We then pass its text as one of the parameters to BeautifulSoup, and finally iterate through all the links found, printing them one by one.
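The `href` values extracted this way are often relative paths. A small sketch of resolving them against the page URL with `urllib.parse.urljoin` (using a hypothetical inline HTML snippet in place of a fetched page, so it runs without a network request):

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# hypothetical markup standing in for reqs.text
html = '<a href="/about/">About</a> <a href="https://docs.python.org/">Docs</a> <a>no href</a>'
soup = BeautifulSoup(html, 'html.parser')

base = 'https://www.python.org/'
# skip anchors without an href, and make every link absolute
urls = [urljoin(base, a.get('href')) for a in soup.find_all('a') if a.get('href')]
print(urls)
```

`urljoin` leaves already-absolute links untouched and joins relative ones onto the base URL.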
### Screenshot:
![url web scraper output](https://user-images.githubusercontent.com/67074796/193414565-db713a45-ba31-44af-ad47-60383bba7b42.png)

url-web-scraper/url-web-scraper.py

Lines changed: 15 additions & 0 deletions
```python
import requests
from bs4 import BeautifulSoup


url = 'https://www.python.org/'
# make a request
reqs = requests.get(url)
# parse the downloaded html content
soup = BeautifulSoup(reqs.text, 'html.parser')

urls = []
# find all anchor tags
for link in soup.find_all('a'):
    # extract and store the value inside href
    urls.append(link.get('href'))
    print(link.get('href'))
```
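Pages usually repeat the same link in several places, and some anchors carry no `href` at all (in which case `link.get('href')` returns `None`). Assuming the hrefs have been collected into `urls` as above, a sketch of deduplicating them while keeping first-seen order:

```python
# hypothetical collected hrefs, including a repeat and a missing (None) value
urls = ['/about/', '/downloads/', '/about/', None, '/community/']

# dict.fromkeys keeps first-seen order while removing duplicates;
# the filter drops anchors that had no href at all
unique_urls = [u for u in dict.fromkeys(urls) if u]
print(unique_urls)
```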

0 commit comments
