How To Bypass Bot Detection And Scrape A Website Using Python?

Richelle John
2 min read · Jul 24, 2023



Originally published at YourQuorum

Today we will look at a great Python module. Scraping is fun once you try it. Scraping and crawling are common terms, but there is a slight difference between them. Web crawling is basically what Google, Facebook, and so on do: it searches for any data it can find. Scraping, on the other hand, targets specific websites for specific data, for example product information and prices.

Check That Your Development Environment Is Ready

Before moving ahead, we need to check whether Python is available. To do so, open a terminal or command line and run the command below:

python --version
Output: Python 2.7.16

Or,

python3 --version
Output: Python 3.8.0

If your output looks like the above, you are all set. Your Python version may differ from mine, so don't worry about it.
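
If you prefer, you can also print the version from inside Python itself with a quick one-liner:

python3 -c "import sys; print(sys.version)"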

Set Up a Virtual Environment

We need to create a virtual environment to avoid conflicts between Python modules, dependencies, and library versions. This guarantees isolation, so each project's dependencies and library versions can be maintained without any problem.

Open a terminal or command line, then create the project:

macOS Users:

pip install virtualenv
python3 -m virtualenv venv
source venv/bin/activate

Windows Users:

pip install virtualenv
virtualenv venv
venv\Scripts\activate

We can see that a venv folder has been created. Congratulations, we have successfully set up a virtual environment.
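
As an alternative, Python 3 ships with a built-in venv module, so you can create the environment without installing virtualenv at all. A minimal sketch for macOS/Linux:

python3 -m venv venv
source venv/bin/activate

On Windows, activate with venv\Scripts\activate instead. When you are done working, run deactivate to leave the environment.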

Install the Required Libraries or Modules

Open a terminal or command line, then run the commands below:

pip install beautifulsoup4
pip install cfscrape
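
If you prefer, you can pin these dependencies in a requirements.txt file so the environment stays reproducible. A minimal sketch (version pins are optional and up to you):

# requirements.txt
beautifulsoup4
cfscrape

Then install everything in one go with pip install -r requirements.txt.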

Learn the Fundamentals of How Scraping Works

Create an app.py file with the following contents:

import cfscrape
from bs4 import BeautifulSoup


def basic():
    # Sample HTML as a string
    html_text = '''
    <div>
        <h1 class="product-name">Product Name 1</h1>
        <h1 custom-attr="price" class="product-price">100</h1>
        <p class="product description">This is basic description 1</p>
    </div>
    <div>
        <h1 class="product-name">Product Name 2</h1>
        <h1 custom-attr="price" class="product-price">200</h1>
        <p class="product description">This is basic description 2</p>
    </div>
    '''
    parsed_html = BeautifulSoup(html_text, 'html.parser')  # string to parsed HTML
    # parsed_html = BeautifulSoup(open("from_file.html"), 'html.parser')  # file to parsed HTML
    # To parse a live page, fetch the HTML first (e.g. with requests or cfscrape) and pass the response text.

    print(parsed_html.select(".product-name")[0].text)         # Product Name 1
    print(parsed_html.select(".product-name")[1].text)         # Product Name 2
    print(parsed_html.select(".product.description")[0].text)  # This is basic description 1
    print(parsed_html.find_all("h1", {"custom-attr": "price"})[0].text)  # 100
    print(parsed_html.find("h1", {"custom-attr": "price"}).text)         # 100


if __name__ == '__main__':
    basic()
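
To see how these selectors combine in practice, here is a small sketch that pairs each product name with its price from the same sample HTML. The extract_products helper is not from the original article; it simply builds on the parsed_html object created in basic():

def extract_products(parsed_html):
    # Walk each <div> and pair the product name with its price
    products = []
    for div in parsed_html.select("div"):
        name = div.select_one(".product-name")
        price = div.find("h1", {"custom-attr": "price"})
        if name and price:
            products.append({"name": name.text, "price": price.text})
    return products

# With the sample HTML above, extract_products(parsed_html) returns:
# [{'name': 'Product Name 1', 'price': '100'}, {'name': 'Product Name 2', 'price': '200'}]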

Now, the Anti-Bot Scraping

Add the following to the same app.py file:

def anti_bot_scraping():
    target_url = "https://www.google.com"  # replace with an anti-bot protected website
    scraper = cfscrape.create_scraper()  # behaves like a requests session, but solves Cloudflare's challenge
    html_text = scraper.get(target_url).text
    parsed_html = BeautifulSoup(html_text, 'html.parser')
    print(parsed_html)


if __name__ == '__main__':
    anti_bot_scraping()
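
Once the protected page has been fetched, you can apply the same BeautifulSoup selectors to it. Here is a sketch; scrape_product_names and the .product-name selector are illustrative assumptions, not something every target site will have:

def scrape_product_names(target_url):
    # Fetch an anti-bot protected page and pull out hypothetical product names
    scraper = cfscrape.create_scraper()
    html_text = scraper.get(target_url).text
    parsed_html = BeautifulSoup(html_text, 'html.parser')
    # ".product-name" is a placeholder selector; inspect your target page for the real class
    return [tag.text for tag in parsed_html.select(".product-name")]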

Now, open a terminal and run python app.py to execute the file.
