How Do I Scrape A Website With ChatGPT?
Original Article Published At YourQuorum
If web scraping isn’t your primary area of activity, writing a simple web scraping script can take an annoyingly large amount of time. You need to learn (or remember) how to use libraries like Beautiful Soup. And if you don’t use a scripting language like Python or JavaScript regularly, you might need to refresh your knowledge of its syntax.
But what if there were a tool that could speed up the development process, letting you focus on the data instead of the code?
In this article, you’ll learn how to use ChatGPT for writing web scrapers. The article covers the benefits and drawbacks of using ChatGPT for web scraping and shows techniques that can help developers speed up their work with ChatGPT.
What Is ChatGPT?
ChatGPT is a highly advanced chatbot created by OpenAI that uses AI to process natural language and generate human-like responses.
Thanks to its strong ability to understand and produce natural language, it can do many tasks at a nearly human level: answer questions, write full-length articles, and even write code.
At the time of writing, it’s in beta testing and available free of charge to anyone interested.
Interactions with ChatGPT follow a prompt-response pattern: you write a prompt containing a message or a task you want ChatGPT to accomplish, and it supplies you with an answer. ChatGPT stores the previous prompts and their responses, so you can ask it to update or explain its answers.
How to Write Web Scraping Code With ChatGPT
Since ChatGPT can write code, it can, naturally, write code that does web scraping. To try it out, go to the app and sign up or log in.
It will present you with a prompt where you can ask the chatbot questions.
ChatGPT is a statistical language model trained on a large amount of text from the internet, not a natural language interface to a powerful no-code tool. Since ChatGPT can’t access the internet and can’t “see” websites, there’s no way it can reliably write code for scraping the items you need from a page.
So, if you ask ChatGPT to write a web scraper on its own, there are two possible outcomes. The first is that it has “read” tutorials for this specific task, in which case it might reproduce the contents of those tutorials.
The second outcome is worse: it will imagine some HTML code that could plausibly be on the website and write web scraping code for that imagined HTML. The resulting code will look convincing but won’t actually work.
import requests
from bs4 import BeautifulSoup
# send a GET request to the website and get the HTML content
response = requests.get("http://books.toscrape.com/")
html_content = response.content
# create a BeautifulSoup object with the HTML content
soup = BeautifulSoup(html_content, "html.parser")
# find all the book titles (which are in h3 tags) and prices (which are in p tags with a class of "price_color")
book_titles = soup.find_all("h3")
book_prices = soup.find_all("p", class_="price_color")
# iterate over the book titles and prices and print them out
for i in range(len(book_titles)):
    print(book_titles[i].text)
    print(book_prices[i].text)
    print()
This code uses the requests library to send a GET request to the website and get the HTML content, and then uses the BeautifulSoup library to parse the HTML and extract the book titles and prices. Finally, it iterates over the titles and prices and prints them out.
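The loop above trusts that the selectors matched what was intended. If they match nothing, the script prints nothing and raises no error, which is easy to mistake for success. A minimal sketch of a guard, assuming the titles and prices have already been extracted as plain strings (the helper name `pair_books` is hypothetical and not part of any library):

```python
def pair_books(titles, prices):
    """Pair titles with prices, failing loudly on suspicious input."""
    if not titles:
        # An empty result usually means the selector does not match the page
        raise ValueError("no titles found; the selector may not match the page")
    if len(titles) != len(prices):
        raise ValueError(
            f"found {len(titles)} titles but {len(prices)} prices"
        )
    return list(zip(titles, prices))


# Made-up values standing in for text scraped from the page
books = pair_books(["Book A", "Book B"], ["£10.00", "£12.50"])
for title, price in books:
    print(title, price)
```

Raising an error here is a design choice: a scraper that stops with a clear message is easier to debug than one that quietly produces an empty result.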
Improving Code Together With ChatGPT
Since ChatGPT remembers the context of the conversation, it’s easy to improve the code it has written by giving it additional suggestions and requests.
For example, web scraping scripts frequently use proxies to hide the IP address of the web scraper. This prevents your IP address from being blocked by website administrators who dislike your activities. Instead, the IP address of the proxy will be blocked. If you use a paid proxy like IPRoyal private proxies that rotates the IP on each request, this is not a problem.
To add a proxy to the script, all you need to do is ask ChatGPT to do it. Since adding a proxy isn’t a very context-sensitive task, it should do just fine.
import requests
from bs4 import BeautifulSoup
# set up the proxy
proxies = {
    "http": "http://your-proxy-address:port",
    "https": "http://your-proxy-address:port"
}
# send a GET request to the website and get the HTML content using the proxy
response = requests.get("http://books.toscrape.com/", proxies=proxies)
html_content = response.content
# create a BeautifulSoup object with the HTML content
soup = BeautifulSoup(html_content, "html.parser")
# find all the book titles (which are in h3 tags) and prices (which are in p tags with a class of "price_color")
book_titles = soup.find_all("h3")
book_prices = soup.find_all("p", class_="price_color")
# iterate over the book titles and prices and print them out
for i in range(len(book_titles)):
    print(book_titles[i].text)
    print(book_prices[i].text)
    print()
In this modified version of the code, we first set up a dictionary containing the proxy information (replace “your-proxy-address” and “port” with the appropriate values for your proxy). Then, when we send the GET request using the requests.get() function, we include the proxies argument and pass in the proxies dictionary. This tells the requests library to use the proxy when sending the request.
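Many paid proxies also require a username and password. As a hedged sketch, the dictionary could be built by a small helper like the one below. `build_proxies` and all of its arguments are hypothetical names, and the host, port, and credentials are placeholders. The `user:password@host:port` URL form is the one requests accepts for proxy authentication.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict suitable for requests.get(..., proxies=...)."""
    auth = ""
    if username and password:
        # Percent-encode credentials so characters like "@" or ":" are safe
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    proxy_url = f"http://{auth}{host}:{port}"
    # The same proxy is used for both plain HTTP and HTTPS traffic
    return {"http": proxy_url, "https": proxy_url}


# Placeholder values; substitute the details from your proxy provider
proxies = build_proxies("proxy.example.com", 8080, "user", "p@ss")
print(proxies["http"])
```

The resulting dictionary can be passed as the proxies argument of requests.get() in the same way as the hand-written one.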
Tips for Making ChatGPT Coding Prompts
Writing code with ChatGPT might seem odd at first since it’s mostly used as an AI writer, but you’ll become more proficient as you gain experience and learn what it can and can’t do. As a rule, you can think of it as a junior developer assistant that is available whenever you need it and never gets tired.
The more specific the instructions you give it, the better. Still, thanks to the conversational nature of the interaction, you can always start with a simple prompt and then refine it with ChatGPT, improving the result along the way.
While ChatGPT is very confident, it’s important to always double-check (or test) its output. It can be devastatingly (but oh so confidently) wrong. In general, it’s best for simple tasks and writing boilerplate.
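One way to test generated scraping code without hitting the live site is to run it against a small, saved piece of HTML. The sketch below assumes the extraction logic has been moved into a function (here called `extract_books`, a hypothetical name) and uses a made-up HTML sample that only imitates the `h3` and `p.price_color` structure the script expects. A real check would use HTML saved from the actual page.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics the structure the scraper expects
SAMPLE_HTML = """
<article><h3><a href="#">Book A</a></h3><p class="price_color">£10.00</p></article>
<article><h3><a href="#">Book B</a></h3><p class="price_color">£12.50</p></article>
"""


def extract_books(html):
    """Return a list of (title, price) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3")]
    prices = [
        p.get_text(strip=True)
        for p in soup.find_all("p", class_="price_color")
    ]
    return list(zip(titles, prices))


books = extract_books(SAMPLE_HTML)
# If ChatGPT guessed the wrong tags or classes, this assertion fails
assert books == [("Book A", "£10.00"), ("Book B", "£12.50")]
```

Because the function takes HTML as a string, the same code can be pointed at response.content from a real request once the check passes.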
Finally, remember that ChatGPT currently can’t access the internet and that its knowledge comes from statistical inference, not from having some knowledge encoded inside as code. Therefore, it can’t serve as a no-code tool for beginners who can’t understand and verify its output.
Conclusion
ChatGPT is a new but powerful technology that can do all kinds of impressive things.
In this article, you learned how you can use ChatGPT to write web scraping scripts. While it can’t be relied upon to write web scrapers on its own, it can speed up the process significantly. In a sense, it works like autocomplete on steroids.