In this article we will explain how to rotate proxies for web scraping. Rotating proxies spreads your requests across multiple IP addresses, which reduces the chance of being rate-limited or blocked by the sites you want to reach.
How to start with rotating proxies in Python - installing prerequisites
To get started, create a virtual environment by running the following command:
python3 -m venv venv
This command creates a venv folder containing an isolated Python interpreter and its own copy of pip, so the packages you install do not affect your system Python.
Use the source command to activate your environment:
source venv/bin/activate
Install the requests module in the virtual environment you just activated:
pip install requests
Congratulations! You have finished all the steps for the installation of the requests module!
Sending GET requests through a proxy
Now, let's start with the basics. In some cases you might need to connect through a single IP address or proxy. To use a single proxy, you will need:
- Scheme (e.g., http);
- Endpoint;
- Port (e.g., 10000);
- Username and password to connect to the proxy.
Here is an example of how the proxy URL should look in this case:
https://customer-username:password@proxy.goproxies.com:10001
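If your username or password contains characters such as @ or :, they need to be percent-encoded before they go into the URL, otherwise the URL is split in the wrong place. Below is a minimal sketch that assembles the proxy URL from the parts listed above; build_proxy_url is a hypothetical helper name, not part of requests or any other library.

```python
from urllib.parse import quote


def build_proxy_url(scheme, host, port, username, password):
    """Assemble a proxy URL of the form scheme://user:pass@host:port.

    Hypothetical helper: the credentials are percent-encoded so that
    characters like '@' or ':' cannot break URL parsing.
    """
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}:{port}"


if __name__ == "__main__":
    print(build_proxy_url("http", "proxy.goproxies.com", 10000,
                          "customer-username", "password"))
```

With plain credentials like the ones in this article, the output is identical to writing the URL by hand; the encoding only matters when the password contains reserved characters.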
You can also select multiple protocols, as well as specify domains where you would like to use a separate proxy.
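In the proxies dictionary that requests accepts, the keys describe the target URL rather than the proxy: a plain scheme key such as 'https' applies to every https:// request, while a key of the form scheme://hostname applies only to that host and is checked first. The sketch below mirrors that lookup in plain Python purely to show which entry a URL would pick; pick_proxy is a hypothetical helper and example.com is a placeholder domain. In real code you simply pass the dictionary to requests, which does the lookup itself.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Return the proxy a scheme/host-keyed dict would select for url.

    Simplified illustration: checks 'scheme://hostname' first, then the
    bare scheme, and returns None when neither key is present.
    """
    parts = urlsplit(url)
    host_key = f"{parts.scheme}://{parts.hostname}"
    if host_key in proxies:
        return proxies[host_key]
    return proxies.get(parts.scheme)


proxies = {
    "http": "http://customer-username:password@proxy.goproxies.com:10000",
    "https": "https://customer-username:password@proxy.goproxies.com:10001",
    # Placeholder domain that should use a separate proxy of its own
    "https://example.com": "http://customer-username:password@proxy-europe.goproxies.com:10000",
}

if __name__ == "__main__":
    print(pick_proxy("https://example.com/page", proxies))
    print(pick_proxy("https://ip.goproxies.com", proxies))
```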
Replace PROXY1 and PROXY2 with your proxy URLs, using the format shown above:
proxies = {
    'http': 'PROXY1',
    'https': 'PROXY2'
}
Make a request with requests.get, passing the proxies dictionary we created above:
try:
    response = requests.get('https://ip.goproxies.com', proxies=proxies, timeout=10)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
Your full script should look like this:
import requests
proxies = {
    'http': 'http://customer-username:password@proxy.goproxies.com:10000',
    'https': 'https://customer-username:password@proxy.goproxies.com:10001'
}
try:
    response = requests.get('https://ip.goproxies.com', proxies=proxies, timeout=10)
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
Save the script as proxy.py and run it. The output is the IP address the target site sees, which is the IP address of your proxy:
% python proxy.py
45.42.JKL.MNO
Your Python script now sends its requests through a proxy, so the target site sees the proxy's IP address instead of yours.
Let's learn how to rotate through a list of proxies instead of just using one.
Rotating proxies using a proxy pool
You will work with a list of proxy servers stored in a CSV file called proxies.csv. List the proxy servers in it as shown below:
http://customer-username:password@proxy.goproxies.com:10000
https://customer-username:password@proxy.goproxies.com:10001
http://customer-username:password@proxy-america.goproxies.com:10000
http://customer-username:password@proxy-asia.goproxies.com:10000
http://customer-username:password@proxy-europe.goproxies.com:10000
To add more proxies, put each one on a separate line.
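A stray blank line or a typo in proxies.csv would otherwise only show up later as a confusing failed request. The following is a minimal sketch, assuming you want to skip empty rows and reject entries that lack an http/https scheme, a host, or a port before any request is sent; load_proxy_urls is a hypothetical helper that accepts any iterable of lines, such as an open file.

```python
import csv
from urllib.parse import urlsplit


def load_proxy_urls(lines):
    """Read proxy URLs from CSV-formatted lines, one URL per row.

    Blank rows are skipped; rows whose first column lacks an http/https
    scheme, a host, or a port raise ValueError so mistakes surface early.
    """
    urls = []
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue  # skip empty lines in the file
        url = row[0].strip()
        parts = urlsplit(url)
        # parts.port itself raises ValueError if the port is not a number
        if parts.scheme not in ("http", "https") or not parts.hostname or parts.port is None:
            raise ValueError(f"Malformed proxy entry: {url!r}")
        urls.append(url)
    return urls
```

You could call it as load_proxy_urls(open_file) inside a with open(CSV_FILENAME, newline='') as open_file: block and then loop over the returned list.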
Next, create a Python file and define the CSV file name and the timeout, in seconds, for each proxy request:
TIMEOUT_IN_SECONDS = 10
CSV_FILENAME = 'proxies.csv'
Then open the CSV file, read each row into the csv_row variable, and build a scheme_proxy_map dictionary that sends https requests through the proxy in that row.
This is an example of how it should look:
with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        scheme_proxy_map = {
            'https': csv_row[0],
        }
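The map above only has an 'https' key, so a request to an http:// URL would not go through this proxy. If you scrape both kinds of sites, one option, assuming the same proxy should carry both, is to register the row under both keys. A minimal sketch; make_scheme_proxy_map is a hypothetical name.

```python
def make_scheme_proxy_map(proxy_url):
    """Route both http:// and https:// targets through one proxy URL.

    The keys describe the scheme of the target site being requested;
    the value is the proxy to use for requests to that scheme.
    """
    return {
        "http": proxy_url,
        "https": proxy_url,
    }
```

Inside the loop this would read scheme_proxy_map = make_scheme_proxy_map(csv_row[0]).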
Next, place the same request code as before inside the loop so that each proxy from the file is tried in turn (the dictionary is now called proxies):
with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        proxies = {
            'https': csv_row[0],
        }
        try:
            response = requests.get('https://ip.goproxies.com', proxies=proxies, timeout=TIMEOUT_IN_SECONDS)
            print(response.text)
        except requests.exceptions.RequestException as e:
            print(f"An error occurred with proxy {csv_row[0]}: {e}")
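The loop above tries every proxy once, in file order. When you send many requests and want to spread them across the whole pool, a common alternative is round-robin rotation: each new request takes the next proxy and the list wraps around at the end. The sketch below uses itertools.cycle; make_rotator is a hypothetical name, and calling it once per request is an illustrative usage pattern rather than part of the original script.

```python
import itertools


def make_rotator(proxy_urls):
    """Return a function that yields the next proxies dict on each call.

    Proxies are handed out round-robin; after the last one, the pool
    starts again from the first.
    """
    urls = list(proxy_urls)
    if not urls:
        raise ValueError("The proxy pool is empty")
    pool = itertools.cycle(urls)

    def next_proxies():
        proxy_url = next(pool)
        return {"http": proxy_url, "https": proxy_url}

    return next_proxies


if __name__ == "__main__":
    next_proxies = make_rotator(["http://proxy-a:10000", "http://proxy-b:10000"])
    for _ in range(3):
        print(next_proxies()["https"])
```

Each call such as requests.get(url, proxies=next_proxies(), timeout=TIMEOUT_IN_SECONDS) would then leave through a different proxy than the previous one.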
If you only need one working proxy from the list, add a break after the print line. The loop then stops at the first successful request instead of trying the remaining proxies in the CSV file:
response = requests.get('https://ip.goproxies.com',
                        proxies=proxies, timeout=TIMEOUT_IN_SECONDS)
print(response.text)
break  # break here to stop going through the proxies
Your full code should look like this:
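import requests
import csv
TIMEOUT_IN_SECONDS = 10
CSV_FILENAME = 'proxies.csv'
# newline='' is what the csv module documentation recommends for opened files
with open(CSV_FILENAME, newline='') as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        if not csv_row:
            continue  # Skip blank lines, which would otherwise raise IndexError
        proxies = {
            'https': csv_row[0],
        }
        try:
            response = requests.get('https://ip.goproxies.com', proxies=proxies, timeout=TIMEOUT_IN_SECONDS)
            response.raise_for_status()  # Treat 4xx/5xx responses as failures too
            print(response.text)
            break  # Break the loop after a successful request
        except requests.exceptions.RequestException as e:
            print(f"An error occurred with proxy {csv_row[0]}: {e}")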
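One gap remains in the script above: if every proxy fails, the loop simply ends and you only see the individual error lines. Wrapping the same first-working-proxy logic in a function that returns on success makes the all-failed case explicit. In this sketch, first_working_proxy is a hypothetical name, and fetch stands in for your own request code, for example a function that calls requests.get with the proxies dictionary and the timeout and returns response.text.

```python
def first_working_proxy(proxy_urls, fetch):
    """Try each proxy in order and return (proxy_url, result) for the first success.

    fetch(proxy_url) should return a result, or raise an exception on failure.
    Raises RuntimeError listing every collected error if no proxy works.
    """
    errors = []
    for proxy_url in proxy_urls:
        try:
            result = fetch(proxy_url)
        except Exception as e:  # with requests, catch requests.exceptions.RequestException instead
            errors.append(f"{proxy_url}: {e}")
            continue
        return proxy_url, result
    raise RuntimeError("All proxies failed:\n" + "\n".join(errors))
```

Catching a broad Exception keeps the sketch independent of any HTTP library; in a real scraper, narrowing it to requests.exceptions.RequestException avoids hiding unrelated bugs.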
That's it! Congratulations, you have successfully learned how to rotate proxies using Python.