Bug Description
The library gets stuck in an infinite retry loop when DNS resolution fails for the requested host (for example, when the domain does not exist).
Symptoms
- The application appears frozen.
- Logs show repeated "Network error occurred" messages at 20-second intervals.
- The call never gives up: there is no retry limit after which it fails.
Impact
- Scrapers can hang indefinitely on invalid domains.
- Batch jobs never complete.
- Users have no way to recover except manually terminating the process.
- DNS failures of this kind are typically permanent, so retrying is unlikely to succeed.
Root Cause
In botasaurus_requests/reqs.py, the retry_on_network_error() function wraps the call in a while True loop. On DNS lookup failures such as "dial tcp: lookup ... no such host" it sleeps and retries forever, because the loop has no retry limit.
Affected Code
- botasaurus_requests/reqs.py: retry_on_network_error() function
- All HTTP methods: GET, POST, HEAD, PUT, PATCH, DELETE, OPTIONS
Code to reproduce error
from botasaurus.request import request

@request(output=None)
def get_soup_from_botasaurus(reqs, Url):
    try:
        response = reqs.get(Url)
        response.raise_for_status()
        res = response.text
        return res
    except Exception as e:
        print(f"Error fetching {Url}: {e}")
        return None

url = "https://examaaple.com/"
text = get_soup_from_botasaurus(url)
print(text, "<<<<<<<")
Possible Solutions:
Option 1:
Add a maximum retry limit (e.g. 3 retries) before raising the exception.
Option 2:
Do not retry DNS resolution failures at all, since "no such host" errors are generally permanent and not transient network failures.
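For comparison, Option 2 could look roughly like the sketch below. It is not the library's code: the names is_dns_failure, DNSResolutionError and fail_fast_on_dns_error are hypothetical, and the string match is copied from the existing check in retry_on_network_error(). The idea is to call the function exactly once and turn a "no such host" error into a dedicated exception that callers can catch.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DNSResolutionError(Exception):
    """Hypothetical exception type, not part of botasaurus."""


def is_dns_failure(error: Exception) -> bool:
    # Same substring check as the existing code, made case-insensitive.
    message = str(error).lower()
    return "dial tcp: lookup" in message and "no such host" in message


def fail_fast_on_dns_error(func: Callable[[], T]) -> T:
    # Call func once; never sleep and never retry.
    try:
        return func()
    except Exception as e:
        if is_dns_failure(e):
            # Wrap the error so callers can tell DNS failures apart,
            # keeping the original exception as the cause.
            raise DNSResolutionError(f"DNS lookup failed, not retrying: {e}") from e
        # Every other error propagates unchanged.
        raise
```

With this approach a batch job would see the failure immediately instead of after retries * retry_wait seconds. The trade-off is that a temporary resolver outage is also treated as final.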
What I fixed
< --- before fix ---->
def retry_on_network_error(func: Callable):
    while True:
        try:
            return func()
        except Exception as e:
            if 'dial tcp: lookup' in str(e) and 'no such host' in str(e):
                print(f"Network error occurred: {e}. Retrying in 20 seconds...")
                time.sleep(20)
                print("Retrying now...")
            else:
                raise e
< --- after fix ---->
import time
from typing import Callable

def retry_on_network_error(func: Callable, retries: int = 3, retry_wait: int = 20):
    attempt = 0
    while True:
        try:
            return func()
        except Exception as e:
            # Match the DNS lookup failure text case-insensitively.
            message = str(e).lower()
            if 'dial tcp: lookup' in message and 'no such host' in message:
                attempt += 1
                # Retry budget spent: re-raise the original error.
                if attempt > retries:
                    raise
                print(f"Network error occurred: {e}. Retrying in {retry_wait} seconds... ({attempt}/{retries})")
                time.sleep(retry_wait)
                print("Retrying now...")
            else:
                # Any other error is not retried.
                raise
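To check that a bounded retry really terminates without waiting a full minute, something like the following can be used. This is only a sketch under a few assumptions: retry_on_dns_error is a hypothetical for-loop variant of the fix above (not the function in reqs.py), the sleep parameter is an added hook so the check can skip real waiting, and the error message is made up to match the pattern from the logs.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_on_dns_error(
    func: Callable[[], T],
    retries: int = 3,
    retry_wait: float = 20,
    sleep: Callable[[float], None] = time.sleep,  # hypothetical hook for tests
) -> T:
    # One initial attempt plus up to `retries` retries.
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as e:
            message = str(e).lower()
            dns_failure = "dial tcp: lookup" in message and "no such host" in message
            if not dns_failure or attempt == retries:
                raise  # non-DNS error, or retry budget exhausted
            print(f"Network error occurred: {e}. Retrying in {retry_wait} seconds... ({attempt + 1}/{retries})")
            sleep(retry_wait)
    # Only reachable when the loop body never ran.
    raise ValueError("retries must be non-negative")


calls: List[int] = []


def always_fails() -> None:
    # Simulated request that always hits the DNS error (message is illustrative).
    calls.append(1)
    raise Exception("dial tcp: lookup examaaple.com: no such host")


try:
    retry_on_dns_error(always_fails, retries=3, retry_wait=0, sleep=lambda _: None)
except Exception as e:
    print(f"Gave up after {len(calls)} calls: {e}")
```

With retries=3 the function is called four times in total (the first attempt plus three retries) and then the original exception is raised, which matches the behaviour of the while True version with the attempt counter.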