Loading a web page with Selenium returns an error

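I am trying to build an application that scrapes some e-commerce sites. I am using Selenium for this and am trying to deploy the application on an EC2 instance running CentOS. I developed the code locally, where it works, but on the remote machine it fails with an error.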

The code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

ser = Service(ChromeDriverManager().install())
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

selenium_driver = webdriver.Chrome(service=ser, options=chrome_options)

url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'

selenium_driver.get(url)

title = selenium_driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)

When I run this code on the remote machine, I get the following stack trace:

Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/ec2-user/price_tracker/flask_api.py", line 22, in home
    title, price, isSizeAvailable, shop = prices.checkInfoByShop(url, size)
  File "/home/ec2-user/price_tracker/check_prices.py", line 132, in checkInfoByShop
    secondaryPriceXPath=secondaryPriceXPath)
  File "/home/ec2-user/price_tracker/check_prices.py", line 61, in checkSelenium
    title = self.selenium_driver.find_element(By.XPATH, titleXPath)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 1246, in find_element
    'value': value})['value']
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span"}
  (Session info: headless chrome=96.0.4664.110)
Stacktrace:
#0 0x559979e8dee3 <unknown>
#1 0x55997995b608 <unknown>
#2 0x559979991aa1 <unknown>
#3 0x559979991c61 <unknown>
#4 0x5599799c4714 <unknown>
#5 0x5599799af29d <unknown>
#6 0x5599799c23bc <unknown>
#7 0x5599799af163 <unknown>
#8 0x559979984bfc <unknown>
#9 0x559979985c05 <unknown>
#10 0x559979ebfbaa <unknown>
#11 0x559979ed5651 <unknown>
#12 0x559979ec0b05 <unknown>
#13 0x559979ed6a68 <unknown>
#14 0x559979eb505f <unknown>
#15 0x559979ef1818 <unknown>
#16 0x559979ef1998 <unknown>
#17 0x559979f0ceed <unknown>
#18 0x7ff5dd53b40b <unknown>

For debugging purposes, I tried

body = selenium_driver.find_element(By.XPATH, '/html/body')
print(body.text)

which returns

"We're sorry, something has gone wrong. Please try again.\nIf you continue to have trouble, please contact us at support@everlane.com.\nChecking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.\nPlease allow up to 5 seconds…\nDebugging Information\nIP Address\n<ip-address>\nRay ID\n6c57184d797805a0"

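I understand that my request is probably being blocked for some reason, but is there a way around this?

I tried adding wait statements in the hope that the redirect would complete, but nothing has worked so far.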

Reply from a uj5u.com user:

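Judging by the message, the page content is not what you expected, so your code is behaving as intended: the element you are looking for is not on the page that was served. I would have Selenium wait until the element is visible, using an explicit wait. If you don't want to do that, you can also wait for the page to redirect. How to do that is answered in another SO question.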
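The idea of waiting for a condition can be sketched without any Selenium-specific API. The helper below is only an illustration: the name `wait_until` and its parameters are hypothetical, and it is not part of Selenium. It polls a callable until it returns a truthy value or a deadline passes, which is roughly what an explicit wait does.

```python
import time


def wait_until(predicate, timeout=15.0, poll_interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper, not a Selenium API. Returns the truthy value,
    or raises TimeoutError once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With the driver from the question, the predicate could be something like `lambda: "Checking your browser" not in selenium_driver.find_element(By.XPATH, "/html/body").text`. If the site never lets the browser through, the call ends in a `TimeoutError` rather than hanging.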

Reply from a uj5u.com user:

Given the message

Checking your browser before accessing www.everlane.com.\nThis process is automatic. Your browser will redirect to your requested content shortly.

it appears that the site has Cloudflare protection enabled.

See this reference: https://thegeekpage.com/how-to-fix-checking-your-browser-before-accessing-message/
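Before looking for product elements, it may help to check whether the scraper has landed on the interstitial page at all. The sketch below is only an illustration: the function name and the marker list are hypothetical, the phrases are copied from the body text quoted in the question, and treating them as reliable markers is an assumption, since the wording may change.

```python
# Phrases copied from the interstitial text quoted in the question.
# Treating them as markers of a challenge page is an assumption.
CHALLENGE_MARKERS = (
    "Checking your browser before accessing",
    "Your browser will redirect to your requested content shortly",
)


def looks_like_challenge_page(body_text):
    """Return True if the page text contains any of the challenge phrases."""
    return any(marker in body_text for marker in CHALLENGE_MARKERS)
```

Calling it on the text of `/html/body` right after `get(url)` lets the scraper report "blocked by a browser check" instead of failing later with a `NoSuchElementException` on the title XPath.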

I suggest trying selenium-stealth:

https://pypi.org/project/selenium-stealth/

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium_stealth import stealth

ser = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(service=ser, options=options)

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

url = 'https://www.everlane.com/products/womens-cloud-cable-knit-vest-oatmeal?collection=womens-newest-arrivals'

driver.get(url)
title = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')
print(title.text)

In addition, some of these repositories may help:

https://github.com/ultrafunkamsterdam/undetected-chromedriver
https://github.com/VeNoMouS/cloudscraper
https://github.com/unixfox/pupflare

Or take a look at this topic:

https://github.com/topics/cloudflare-bypass

Reply from a uj5u.com user:

I suggest using WebDriverWait to wait for the page to load.

# The second argument is the timeout in seconds; 10 is an arbitrary choice.
wait = WebDriverWait(selenium_driver, 10)
elem = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="content"]/div/div[3]/div[2]/div/div/div/div[2]/div/div[1]/hgroup/h1/span')))
print(elem.text)

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC