Selenium (Python): how best to handle page exceptions

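I am scraping pages from the Italian website that publishes new laws (Gazzetta Ufficiale) in order to save the final page containing the text of each law.

I have a loop that builds a list of pages to download. Below is a complete working code example that shows the problem I am running into (the example has no loop; I simply perform two "get" calls).

What is the best way to handle the rare pages that do not show the "Visualizza" (show) button but display the desired full text directly?

I hope the code is self-explanatory and well commented. Thank you in advance, and a super happy 2022!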

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 expects the chromedriver path wrapped in a Service object
driver = webdriver.Chrome(service=Service("/Users/bob/Documents/work/scraper/scrape_gu/chromedriver"))

# showing the "normal" behaviour
driver.get(
    "https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario"
)
# this page has a "Visualizza" button, find it and click it.
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5)  # just to see the "normal" result with the "Visualizza" button
bottoni[0].click()  # now click it  and this shows the desired final webpage
time.sleep(5)  # just to see the "normal" desired result

# but unfortunately some pages go straight to the end result WITHOUT the "Visualizza" button.
# as an example see the following get
# showing the exceptional behaviour
driver.get(
    "https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario"
) # get a law page
# as you can see we are now on the final desired full page WITHOUT the Visualizza button
time.sleep(5)
# hence the following code, identical to that above will fail and timeout
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5)  # just to see the result
bottoni[0].click()  # and this shows the desired final webpage

# and the program abends with the following message
#  File "/Users/bob/Documents/work/scraper/scrape_gu/temp.py", line 33, in <module>
#    bottoni = WebDriverWait(driver, 10).until(
#  File "/Users/bob/opt/miniconda3/envs/scraping/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 80, in until
#    raise TimeoutException(message, screen, stacktrace)
#  selenium.common.exceptions.TimeoutException: Message:

Reply from a uj5u.com user:

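Catch the exception with a try/except block: if the button is missing, the wait raises TimeoutException, and the text can be extracted directly from the page that is already displayed.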

...
from selenium.common.exceptions import TimeoutException

urls = [
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario',
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario',
]

data = []

for url in urls:
    driver.get(url)
    try:
        # wait briefly for the "Visualizza" button and click it if it appears
        bottoni = WebDriverWait(driver, 1).until(
            EC.element_to_be_clickable(
                (By.XPATH, '//input[@value="Visualizza"]')
            )
        )
        bottoni.click()
    except TimeoutException:
        # no button: the page already shows the full text
        print('no bottoni -')
    finally:
        # in both cases, save the text of the page now displayed
        data.append(driver.find_element(By.XPATH, '//body').text)

driver.quit()  # end the session and close the browser
print(data)
...
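
A possible alternative that avoids the exception altogether, offered only as a sketch: Selenium's find_elements (plural) returns an empty list when nothing matches, so the presence of the button can be checked with a plain if. The helper name page_text and its parameters are hypothetical and do not come from the answer above. The locator strategies are passed as the plain strings "xpath" and "tag name", which are the values of By.XPATH and By.TAG_NAME; in real code the By constants would normally be used.

```python
def page_text(driver, url, button_xpath='//input[@value="Visualizza"]'):
    """Load url, click the button if the page has one, and return the body text.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (or any object offering get, find_elements and find_element).
    """
    driver.get(url)
    # find_elements returns [] instead of raising when nothing matches,
    # so pages without the button need no exception handling.
    buttons = driver.find_elements("xpath", button_xpath)  # "xpath" == By.XPATH
    if buttons:
        buttons[0].click()
    # Either way, the page now displayed holds the full text.
    return driver.find_element("tag name", "body").text  # "tag name" == By.TAG_NAME
```

With the driver and urls from the answer above, data = [page_text(driver, url) for url in urls] would collect the same texts. Note that find_elements does not wait by itself: if the button can appear late, setting an implicit wait with driver.implicitly_wait(...) or keeping the explicit wait from the answer is the safer choice.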