Don't feed BeautifulSoup a web page that failed to load

Currently, when there is an error fetching the web page, soup is not populated with the page, but it still gets a return value from BeautifulSoup.

I'm looking for a way to check for this, so that if an error occurs while fetching the web page I can skip a section of code, something like:

if soup:
  do stuff

But I don't want everything nested inside that check. Any suggestions are appreciated.

import urllib2, BaseHTTPServer
from BeautifulSoup import BeautifulSoup

def getwebpage(address):
  try:
      user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
      headers = { 'User-Agent' : user_agent }
      req = urllib2.Request(address, None, headers)
      web_handle = urllib2.urlopen(req)
  except urllib2.HTTPError, e:
      error_desc = BaseHTTPServer.BaseHTTPRequestHandler.responses[e.code][0]
      appendlog('HTTP Error: ' + str(e.code) + ': ' + address)
      return
  except urllib2.URLError, e:
      appendlog('URL Error: ' + e.reason[1] + ': ' + address)
      return
  except:
      appendlog('Unknown Error: ' + address)
      return
  return web_handle


def test():
  soup = BeautifulSoup(getwebpage('http://doesnotexistblah.com/'))
  print soup

  if soup:
    do stuff

test()

2 answers

Structure the code so that one function encapsulates the whole process of fetching the data from the URL, and a separate one handles processing that data:

import urllib2, httplib
from BeautifulSoup import BeautifulSoup

def append_log(message):
    print message

def get_web_page(address):
    try:
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        request = urllib2.Request(address, None, headers)
        response = urllib2.urlopen(request, timeout=20)
        try:
            return response.read()
        finally:
            response.close()
    except urllib2.HTTPError as e:
        error_desc = httplib.responses.get(e.code, '')
        append_log('HTTP Error: ' + str(e.code) + ': ' +
                  error_desc + ': ' + address)
    except urllib2.URLError as e:
        append_log('URL Error: ' + str(e.reason) + ': ' + address)
    except Exception as e:
        append_log('Unknown Error: ' + str(e) + ': ' + address)

def process_web_page(data):
    if data is not None:
        print BeautifulSoup(data)
    else:
        pass # do something else

data = get_web_page('http://doesnotexistblah.com/')
process_web_page(data)

data = get_web_page('http://docs.python.org/copyright.html')
process_web_page(data)
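The same fetch/process split translates almost mechanically to Python 3, where `urllib2` becomes `urllib.request`/`urllib.error` and the status-text table lives in `http.client.responses`. Below is a sketch under those assumptions (the `decode` step and the timeout default are my additions, not part of the original answer). Note that `HTTPError` is a subclass of `URLError`, so it must be caught first:

```python
import urllib.request
import urllib.error
from http.client import responses

def append_log(message):
    print(message)

def get_web_page(address, timeout=20):
    """Return the page body as a string, or None on any error."""
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    request = urllib.request.Request(address, headers={'User-Agent': user_agent})
    try:
        # The 'with' block closes the response even if read() fails.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode('utf-8', errors='replace')
    except urllib.error.HTTPError as e:
        # HTTPError first: it is a subclass of URLError.
        append_log('HTTP Error: %d: %s: %s'
                   % (e.code, responses.get(e.code, ''), address))
    except urllib.error.URLError as e:
        append_log('URL Error: %s: %s' % (e.reason, address))
    except Exception as e:
        append_log('Unknown Error: %s: %s' % (e, address))
    return None
```

The caller-side pattern is unchanged: check the return value for None before handing it to BeautifulSoup.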
soup = getwebpage('http://doesnotexistblah.com/')
if soup is not None:
    soup = BeautifulSoup(soup)

Is this what you want?
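The `is not None` test matters here: an empty response body (`''`) is falsy in Python but still represents a successful fetch, so a plain `if soup:` truthiness check would wrongly treat it as a failure. A minimal illustration of the distinction (the `should_parse` helper name is mine):

```python
def should_parse(data):
    # None signals a failed fetch; anything else, even an empty
    # string, is a real response and safe to hand to BeautifulSoup.
    return data is not None

print(should_parse(None))             # failed fetch: skip parsing
print(should_parse(''))               # empty but successful response: parse it
print(should_parse('<html></html>'))  # normal page: parse it
```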

Just return None from getwebpage.
added by Chris Morgan, source
Yes and no, that's what I want, but soup is never None, even when it's fed a bad address...
added by Brad, source