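I'm trying to scrape data from vivino.com, but the DataFrame comes out empty. I can see that my soup is collecting the site's information, but I can't see where my error is.
My code: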

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    def get_data():
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
            "Accept-Encoding": "gzip, deflate",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "DNT": "1",
            "Connection": "close",
            "Upgrade-Insecure-Requests": "1",
        }
        r = requests.get("https://www.vivino.com/explore?e=eJzLLbI1VMvNzLM1UMtNrLA1NTBQS660DQhRS7Z1DQ1SKwDKpqfZliUWZaaWJOao5SfZFhRlJqeq5dsmFierlZdExwJVJFcWA-mCEgC1YxlZ", headers=headers)  # , proxies=proxies)
        content = r.content
        soup = BeautifulSoup(content, "html.parser")

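Before debugging the selectors, it can help to check whether the class names you search for occur in the raw response at all. If they don't, BeautifulSoup has nothing to match, which usually means the cards are rendered client-side by JavaScript. The helper `missing_classes` below is a hypothetical name, and the substring test is a deliberately crude check, not a parser:

```python
def missing_classes(html, class_names):
    """Return the class names that never appear anywhere in the raw HTML text.

    Crude substring check (hypothetical helper): if a class is missing here,
    no BeautifulSoup query on this HTML can find elements with it.
    """
    return [name for name in class_names if name not in html]


# Illustrative shell page, typical of a JavaScript-rendered site.
raw = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(missing_classes(raw, ["explorerCard__titleColumn--28kWX", "VintageTitle_winery--2YoIr"]))
```

With the real response you would call `missing_classes(r.text, [...])` right after `requests.get`. Note also that generated suffixes such as `--28kWX` are likely build hashes and may change when the site is redeployed.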
Since I need the winery, the wine name and the rating, I tried it like this:

        alls = []
        for d in soup.find_all("div", attrs={"class": "explorerCard__titleColumn--28kWX"}):
            # find() returns a single Tag or None, which matches the `is not None` checks below
            Winery = d.find("a", attrs={"class": "VintageTitle_winery--2YoIr"})
            Wine = d.find("a", attrs={"class": "VintageTitle_wine--U7t9G"})
            Rating = d.find("div", attrs={"class": "VivinoRatingWide_averageValue--1zL_5"})
            num_Reviews = d.find("div", attrs={"class": "VivinoRatingWide__basedOn--s6y0t"})
            Stars = d.find("div", attrs={"aria-label": "rating__rating--ZZb_x rating__vivino--1vGCy"})
            alll = []
            if Winery is not None:
                # print(Winery.text)
                alll.append(Winery.text)
            else:
                alll.append("unknown-winery")
            if Wine is not None:
                # print(Wine.text)
                alll.append(Wine.text)
            else:
                alll.append("0")
            if Rating is not None:
                # print(Rating.text)
                alll.append(Rating.text)
            else:
                alll.append("0")
            ...

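A related pitfall in the original loop: `find_all()` returns a list-like `ResultSet` that is never `None`, so a check like `if Winery is not None` is always true, and calling `.text` on the list fails. If you keep `find_all()`, test for emptiness instead. A minimal sketch; `first_text` is a hypothetical helper, not part of BeautifulSoup:

```python
def first_text(matches, default):
    """Return the stripped text of the first match, or `default` if there are none.

    `matches` is expected to be the result of find_all(): a (possibly empty)
    list of Tags. An empty list is falsy, so `not matches` catches "no hits",
    whereas `matches is not None` would always be true.
    """
    if not matches:
        return default
    return matches[0].get_text(strip=True)
```

Inside the loop this would read, for example, `alll.append(first_text(Winery, "unknown-winery"))`.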
Then I put the data into a DataFrame:

    results = []
    for i in range(1, no_pages + 1):  # no_pages: number of pages to fetch (set elsewhere, not shown)
        results.append(get_data())

    def flatten(l):
        return [item for sublist in l for item in sublist]

    df = pd.DataFrame(flatten(results), columns=["Winery", "Wine", "Rating", "num_review", "Stars"])
    df.to_csv("redwines.csv", index=False, encoding="utf-8")

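The empty DataFrame follows directly from this step: if every `get_data()` call yields no rows (because the selectors match nothing), flattening gives `[]` and pandas builds a frame with the five column headers but no data. The sketch below uses `itertools.chain.from_iterable` in place of the nested comprehension and also checks row width, since each row must supply one value per column. `flatten_pages` is a hypothetical name, and it assumes `get_data()` returns a list of rows per page:

```python
from itertools import chain

# Column names copied from the DataFrame call above.
COLUMNS = ["Winery", "Wine", "Rating", "num_review", "Stars"]


def flatten_pages(pages):
    """Flatten per-page row lists into one list of rows, checking each row's width."""
    rows = list(chain.from_iterable(pages))
    for row in rows:
        # Rows with the wrong number of fields make pandas raise or pad with NaN,
        # so fail early with a clearer message.
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row!r}")
    return rows


print(flatten_pages([[["W1", "Red", "4.1", "120", "4"]], []]))
print(flatten_pages([[], []]))  # every page returned no rows, so nothing to put in the DataFrame
```

If the second call matches what you see, the problem is upstream in the scraping, not in the DataFrame code.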
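Your data is most likely loaded by JavaScript, so it isn't in the HTML that `requests` receives. Fortunately, the data is also served as JSON files: I checked the Network tab of the browser's developer tools and found them there. There are other JSON files as well, which you can find the same way in your browser's Network tab.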
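Once you have such a JSON response, extracting the fields is a matter of walking nested dictionaries instead of matching CSS classes. The key layout below (`explore_vintage` → `matches` → `vintage` → `wine` / `statistics`) is an assumption for illustration only; confirm the actual structure against the JSON you see in the Network tab. The sample payload is made up, and `rows_from_explore_json` is a hypothetical helper:

```python
import json


def rows_from_explore_json(payload):
    """Extract (winery, wine, rating, num_reviews) tuples from an explore-style payload.

    ASSUMPTION: the nesting used here is illustrative; adjust the keys to match
    the real JSON. .get() with defaults keeps missing keys from raising KeyError.
    """
    rows = []
    for match in payload.get("explore_vintage", {}).get("matches", []):
        vintage = match.get("vintage", {})
        wine = vintage.get("wine", {})
        stats = vintage.get("statistics", {})
        rows.append((
            wine.get("winery", {}).get("name", "unknown-winery"),
            wine.get("name", ""),
            stats.get("ratings_average"),
            stats.get("ratings_count"),
        ))
    return rows


# Made-up sample shaped like the assumed response.
sample = json.loads("""{"explore_vintage": {"matches": [
  {"vintage": {"wine": {"name": "Example Red", "winery": {"name": "Example Winery"}},
               "statistics": {"ratings_average": 4.2, "ratings_count": 1234}}}
]}}""")
print(rows_from_explore_json(sample))
```

With `requests`, the payload would come from `r.json()` on the JSON URL you copied from the Network tab, and the resulting tuples can go straight into `pd.DataFrame(rows, columns=[...])`.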