Could someone explain this code, please?

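```python
import os
import re

import requests
from bs4 import BeautifulSoup as bs
from pandas import DataFrame

# idx, news_num, news_dict, cur_page and soup (the parsed first results page)
# are set up earlier in the script; that part is not included in this post.

while idx < news_num:  # repeat while the current article number is below the number of articles wanted
    # Collect the title links on the current results page
    table = soup.find('ul', {'class': 'list_news'})                  # <ul> holding the result list
    li_list = table.find_all('li', {'id': re.compile('sp_nws.*')})  # <li> items whose id contains 'sp_nws'
    area_list = [li.find('div', {'class': 'news_area'}) for li in li_list]
    a_list = [area.find('a', {'class': 'news_tit'}) for area in area_list]

    # Save titles, but never more than the number still needed
    for n in a_list[:min(len(a_list), news_num - idx)]:
        news_dict[idx] = {'title': n.get('title')}  # the <a> tag's title attribute
        idx += 1

    cur_page += 1

    # In the pagination bar, pick the <a> whose text is the new page number and read its href
    # (raises IndexError if there is no such link, e.g. after the last page)
    pages = soup.find('div', {'class': 'sc_page_inner'})
    next_page_url = [p for p in pages.find_all('a') if p.text == str(cur_page)][0].get('href')

    # Download and parse that page; the while loop then repeats on the new soup
    req = requests.get('https://search.naver.com/search.naver' + next_page_url)
    soup = bs(req.text, 'html.parser')
```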
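To see what the `for n in a_list[:min(len(a_list), news_num-idx)]` line does without touching Naver, here is a minimal sketch that replays the same `while`/`for` logic on hypothetical pages of made-up titles. Plain strings stand in for the `<a>` tags, and `collect_titles` is an illustrative helper, not part of the original script.

```python
def collect_titles(pages, news_num):
    """Replay the crawler's while/for logic on pre-made pages.

    pages: hypothetical list of pages, each a list of title strings
    standing in for a_list on that page.
    """
    news_dict = {}
    idx = 0       # number of titles saved so far (also the next dict key)
    cur_page = 0  # index into pages
    # extra check on len(pages): stop if the fake pages run out
    while idx < news_num and cur_page < len(pages):
        a_list = pages[cur_page]
        # take at most the number of titles still needed
        for n in a_list[:min(len(a_list), news_num - idx)]:
            news_dict[idx] = {'title': n}
            idx += 1
        cur_page += 1  # in the crawler, the next page is requested at this point
    return news_dict


pages = [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'], ['C1', 'C2', 'C3']]
result = collect_titles(pages, news_num=5)
print(result)
# {0: {'title': 'A1'}, 1: {'title': 'A2'}, 2: {'title': 'A3'}, 3: {'title': 'B1'}, 4: {'title': 'B2'}}
```

With `news_num=5`, the first page contributes all 3 titles (`min(3, 5 - 0) = 3`) and the second only 2 (`min(3, 5 - 3) = 2`), so `idx` reaches 5 and the `while` condition ends the loop. Python slicing never runs past the end of a list, so `a_list[:news_num - idx]` alone would behave the same; the `min` just makes the limit explicit. Note that the real script still requests one more page after the last needed title, because the download happens before the `while` condition is checked again.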

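```python
print('Crawling complete')

print('Converting to a DataFrame')
# DataFrame(dict of dicts) makes the outer keys (0, 1, 2, ...) columns; .T turns each article into a row
news_df = DataFrame(news_dict).T

folder_path = os.getcwd()  # current working directory (not used afterwards)
# Tab-separated text file without the index column, written to the current working directory
news_df.to_csv('giho.txt', sep='\t', index=False)

print('Text file saved')
```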
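For reference, `DataFrame(news_dict).T` followed by `to_csv(sep='\t', index=False)` should give a file with a `title` header line and one title per line. The sketch below writes that same layout with the standard-library `csv` module instead of pandas (swapped in only so the example runs without pandas), using made-up titles and an in-memory buffer in place of `giho.txt`.

```python
import csv
import io

# Hypothetical crawl result, shaped like news_dict in the script
news_dict = {0: {'title': 'First headline'}, 1: {'title': 'Second headline'}}

# DataFrame(news_dict) would put 0, 1 as columns and 'title' as the row label;
# .T flips it so each article is a row. Build those rows directly:
rows = [news_dict[i] for i in sorted(news_dict)]

buffer = io.StringIO()  # stands in for the giho.txt file
writer = csv.DictWriter(buffer, fieldnames=['title'], delimiter='\t', lineterminator='\n')
writer.writeheader()    # header line, like pandas' default header=True
writer.writerows(rows)  # no index column, like index=False

print(buffer.getvalue())
```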

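I'm a student learning web crawling. I found some code by googling and modified it in my own way to write this crawler. I can tell that these parts repeat and move on to the next page, but the details aren't clear to me.
Could you explain how the parts I tried to highlight with markdown actually work, namely the `for n in a_list[:min(len(a_list), news_num-idx)]:` loop and the `next_page_url = ...` line?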
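The `next_page_url = ...` line scans every `<a>` in the pagination bar and keeps the one whose visible text equals the next page number, then reads its `href`. The sketch below reproduces that with a hypothetical `FakeLink` class that mimics only the two parts of a BeautifulSoup tag the line uses (`.text` and `.get()`). The `?page=N` hrefs are made up, since the real Naver links are not shown in the post. Unlike the original, it returns `None` instead of raising `IndexError` when no link matches, for example after the last page.

```python
BASE_URL = 'https://search.naver.com/search.naver'


class FakeLink:
    """Hypothetical stand-in for a bs4 <a> tag: only .text and .get() are mimicked."""

    def __init__(self, text, href):
        self.text = text
        self._attrs = {'href': href}

    def get(self, key, default=None):
        return self._attrs.get(key, default)


def find_next_page_url(links, cur_page):
    # Same list comprehension as the original: keep links whose text is the page number
    matches = [p for p in links if p.text == str(cur_page)]
    if not matches:
        # The original indexes [0] directly, which would raise IndexError here
        return None
    return matches[0].get('href')


# Hypothetical pagination bar: 1 2 3 4 Next
links = [FakeLink(str(i), f'?page={i}') for i in range(1, 5)]
links.append(FakeLink('Next', '?page=5'))

cur_page = 1
cur_page += 1  # same step as in the crawler: move on to page 2
href = find_next_page_url(links, cur_page)
print(href)                          # ?page=2
print(BASE_URL + href)               # https://search.naver.com/search.naver?page=2
print(find_next_page_url(links, 9))  # None
```

`str(cur_page)` matters because `.text` is always a string: `'2' == 2` is `False`, so without the conversion nothing would match. Since the `href` is presumably only the query part of the URL, the crawler prepends `https://search.naver.com/search.naver` before calling `requests.get`.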

pof1423 · 408

August 24, 2021, 4:17 PM
