from bs4 import BeautifulSoup
import requests
import re

# Download IMDB's Top 250 data
url = 'http://www.imdb.com/chart/top'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]
votes = [b.attrs.get('data-value') for b in soup.select('td.ratingColumn strong')]

imdb = []

# Store each item into dictionary (data), then put those into a list (imdb)
for index in range(len(movies)):
    # Separate movie into: 'place', 'title', 'year'
    movie_string = movies[index].get_text()
    # Collapse whitespace and drop only the dot after the rank,
    # so that dots inside a title are kept
    movie = ' '.join(movie_string.split()).replace('.', '', 1)
    # The rank is 1-based, so its width comes from index + 1, not index
    rank_width = len(str(index + 1))
    movie_title = movie[rank_width + 1:-7]
    year = re.search(r'\((.*?)\)', movie_string).group(1)
    place = movie[:rank_width]
    data = {'movie_title': movie_title,
            'year': year,
            'place': place,
            'star_cast': crew[index],
            'rating': ratings[index],
            'vote': votes[index],
            'link': links[index]}
    imdb.append(data)

for item in imdb:
    print(item['place'], '-', item['movie_title'], '('+item['year']+') -', 'Starring:', item['star_cast'])
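The slicing in the loop above depends on how many digits the rank has and on the year suffix being exactly seven characters long. As an alternative, here is a minimal sketch that pulls all three fields out with one regular expression. `split_title_cell` and `CELL_PATTERN` are hypothetical names, not part of the original script, and the pattern assumes the cell text has the shape `1. The Shawshank Redemption (1994)` that the slicing also relies on.

```python
import re

# Assumed cell layout: "<rank>. <title> (<year>)", with arbitrary whitespace.
CELL_PATTERN = re.compile(r'^(\d+)\.\s+(.*)\s+\((\d{4})\)$')


def split_title_cell(cell_text):
    """Split one title cell into place, title and year (hypothetical helper)."""
    # Collapse newlines and repeated spaces into single spaces first.
    normalized = ' '.join(cell_text.split())
    match = CELL_PATTERN.match(normalized)
    if match is None:
        raise ValueError('unexpected cell text: ' + repr(cell_text))
    place, title, year = match.groups()
    return {'place': place, 'movie_title': title, 'year': year}


# A made-up sample with a two-digit rank and a dot inside the title.
sample = split_title_cell('\n  10.\n  Dr. Strangelove\n  (1964)\n')
print(sample)
```

Because the rank and year are matched by pattern rather than by position, a two-digit rank or a title containing a dot does not shift the result.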
We will (again) be building an IMDb scraper, but this time with Node.js.

CAUTION: Web scraping sits on the border between legal and illegal activity, because it extracts data from a website and that data may be under copyright. This blog is therefore for educational purposes only, and we do not use the scraped data for anything else.

Web Scraping - IMDb, Wiki

This was written to collect data for my friend's translation project, in which she attempts to analyse the differences between the official movie title translations in China, Hong Kong and Taiwan.

Learnings:

find and find_all only work on bs4.BeautifulSoup or bs4.element.Tag objects.

In this blog, we take a look at how to scrape IMDb data using Python. On top of the various data points that are kept up to date for both movies and small-screen shows, IMDb also lets its users add ratings, and these ratings form the basis of multiple lists that movie buffs and others use to create their watch lists.
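To illustrate the learning noted above about where the find methods are available, here is a small sketch. The HTML string is made up for the example, and it uses the standard-library `html.parser` backend so that it runs without `lxml`. It shows that the list-like result of `find_all` has no find methods of its own, so each element has to be searched individually.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up HTML standing in for a scraped page.
html = (
    '<table>'
    '<tr><td class="titleColumn"><a href="/title/a/">First</a></td></tr>'
    '<tr><td class="titleColumn"><a href="/title/b/">Second</a></td></tr>'
    '</table>'
)

soup = BeautifulSoup(html, 'html.parser')

# find_all on the soup object returns a list-like ResultSet of Tag objects.
cells = soup.find_all('td')

# Each element of the result is a Tag, so find works on it.
first_link = cells[0].find('a')

# The ResultSet itself is not a Tag; calling find_all on it fails.
try:
    cells.find_all('a')
    result_set_call_failed = False
except AttributeError:
    result_set_call_failed = True

# The usual fix: loop over the result and search each Tag separately.
hrefs = [cell.find('a').get('href') for cell in cells]

print(isinstance(first_link, Tag), result_set_call_failed, hrefs)
```

The same applies to the lists returned by `select`, which is why the script iterates over them in list comprehensions instead of calling further lookups on the result directly.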