Web scraping images with Beautiful Soup

imdb.py
Web Scraping Imdb
from bs4 import BeautifulSoup
import requests
import re

# Download IMDb's Top 250 data
url = 'http://www.imdb.com/chart/top'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

# The selectors below assume the table layout the chart page had when this was written
movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]
votes = [b.attrs.get('data-value') for b in soup.select('td.ratingColumn strong')]

imdb = []

# Store each item into dictionary (data), then put those into a list (imdb)
for index in range(0, len(movies)):
    # Separate movie into: 'place', 'title', 'year'
    movie_string = movies[index].get_text()
    # Collapse whitespace and drop dots, e.g. '1 Some Title (1994)'
    movie = ' '.join(movie_string.split()).replace('.', '')
    # The rank shown on the page is index + 1, so measure its width, not the index's
    rank_width = len(str(index + 1))
    # Skip the rank and the following space; drop the trailing ' (YYYY)'
    movie_title = movie[rank_width + 1:-7]
    # The year is the text between the first pair of parentheses
    year = re.search(r'\((.*?)\)', movie_string).group(1)
    place = movie[:rank_width]
    data = {'movie_title': movie_title,
            'year': year,
            'place': place,
            'star_cast': crew[index],
            'rating': ratings[index],
            'vote': votes[index],
            'link': links[index]}
    imdb.append(data)

for item in imdb:
    print(item['place'], '-', item['movie_title'], '('+item['year']+') -', 'Starring:', item['star_cast'])
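The slicing in the loop above depends on the width of the rank number and strips every dot from the title. As a hedged alternative, not the gist author's method, a single regular expression can split one row's text into place, title and year. The helper name `parse_row`, the pattern and the sample strings are illustrative assumptions; the row shape `<rank>. <title> (<year>)` is inferred from the code above.

```python
import re

# Assumed row shape after whitespace is collapsed: "<rank>. <title> (<year>)"
ROW_PATTERN = re.compile(r'^(\d+)\.\s+(.*)\s+\((\d{4})\)$')


def parse_row(text):
    """Split one chart row into place, title and year (hypothetical helper)."""
    normalized = ' '.join(text.split())  # collapse newlines and repeated spaces
    match = ROW_PATTERN.match(normalized)
    if match is None:
        raise ValueError(f'unexpected row format: {normalized!r}')
    place, title, year = match.groups()
    return {'place': place, 'movie_title': title, 'year': year}


# Sample text shaped like what get_text() might return for one table cell
sample = '\n  1.\n  The Shawshank Redemption\n  (1994)\n'
print(parse_row(sample))
```

Because the rank is matched by the pattern instead of being measured from the loop index, two- and three-digit ranks need no special handling, and dots inside a title are left alone.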

commented Jan 5, 2018

We will (again) be building an IMDb scraper, but this time with Node.js. CAUTION: Web scraping sits on the border between legal and illegal activity, because it extracts data from a website and that data may be under copyright. This blog is therefore for educational purposes only, and we do not use the scraped data for anything else.

Web Scraping - IMDb, Wiki
This was written to collect data for my friend's translation project, in which she attempts to analyse the differences between the official movie title translations in China, Hong Kong and Taiwan.

Learnings:
find and find_all only work on bs4.BeautifulSoup or bs4.element.Tag objects.

In this blog, we take a look at how web scraping IMDb data is done using Python. On top of the various data points that it keeps updated for both movies and small-screen shows, IMDb also allows its users to add ratings, and these ratings form the basis of multiple lists that movie buffs and others use to create their watch lists.
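The note above about find and find_all can be illustrated with a small sketch. The HTML snippet is a made-up stand-in for a downloaded page, and the built-in `html.parser` is used so that no extra parser is needed. The point shown is that the list-like result of `find_all` does not itself offer `find_all`, while each `Tag` inside it does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn"><a href="/title/a/">First</a></td></tr>
  <tr><td class="titleColumn"><a href="/title/b/">Second</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# find_all on the soup returns a list-like ResultSet of Tag objects
cells = soup.find_all('td', class_='titleColumn')

# Calling find_all on the ResultSet itself fails with AttributeError
try:
    cells.find_all('a')
    resultset_supports_find_all = True
except AttributeError:
    resultset_supports_find_all = False

# Each element of the ResultSet is a Tag, so find works on it
hrefs = [cell.find('a')['href'] for cell in cells]

print(resultset_supports_find_all, hrefs)
```

In practice this means looping over the result of `find_all` (or `select`) and searching inside each element, which is the pattern the list comprehensions in imdb.py follow.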

