Python project scraping imdb and web application implemented using Flask.

geogas Last update: Jan 27, 2024

Project Description

This is a simple Python project illustrating the use of the following:

Scrapy (scraping and crawling framework)
Flask (micro web development framework based on Werkzeug)

The project is split into two subprojects, each located in its own folder. First, we scrape the Internet Movie Database (imdb) to collect information about the movies we are interested in. This information is stored persistently in a mongodb database. Because a movie maps naturally onto a document, mongodb was considered the best match for this use case. The second subproject is a web application that renders the data gathered from imdb.
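To make the "movie as a document" idea concrete, here is a minimal sketch of what one stored movie might look like. The field names and values are illustrative assumptions based on the attributes mentioned in this README (name, rating, genre, cast, year), not the project's actual schema.

```python
import json

# Hypothetical shape of one movie document; field names and values are
# illustrative assumptions, not the schema used by the project.
movie = {
    "name": "The Shawshank Redemption",
    "year": 1994,
    "rating": 9.3,
    "genre": ["Drama"],
    "cast": ["Tim Robbins", "Morgan Freeman"],
}

# Lists such as genre and cast nest directly inside the document,
# so one movie is stored and fetched as a single unit.
print(json.dumps(movie, indent=2))
```

Keeping multi-valued attributes inside the document is what makes a document store a comfortable fit here; a relational layout would need separate tables for genres and cast.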

Installation

If you have Vagrant installed you can simply run vagrant up to get a running environment.

To manually install the prerequisites on an Ubuntu/Debian system, type the following in your shell:

# install mongo and python 
sudo apt-get install -y mongodb python-dev python-pip python-lxml
# install python packages
sudo pip install -r requirements.txt
# create mongo index for speeding up queries
mongo scripts/create_index.js

Components

### scrapy_imdb

Location: scrapy_imdb

The goal of the scraping application is to fetch information about movies, for example name, rating, genre, and cast. We specify a URL that points to a list assembled either by imdb itself or by a user, e.g. the top-250 movies (http://www.imdb.com/chart/top). The scrapy spider then parses this list and collects information for every movie on it. The implemented pipeline later stores this information in the movies collection of the imdb mongodb database.
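The storage step can be sketched roughly as follows. This is not the project's actual pipeline: the class name, the choice of the movie name as the upsert key, and the default localhost:27017 connection are assumptions made for illustration. The `open_spider`, `close_spider`, and `process_item` method names follow Scrapy's item pipeline interface.

```python
class MongoMoviePipeline:
    """Sketch of a Scrapy item pipeline that upserts movies into mongodb."""

    def __init__(self, collection=None):
        # A collection-like object can be injected, which is handy for tests.
        self.client = None
        self.collection = collection

    def open_spider(self, spider):
        if self.collection is None:
            # Imported lazily so the class loads even without pymongo installed.
            from pymongo import MongoClient

            # Assumed connection details: local server on the default port.
            self.client = MongoClient("localhost", 27017)
            self.collection = self.client["imdb"]["movies"]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert keyed on the name (an assumption) so that re-running the
        # crawl updates existing movies instead of inserting duplicates.
        self.collection.update_one(
            {"name": doc["name"]}, {"$set": doc}, upsert=True
        )
        return item
```

Keying on the name alone would merge distinct movies that share a title; a real pipeline would more likely key on a unique identifier taken from the movie's URL.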

### flask_imdb

Location: flask_imdb

A web application presents the movie information in a human-friendly manner. It is backed by the server that the flask framework provides. The server listens for user requests and dispatches them to the corresponding views. A sidebar offers predefined queries. The user can also issue a request by typing a movie's name (or part of it), a rating (1-10), a desired genre (e.g. crime), or a specific year.
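One way a view could turn the free-text input into a mongodb filter is sketched below. The function name `build_query`, the field names, the genre list, and the reading of a rating as a minimum rating are all assumptions for illustration; the actual views may work differently.

```python
import re

# Hypothetical genre list; the real application may recognise other values.
KNOWN_GENRES = {"action", "comedy", "crime", "drama", "thriller"}


def build_query(text):
    """Translate a search-box string into a mongodb filter document."""
    text = text.strip()

    # A four-digit number is treated as a release year.
    if re.fullmatch(r"\d{4}", text):
        return {"year": int(text)}

    # A number between 1 and 10 is treated as a minimum rating.
    try:
        rating = float(text)
    except ValueError:
        rating = None
    if rating is not None and 1 <= rating <= 10:
        return {"rating": {"$gte": rating}}

    # A known genre name matches movies tagged with that genre.
    if text.lower() in KNOWN_GENRES:
        return {"genre": text.capitalize()}

    # Anything else is a case-insensitive substring match on the name.
    return {"name": {"$regex": re.escape(text), "$options": "i"}}
```

The resulting dictionary could be passed to a `find` call on the movies collection. Escaping the input with `re.escape` keeps characters such as `.` or `(` in a title from being read as regular-expression syntax.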

Filling out mongodb collection

cd scrappy_flask_imdb/scrapy_imdb
scrapy crawl imdb

This operation will take some time. After it finishes, a number of movies will exist in the movies collection of the imdb mongodb database.

Starting the flask server

Once the spider and pipeline have completed, the server can be started and content served to the user via the web browser. To start the server, simply type:

cd scrappy_flask_imdb/flask_imdb/
python manage.py runserver

Check web page

Open your preferred browser and type in the location bar: http://localhost:5000/index

Cleanup

Execute the following command to drop the movies collection:

mongo imdb --eval "db.movies.drop()"

To drop the whole imdb database, execute:

mongo imdb --eval "db.dropDatabase()"
