CSE 881: Data Mining - Course Project
Setup
Note: If you're using an IDE like PyCharm, the first two steps (creating and activating the virtual environment) may be done automatically.
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Transfer datasets: the raw data is too large to include in the GitHub repository. Download the datasets from:
https://developer.imdb.com/non-commercial-datasets/
Unzip the .tsv files and place them in:
data/raw/imdb_dataset
Files to Place:
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
- Run the loading script:
python load.py
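Before running load.py, it can help to confirm that the dataset step above left every file where it is expected. The following is a minimal sketch, not part of the project: the helper names `decompress_archives` and `missing_files` are hypothetical, and it assumes the downloads are gzip archives named like `name.basics.tsv.gz` that were saved directly into data/raw/imdb_dataset.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path

# File names copied from the "Files to Place" list above.
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]

# Target directory named in the setup steps, relative to the repository root.
RAW_DIR = Path("data/raw/imdb_dataset")


def decompress_archives(raw_dir: Path) -> list[str]:
    """Decompress each *.tsv.gz in raw_dir (assumed naming); skip files already extracted."""
    extracted = []
    for archive in sorted(raw_dir.glob("*.tsv.gz")):
        target = archive.with_suffix("")  # drops only the trailing ".gz"
        if target.exists():
            continue
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(target.name)
    return extracted


def missing_files(raw_dir: Path) -> list[str]:
    """Return the expected .tsv files that are not present in raw_dir."""
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    if RAW_DIR.is_dir():
        for name in decompress_archives(RAW_DIR):
            print(f"extracted {name}")
    missing = missing_files(RAW_DIR)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All dataset files are in place.")
```

Run from the repository root, the script reports any file that still needs to be downloaded or unzipped. It never overwrites an existing .tsv file.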
Citations:
https://developer.imdb.com/non-commercial-datasets/