CSE 881: Data Mining - Course Project
Setup
Note: If you're using an IDE like PyCharm, the first two steps (creating and activating the virtual environment) may be done automatically.
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Transfer datasets: the raw data is too large to include in the GitHub repository. Download the dataset from:
https://developer.imdb.com/non-commercial-datasets/
Unzip the .tsv files and place them in:
data/raw/imdb_dataset
Files to Place:
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
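
Before running anything against the data, it can help to confirm that all seven files landed in the right folder. The following is a minimal sketch, not part of the project itself: the function name `missing_files` and the constants `RAW_DIR` and `EXPECTED_FILES` are hypothetical names chosen for illustration, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Directory and file names are taken from the setup instructions above.
# RAW_DIR is relative, so this assumes the script runs from the repository root.
RAW_DIR = Path("data/raw/imdb_dataset")
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_files(raw_dir=RAW_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print(f"Missing files in {RAW_DIR}:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All expected files are in place.")
```

If the script lists any files, the most likely causes are that the downloaded archives were not unzipped or that the files were placed in a different folder.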