prabhaavp 2d2ee64c0e - Added venv instruction + requirements.txt
- Added data folder structure with .gitkeep
- Added .gitignore
- Added load.py to load IMDB dataset and preview with D-Tale
2026-02-03 22:21:41 -05:00


CSE 881: Data Mining - Course Project

Setup

Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.

  1. Create a virtual environment:
       python -m venv venv
  2. Activate the virtual environment (macOS/Linux shown; on Windows use venv\Scripts\activate):
       source venv/bin/activate
  3. Install dependencies:
       pip install -r requirements.txt
  4. Download the datasets. The raw data is too large to include in the GitHub repository, so download it from:
       https://developer.imdb.com/non-commercial-datasets/

The files are distributed as gzip-compressed archives (.tsv.gz). Decompress them and place the resulting .tsv files in:

data/raw/imdb_dataset
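The decompression step can also be scripted. Below is a minimal sketch using only the Python standard library (gzip, shutil, pathlib). It assumes the archives were saved to a local downloads/ folder; that folder name, and the decompress_all helper itself, are hypothetical and not part of this repository.

```python
import gzip
import shutil
from pathlib import Path

# Assumption: the .tsv.gz archives were downloaded here (hypothetical location).
DOWNLOAD_DIR = Path("downloads")
# Target folder named in this README.
TARGET_DIR = Path("data/raw/imdb_dataset")


def decompress_all(download_dir: Path, target_dir: Path) -> list[Path]:
    """Gunzip every *.tsv.gz in download_dir into target_dir; return the written paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(download_dir.glob("*.tsv.gz")):
        # Path.stem drops only the last suffix: "title.basics.tsv.gz" -> "title.basics.tsv"
        out_path = target_dir / archive.stem
        # Stream-copy in chunks so large files are never fully loaded into memory.
        with gzip.open(archive, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Only run when the (hypothetical) download folder actually exists.
    if DOWNLOAD_DIR.is_dir():
        for path in decompress_all(DOWNLOAD_DIR, TARGET_DIR):
            print(f"wrote {path}")
```

Run it from the repository root so the relative paths resolve. Streaming with shutil.copyfileobj matters here because some of the IMDb files are several gigabytes once decompressed.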

Files to Place:

name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
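Before running load.py, it may help to confirm that all seven files are in place. Below is a minimal sketch, assuming it is run from the repository root; the missing_files helper is hypothetical and not part of this repository. The file names are taken from the list above.

```python
from pathlib import Path

# The seven files this README asks you to place in data/raw/imdb_dataset.
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected files that are absent (or not regular files) in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("data/raw/imdb_dataset"))
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All IMDb files are in place.")
```

A file that is still named *.tsv.gz will be reported as missing, which also catches archives that were downloaded but not yet decompressed.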

Citations:

https://developer.imdb.com/non-commercial-datasets/