CSE 881: Data Mining - Course Project

Setup

Note: If you're using an IDE such as PyCharm, steps 1-2 may be done automatically.

  1. Create a virtual environment:
     python -m venv venv
  2. Activate the virtual environment (macOS/Linux shown; on Windows run venv\Scripts\activate):
     source venv/bin/activate
  3. Install the dependencies:
     pip install -r requirements.txt
  4. Download the datasets. The raw data is too large to include in the GitHub repository, so download it from:
     https://developer.imdb.com/non-commercial-datasets/

     Decompress the downloaded .tsv.gz files and place the resulting .tsv files in:

     data/raw/imdb_dataset

     Files to place:

     name.basics.tsv
     title.akas.tsv
     title.basics.tsv
     title.crew.tsv
     title.episode.tsv
     title.principals.tsv
     title.ratings.tsv
  5. Run load.py.
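The dataset step can also be scripted. The sketch below assumes the IMDb archives were downloaded as .tsv.gz files into a single folder, and decompresses each one into data/raw/imdb_dataset using only the Python standard library. The function name decompress_imdb_archives and the choice of download folder are hypothetical; they are not part of this repository.

```python
import gzip
import shutil
from pathlib import Path

# Target folder named in the setup instructions.
TARGET_DIR = Path("data/raw/imdb_dataset")


def decompress_imdb_archives(download_dir: Path, target_dir: Path = TARGET_DIR) -> list[Path]:
    """Decompress every *.tsv.gz file in download_dir into target_dir.

    Hypothetical helper (not part of this repo). Returns the paths of the
    .tsv files that were written, in file-name order.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(download_dir.glob("*.tsv.gz")):
        # Path.stem drops only the last suffix: "title.basics.tsv.gz" -> "title.basics.tsv"
        out_path = target_dir / archive.stem
        # Stream in chunks so large archives are never fully loaded into memory.
        with gzip.open(archive, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

For example, if the browser saved the archives to ~/Downloads, decompress_imdb_archives(Path.home() / "Downloads") would write title.basics.tsv and the other files into data/raw/imdb_dataset. Streaming with shutil.copyfileobj keeps memory use roughly constant, which matters because some of the decompressed files are several gigabytes.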
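Assuming load.py expects all seven files listed in the dataset step, a quick pre-flight check can catch a missing or misnamed file before loading starts. This is a minimal sketch; missing_imdb_files is a hypothetical helper, not something load.py is known to provide.

```python
from pathlib import Path

# Directory and file names taken from the setup instructions.
DATA_DIR = Path("data/raw/imdb_dataset")
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_imdb_files(data_dir: Path = DATA_DIR) -> list[str]:
    """Return the expected .tsv file names that are absent from data_dir.

    Hypothetical helper: an empty list means every expected file is present.
    """
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

A setup script could call missing_imdb_files() and stop with an error that lists the returned names whenever the list is non-empty, rather than letting the loader fail partway through.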

Citations:

IMDb Non-Commercial Datasets: https://developer.imdb.com/non-commercial-datasets/