CSE 881: Data Mining - Course Project

Setup

Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.

  1. Create a virtual environment:
python -m venv venv
  2. Activate the virtual environment:
source venv/bin/activate
(On Windows: venv\Scripts\activate)
  3. Install dependencies:
pip install -r requirements.txt
  4. Transfer the datasets. The raw data is too large to include in the GitHub repository, so download it from:
https://developer.imdb.com/non-commercial-datasets/

Decompress the downloaded .tsv.gz files (gzip archives) and place the resulting .tsv files in:

data/raw/imdb_dataset

Files to place:

name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
  5. Run the pipeline in order:

Citations:

https://developer.imdb.com/non-commercial-datasets/
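Before running the pipeline, it can help to confirm that the dataset step in Setup produced all seven files. Below is a minimal Python sketch, assuming the downloaded .tsv.gz archives were saved directly into data/raw/imdb_dataset and that the code is run from the repository root. The helper names decompress_downloads and missing_files are hypothetical and not part of the project's pipeline; only the standard library (gzip, shutil, pathlib) is used.

```python
"""Sketch: decompress and verify the IMDb dataset files.

Assumption: the .tsv.gz downloads sit in data/raw/imdb_dataset, the
directory named in Setup. These helpers are hypothetical and are not
part of the project's pipeline.
"""
from __future__ import annotations

import gzip
import shutil
from pathlib import Path

DATA_DIR = Path("data/raw/imdb_dataset")

# The seven files listed under "Files to place".
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def decompress_downloads(data_dir: Path) -> list[Path]:
    """Decompress every *.tsv.gz in data_dir that has no matching .tsv yet.

    Returns the paths of the .tsv files created by this call.
    """
    created = []
    for gz_path in sorted(data_dir.glob("*.tsv.gz")):
        # title.basics.tsv.gz -> title.basics.tsv (drops only the last suffix)
        tsv_path = gz_path.with_suffix("")
        if tsv_path.exists():
            continue  # already decompressed on a previous run
        with gzip.open(gz_path, "rb") as src, open(tsv_path, "wb") as dst:
            # Stream in chunks so large files are never held fully in memory.
            shutil.copyfileobj(src, dst)
        created.append(tsv_path)
    return created


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected .tsv names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

From the repository root, calling decompress_downloads(DATA_DIR) and then missing_files(DATA_DIR) should return an empty list once everything is in place. Streaming with shutil.copyfileobj keeps memory use flat, which matters because some of the decompressed IMDb files are large.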
