prabhaavp 369f5ced89 Update README.md
Updated readme to include structure picture
2026-02-03 22:25:28 -05:00


# CSE 881: Data Mining - Course Project
## Setup
Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.
1. Create a virtual environment:
```
python -m venv venv
```
2. Activate the virtual environment (the command below is for macOS/Linux; on Windows, run `venv\Scripts\activate` instead):
```
source venv/bin/activate
```
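To confirm that activation took effect, a quick check can be run from Python. This is a minimal sketch; the function name `in_virtualenv` is illustrative and not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints `False` after activation, the shell is probably still resolving `python` to the system interpreter.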
3. Install dependencies:
```
pip install -r requirements.txt
```
4. Download the datasets. The raw data is too large to include in the GitHub repository, so download it from:
```
https://developer.imdb.com/non-commercial-datasets/
```
Decompress the downloaded `.tsv.gz` archives and place the resulting `.tsv` files in:
```
data/raw/imdb_dataset
```
Files to place:
```
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
```
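The decompress-and-place step can also be scripted. The sketch below is not part of the repository; it assumes the downloads are gzip archives named like `title.basics.tsv.gz` and that they were saved directly into `data/raw/imdb_dataset`. The helper names `decompress_archives` and `missing_files` are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

# Directory and file names are taken from the setup steps in this README.
DATA_DIR = Path("data/raw/imdb_dataset")
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def decompress_archives(data_dir: Path) -> list[Path]:
    """Decompress each *.tsv.gz in data_dir to a sibling *.tsv file.

    Archives whose target file already exists are skipped.
    """
    written = []
    for archive in sorted(data_dir.glob("*.tsv.gz")):
        target = archive.with_suffix("")  # drops only the trailing ".gz"
        if target.exists():
            continue
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large files fit in memory
        written.append(target)
    return written


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected file names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    if not DATA_DIR.is_dir():
        print(f"{DATA_DIR} does not exist yet; create it and add the downloads")
    else:
        for path in decompress_archives(DATA_DIR):
            print("decompressed", path)
        missing = missing_files(DATA_DIR)
        print("missing files:", ", ".join(missing) if missing else "none")
```

Running it from the repository root before the next step gives an early warning if any of the seven files is absent.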
5. Run `load.py`
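Before running `load.py`, it can help to check that a placed file parses as expected. The sketch below is independent of `load.py` (whose contents are not shown here). It assumes the files are UTF-8, tab-separated with a header row, and use `\N` as the missing-value placeholder; the function name `preview_tsv` is hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_tsv(path: Path, limit: int = 3) -> list[dict]:
    """Return the first `limit` data rows of a tab-separated file as dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside titles as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [
            # Assumption: "\N" marks a missing value, so map it to None.
            {key: (None if value == r"\N" else value) for key, value in row.items()}
            for row in islice(reader, limit)
        ]


if __name__ == "__main__":
    sample = Path("data/raw/imdb_dataset/title.ratings.tsv")
    if sample.is_file():
        for row in preview_tsv(sample):
            print(row)
    else:
        print(f"{sample} not found; complete step 4 first")
```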
## Citations
https://developer.imdb.com/non-commercial-datasets/
<img width="498" height="507" alt="image" src="https://github.com/user-attachments/assets/bbda6c5e-85ba-4f49-8778-916960bba302" />