# CSE 881: Data Mining - Course Project on Gitea
## Setup
Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.
1. Create a virtual environment:
```
python -m venv venv
```
2. Activate the virtual environment:
```
source venv/bin/activate
```
3. Install dependencies:
```
pip install -r requirements.txt
```
4. Transfer datasets: the raw data is too large to include in the repository. Download the dataset from:
```
https://developer.imdb.com/non-commercial-datasets/
```
Unzip the `.tsv` files and place them in:
```
data/raw/imdb_dataset
```
Files to place:
```
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
```
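
Before moving on, it can help to confirm that all seven files ended up in the right folder. The following is a minimal sketch, assuming it is run from the repository root; `missing_imdb_files` is a hypothetical helper for illustration and is not part of this project:

```python
from pathlib import Path

# File names copied from the "Files to place" list above.
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_imdb_files(data_dir="data/raw/imdb_dataset"):
    """Return the expected .tsv files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_imdb_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All IMDb files are in place.")
```

If the check reports missing files, re-download and unzip them before continuing.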
5. Run `load.py`
6.
## Citations
https://developer.imdb.com/non-commercial-datasets/
<img width="498" height="507" alt="image" src="https://github.com/user-attachments/assets/bbda6c5e-85ba-4f49-8778-916960bba302" />