prabhaavp 2d2ee64c0e - Added venv instruction + requirements.txt
- Added data folder structure with .gitkeep
- Added .gitignore
- Added load.py to load IMDB dataset and preview with D-Tale
2026-02-03 22:21:41 -05:00


# CSE 881: Data Mining - Course Project
## Setup
Note: If you're using an IDE such as PyCharm, steps 1 and 2 may be handled automatically.
1. Create a virtual environment:
```
python -m venv venv
```
2. Activate the virtual environment (macOS/Linux shown; on Windows, run `venv\Scripts\activate` instead):
```
source venv/bin/activate
```
3. Install dependencies:
```
pip install -r requirements.txt
```
4. Get the datasets. The raw data is too large to store in the GitHub repository, so download it from:
```
https://developer.imdb.com/non-commercial-datasets/
```
The files are distributed as gzip-compressed `.tsv.gz` archives. Decompress them and place the resulting `.tsv` files in:
```
data/raw/imdb_dataset
```
Files to place:
```
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
```
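Step 4 can also be scripted. The sketch below is a hypothetical helper, not part of this repository: the file name `prepare_data.py` and the functions `decompress_all`, `missing_files`, and `main` are assumptions. It assumes the downloaded `.tsv.gz` archives have already been saved into `data/raw/imdb_dataset`, decompresses them with Python's standard `gzip` module, and reports any of the seven expected files that are still missing.

```python
"""Hypothetical helper (assumed name: prepare_data.py).

Decompresses IMDb .tsv.gz downloads in place and checks that every
file listed in the README is present. Not part of the original project.
"""
from __future__ import annotations

import gzip
import shutil
from pathlib import Path

# Target folder from the README, relative to the repository root.
DATA_DIR = Path("data/raw/imdb_dataset")

# The seven files the README asks you to place.
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def decompress_all(data_dir: Path) -> list[Path]:
    """Extract each *.tsv.gz next to itself, skipping ones already extracted."""
    extracted = []
    for gz_path in sorted(data_dir.glob("*.tsv.gz")):
        tsv_path = gz_path.with_suffix("")  # "title.basics.tsv.gz" -> "title.basics.tsv"
        if tsv_path.exists():
            continue
        # copyfileobj copies in chunks, so large files are not read into memory at once.
        with gzip.open(gz_path, "rb") as src, open(tsv_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(tsv_path)
    return extracted


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected .tsv names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


def main(data_dir: Path = DATA_DIR) -> int:
    """Extract archives, report missing files; return 0 on success, 1 otherwise."""
    if not data_dir.is_dir():
        print(f"{data_dir} not found; create it and add the downloaded files first.")
        return 1
    for path in decompress_all(data_dir):
        print(f"Extracted {path.name}")
    missing = missing_files(data_dir)
    if missing:
        print("Missing: " + ", ".join(missing))
        return 1
    print("All IMDb files are in place.")
    return 0


if __name__ == "__main__":
    main()
```

Run it from the repository root (for example `python prepare_data.py`, using the assumed file name) so the relative path resolves. The `.tsv.gz` archives are left in place; you can delete them after extraction if disk space is tight.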
## Citations
https://developer.imdb.com/non-commercial-datasets/