# CSE 881: Data Mining - Course Project on Gitea

## Setup

Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.

1. Create a virtual environment:

   ```
   python -m venv venv
   ```

2. Activate the virtual environment:

   ```
   source venv/bin/activate
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

4. Transfer datasets. The raw data is too large to include in this repository. Download the dataset from:

   ```
   https://developer.imdb.com/non-commercial-datasets/
   ```

   Unzip the `.tsv` files and place them in:

   ```
   data/raw/imdb_dataset
   ```

   Files to place:

   ```
   name.basics.tsv
   title.akas.tsv
   title.basics.tsv
   title.crew.tsv
   title.episode.tsv
   title.principals.tsv
   title.ratings.tsv
   ```

5. Run `load.py`.

## Citations

https://developer.imdb.com/non-commercial-datasets/
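
## Optional: Verify Dataset Files

Before running `load.py`, it can help to confirm that all seven `.tsv` files from step 4 of Setup are in `data/raw/imdb_dataset`. The script below is a minimal sketch and is not part of the project: the function name `missing_dataset_files` is hypothetical, and it assumes you run it from the repository root. It only checks that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Directory and file names are copied from step 4 of the Setup section.
RAW_DIR = Path("data/raw/imdb_dataset")
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_dataset_files(raw_dir=RAW_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in raw_dir.

    Hypothetical helper for illustration; not part of the project code.
    """
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print(f"Missing from {RAW_DIR}:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected IMDb files are in place.")
```

If any file names are printed, download and unzip those files again before continuing to step 5.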