- Added venv instruction + requirements.txt

- Added data folder structure with .gitkeep
- Added .gitignore
- Added load.py to load IMDB dataset and preview with D-Tale
2026-02-03 22:21:41 -05:00
parent c18b412867
commit 2d2ee64c0e
7 changed files with 279 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1 +1,42 @@
-# datamining_881
+# CSE 881: Data Mining - Course Project
+
+## Setup
+
+Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.
+
+1. Create a virtual environment:
+```
+python -m venv venv
+```
+
+2. Activate the virtual environment:
+```
+source venv/bin/activate
+```
+
+3. Install dependencies:
+```
+pip install -r requirements.txt
+```
+
+4. Download the datasets. The raw data is too large to include in the GitHub repository, so download it from:
+```
+https://developer.imdb.com/non-commercial-datasets/
+```
+Decompress the downloaded archives and place the resulting `.tsv` files in:
+```
+data/raw/imdb_dataset
+```
+Files to place:
+```
+name.basics.tsv
+title.akas.tsv
+title.basics.tsv
+title.crew.tsv
+title.episode.tsv
+title.principals.tsv
+title.ratings.tsv
+```
+
+## Citations
+https://developer.imdb.com/non-commercial-datasets/
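
Step 4 of the README above is the only setup step that is done by hand, so a quick check that the files landed in the right place may save a confusing failure later. The following is a minimal sketch, not part of this commit: the directory and file names are taken from the README, while the helper name `missing_dataset_files` is hypothetical and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Directory and file names come from the README's setup section.
DATA_DIR = Path("data/raw/imdb_dataset")
EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def missing_dataset_files(data_dir: Path = DATA_DIR) -> list[str]:
    """Return the expected file names that are not present in data_dir.

    Hypothetical helper for illustration; it is not defined in the repository.
    """
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print(f"Missing from {DATA_DIR}:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All IMDb dataset files are in place.")
```

Running it after step 4 lists any file that still needs to be downloaded or moved. It only checks that each file exists, not that its contents are complete.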