- Added venv instructions and requirements.txt
- Added data folder structure with .gitkeep
- Added .gitignore
- Added load.py to load the IMDB dataset and preview it with D-Tale
# datamining_881

# CSE 881: Data Mining - Course Project

## Setup

Note: If you're using an IDE like PyCharm, steps 1-2 may be done automatically.
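If you are unsure whether your IDE has already created and activated a virtual environment, a quick check from Python can tell you. This is an illustrative sketch, not part of the project; `in_virtualenv` is a hypothetical helper name. It relies on standard-library behaviour: inside a `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base interpreter.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Hypothetical helper: inside a venv, sys.prefix is the venv directory,
    while sys.base_prefix is the interpreter the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; follow steps 1-2.")
```

Run it from the IDE's terminal (or its Python console) to see which interpreter is in use.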
1. Create a virtual environment:
```
python -m venv venv
```

2. Activate the virtual environment:
```
source venv/bin/activate
```
On Windows, run `venv\Scripts\activate` instead.

3. Install dependencies:
```
pip install -r requirements.txt
```

4. Transfer the datasets. The raw data is too large to include in the GitHub repository, so download it from:
```
https://developer.imdb.com/non-commercial-datasets/
```
Decompress the downloaded `.tsv.gz` files and place the resulting `.tsv` files in:
```
data/raw/imdb_dataset
```
Files to place:
```
name.basics.tsv
title.akas.tsv
title.basics.tsv
title.crew.tsv
title.episode.tsv
title.principals.tsv
title.ratings.tsv
```
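Before running the project code, it can help to confirm that all seven files listed above are in place. The following is a minimal sketch, assuming the script runs from the repository root and that the downloads are the gzip-compressed `.tsv.gz` files; the helper names `decompress_archives` and `missing_files` are hypothetical and are not taken from the repository's `load.py`. It uses only the standard library (`gzip`, `shutil`, `pathlib`).

```python
import gzip
import shutil
from pathlib import Path
from typing import List

# Assumption: run from the repository root, so this relative path
# matches the location given in the setup steps.
DATA_DIR = Path("data/raw/imdb_dataset")

EXPECTED_FILES = [
    "name.basics.tsv",
    "title.akas.tsv",
    "title.basics.tsv",
    "title.crew.tsv",
    "title.episode.tsv",
    "title.principals.tsv",
    "title.ratings.tsv",
]


def decompress_archives(data_dir: Path) -> List[Path]:
    """Decompress every *.tsv.gz in data_dir next to itself; return the new .tsv paths."""
    created = []
    for archive in sorted(data_dir.glob("*.tsv.gz")):
        target = archive.with_suffix("")  # drops ".gz", keeps ".tsv"
        if target.exists():
            continue  # already decompressed; skip re-extraction
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks; files are large
        created.append(target)
    return created


def missing_files(data_dir: Path) -> List[str]:
    """Return the expected .tsv names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    decompress_archives(DATA_DIR)
    missing = missing_files(DATA_DIR)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All IMDb files are in place.")
```

Decompressing next to the archives keeps both copies; delete the `.tsv.gz` files afterwards if disk space matters. IMDb marks missing values with `\N`, so if you load these files with pandas, passing `sep="\t"` and `na_values=r"\N"` to `pandas.read_csv` is one way to treat them as nulls.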
## Citations
https://developer.imdb.com/non-commercial-datasets/