Distracted Driver Day 1 — Getting the Data

Chiraag K V
3 min read · Jul 22, 2021


Hey there! Hope you are doing well. I started a project using Kaggle’s Distracted Driver dataset today, and I am going to write down my daily progress here.

I am using a Colab notebook as it is super easy to set up and works just like a Jupyter Notebook.

Getting the Data from Kaggle

I first tried to use the Kaggle API in the notebook, as it is an efficient way of getting the data into the Colab environment. This didn’t work due to some path issues, so I downloaded the .zip file to my local computer and then uploaded it to Google Drive.

Getting the Data into folders

Then I needed the data unzipped, so I used the zipfile library to unzip and extract all of the files into separate folders.
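A minimal sketch of the unzipping step with the standard-library zipfile module. The helper name extract_zip and both of its arguments are assumptions for illustration, not the original notebook code:

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in the archive at zip_path into dest_dir."""
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall() recreates the folder structure stored in the archive.
        archive.extractall(dest)
    return dest
```

In Colab this would be called with the path of the uploaded archive in the mounted Drive and a target folder such as /content; both depend on where the file was actually saved.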

At the end of this, my file tree was something like this:

Getting the file paths of the images and the corresponding labels

Getting the images

This was the fun part. I could have used flow_from_directory() from Keras, which does the same thing better than my code and can even pre-process the images, but I decided to implement it myself. My code was nowhere near as succinct as that function, but writing it helped me build my own logic.

I first got the path to the main directory (train, in my case). Then I made a list of all the class names and joined each one to the main path to get the paths to the sub-folders.

[Image: the code I used to do this]
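A minimal sketch of this step, not the original notebook code. It assumes every sub-folder of the main directory is one class (for example c0), and the function name get_class_dirs is hypothetical:

```python
import os


def get_class_dirs(base_dir):
    """Return the path of every class sub-folder inside base_dir."""
    # Assumption: each sub-folder name (e.g. "c0") is a class name.
    class_names = sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )
    # Join the main directory with each class name to get the sub-folder paths.
    return [os.path.join(base_dir, name) for name in class_names]
```

Reading the class names from disk instead of typing them out means the list cannot drift out of sync with the folders.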

After this, I created a function that accepts a list of directories and returns the path to every image in them. This gave me all of the images in the train set.

[Image: this function gets all of the images]
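One way such a function could look, as a sketch rather than the original code. The name get_image_paths is hypothetical, and it assumes each directory contains only image files:

```python
import os


def get_image_paths(directories):
    """Return the full path of every file found in the given directories."""
    image_paths = []
    for directory in directories:
        # sorted() keeps the order stable from one run to the next.
        for file_name in sorted(os.listdir(directory)):
            full_path = os.path.join(directory, file_name)
            # Skip nested folders; only regular files are collected.
            if os.path.isfile(full_path):
                image_paths.append(full_path)
    return image_paths
```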

Getting the Labels

After getting the file paths to the images, getting their labels was easy.

I created a function that takes in the whole path and the base path (for example, the whole path is “/content/imgs/train/c0/img_4037.jpg” and the base path is “/content/imgs/train/”) and removes the base path from the whole path. This leaves “c0/img_4037.jpg”. I then found the index of the first “/” and sliced the string up to that index, which gave “c0”.

[Image: this function gets the labels]
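The string slicing described above can be sketched as follows. The function name get_label is an assumption, and the sketch expects the base path to end with “/”:

```python
def get_label(whole_path, base_path):
    """Return the class folder name that follows base_path in whole_path."""
    if not whole_path.startswith(base_path):
        raise ValueError("whole_path does not start with base_path")
    # Remove the base path: "c0/img_4037.jpg" is what remains.
    relative_path = whole_path[len(base_path):]
    # Slice up to the first "/" to keep only the folder name.
    return relative_path[:relative_path.index("/")]


label = get_label("/content/imgs/train/c0/img_4037.jpg", "/content/imgs/train/")
print(label)  # c0
```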

Functionalizing this process

For future use, I wrapped the whole process in a single function. It is essentially the same steps in the same order, but in a neater form that can be reused.

[Image: this function does the whole process in one go]
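A compact sketch of what a combined function might look like. The name get_paths_and_labels is hypothetical, and for brevity it takes each label directly from the folder name instead of slicing the path string, which gives the same result:

```python
import os


def get_paths_and_labels(base_dir):
    """Return two aligned lists: image paths and their class labels."""
    image_paths, labels = [], []
    for class_name in sorted(os.listdir(base_dir)):
        class_dir = os.path.join(base_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        for file_name in sorted(os.listdir(class_dir)):
            image_paths.append(os.path.join(class_dir, file_name))
            # The folder name (e.g. "c0") is the label of every image in it.
            labels.append(class_name)
    return image_paths, labels
```

Because both lists are filled in the same loop, the label at index i always belongs to the image path at index i.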

When we run this function, we get this

Conclusion

Wonderful! Now we have our labels and images lined up. Tomorrow, I will be pre-processing the data and getting it into a form which I can use for modelling.

Hope you had fun reading this blog! Bye!
