PumpGym only shows the current capacity of the gym; it does not show the quietest or busiest hours of the day.
Create an application that predicts the capacity of the gym at any given hour of the day, to work out the best times to go to, or avoid, the gym.
The solution is made up of two scripts: the Capacity Scraper and the Capacity Predictor.
The Capacity Scraper gets the capacity image from the website, applies some filters to the image and then uses Tesseract to convert the text in the image to a string. The string is then stored in a dated file alongside the current time.
From left to right: Original image, filtered image, result stored in file.
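The storage step can be sketched roughly as below. This is only an illustration: the function name `store_reading`, the one-file-per-day naming scheme and the `HH:MM,value` row layout are assumptions, not the script's actual format.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def store_reading(capacity: str, out_dir: Path, now: Optional[datetime] = None) -> Path:
    """Append one OCR reading to a file named after the current date."""
    now = now or datetime.now()
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per day, e.g. 2021-03-01.csv (the naming scheme is an assumption).
    path = out_dir / f"{now:%Y-%m-%d}.csv"
    with path.open("a", encoding="utf-8") as f:
        # Each row holds the time of the reading and the OCR text, stripped of whitespace.
        f.write(f"{now:%H:%M},{capacity.strip()}\n")
    return path
```

Appending rather than overwriting means the scraper can be run on a schedule and each run adds one row to that day's file.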
The filtering applied to the original image was:
As for the arguments passed to Tesseract:
capacity = pytesseract.image_to_string(img, config='-c tessedit_char_whitelist=0123456789% --psm 7 --oem 2')
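At some point the string returned by Tesseract has to become a number before it can serve as a regression target. A minimal sketch of that conversion follows; the helper `parse_capacity` and the 0 to 100 validity range are assumptions made for illustration, not part of the original script.

```python
import re
from typing import Optional


def parse_capacity(ocr_text: str) -> Optional[int]:
    """Return the capacity as an integer percentage, or None if the text is unusable."""
    # OCR output often carries trailing whitespace, so strip it before matching.
    match = re.fullmatch(r"(\d{1,3})%?", ocr_text.strip())
    if match is None:
        return None
    value = int(match.group(1))
    # Treat anything outside 0-100 as a misread (assumed valid range).
    return value if 0 <= value <= 100 else None
```

Returning `None` for unreadable text lets a bad OCR result be skipped instead of polluting the training data.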
The Capacity Predictor is a Regression model which was trained using the data gathered by the Capacity Scraper. The two features of the model are the time and day of the week.
The main steps taken to create, pick and use a Regression model:
$ python3 predict_gym_capacity.py --predict 18 monday
37
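The way the command-line arguments map onto the two features can be sketched as follows. This is not the actual `predict_gym_capacity.py`: the names `encode`, `MeanPerSlot` and `main` are hypothetical, and `MeanPerSlot` is a simple per-slot average standing in for the trained regression model so that the example needs only the standard library.

```python
import argparse
from collections import defaultdict
from statistics import mean

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def encode(hour: int, day: str) -> tuple:
    """Turn the CLI arguments into the two model features: (hour, day index)."""
    return hour, DAYS.index(day.lower())


class MeanPerSlot:
    """Stand-in for the trained regressor: predicts the mean capacity seen per (hour, day)."""

    def fit(self, X, y):
        slots = defaultdict(list)
        for features, target in zip(X, y):
            slots[tuple(features)].append(target)
        self.means_ = {slot: mean(values) for slot, values in slots.items()}
        return self

    def predict(self, X):
        # Raises KeyError for a slot never seen in training; a real regressor would not.
        return [round(self.means_[tuple(features)]) for features in X]


def main(argv=None, model=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--predict", nargs=2, metavar=("HOUR", "DAY"), required=True)
    args = parser.parse_args(argv)
    hour, day = args.predict
    prediction = model.predict([encode(int(hour), day)])[0]
    print(prediction)
    return prediction
```

Because the stand-in exposes the same `fit`/`predict` shape as a scikit-learn estimator, a trained regressor could be passed as `model` in its place.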
The steps taken are inspired by "Chapter 2. End-to-End Machine Learning Project" in Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition. I would highly recommend reading this book, the second chapter in particular, to get an idea of the main steps involved in an ML project.