Kyle Shultz
Adam Torek
Jonathan Porter
Jackson's posed business intelligence questions about whether fuel sales, fuel types, and gallons of fuel sold correlate with a higher or lower volume of total inside sales, and whether these features affect some categories of inside sales more strongly than others. We were given six months of 2021 data covering these features for a subset of Jackson's food stores. To answer these questions, we developed machine learning models that predict inside sales. Our team built a pipeline that cleans the raw data from Jackson's by removing unnecessary columns and keeping only the data needed to answer the questions posed. We then split this data into separate data frames (tables), each containing only the data needed for a particular question, and aggregated it by store and day. The machine learning models were trained on the processed data to predict inside sales. We then compared our predictions with the remaining inside sales data, and with data from the month following the six-month subset, to measure how accurate the predictions were. Using these methods, we found a suggested positive correlation between high fuel prices and inside sales. Additionally, gallons sold, (insert name for third fuel type), and (insert name for fourth fuel type) were the best predictors of inside sales.
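A correlation check of this kind can be sketched in a few lines of pandas. This is an illustration only, not the team's code: it assumes a store-day data frame with hypothetical column names (`avg_fuel_price`, `daily_gallons`, `total_inside_sales`) rather than the actual Jackson's schema, and it computes plain Pearson correlations, which indicate association but not causation.

```python
import pandas as pd

# Hypothetical column names; substitute the real store-day columns.
FEATURES = ["avg_fuel_price", "daily_gallons"]
TARGET = "total_inside_sales"


def feature_correlations(daily: pd.DataFrame, features=FEATURES, target=TARGET) -> pd.Series:
    """Pearson correlation of each feature with the target, strongest first."""
    corr = daily[list(features)].corrwith(daily[target])
    # Sort by absolute value so strong negative relationships are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


if __name__ == "__main__":
    # Toy store-day rows for illustration only; these are not Jackson's data.
    toy = pd.DataFrame({
        "franchise_id": ["A", "A", "B", "B"],
        "avg_fuel_price": [3.10, 3.40, 3.25, 3.60],
        "daily_gallons": [4000, 3800, 5200, 5100],
        "total_inside_sales": [2100.0, 2350.0, 2200.0, 2500.0],
    })
    print(feature_correlations(toy))
```

Only the listed feature columns are correlated, so identifier columns such as a numeric franchise ID never leak into the ranking.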
Our team focused on making predictions with five different models: Support Vector Regressor, Support Vector Classifier, XGBoost, Random Forest, and a Neural Network. We used a variety of features, including fuel type, fuel price, and gallons sold, to predict the volume of total inside sales. We also looked at predicting individual categories of inside sales, such as alcohol, snacks, and food services.
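A comparison of several of these models can be sketched with scikit-learn. This is a hedged illustration, not the team's implementation: the column names, hyperparameters, and cutoff date are assumptions, and only the regressors are shown (the Support Vector Classifier would additionally require inside sales to be binned into classes). XGBoost is left out to keep the sketch dependency-light; `xgboost.XGBRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary. Each model is trained on earlier days and scored on later days, mirroring the comparison against later inside sales data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical column names; substitute the real store-day columns.
FEATURES = ["avg_fuel_price", "daily_gallons"]
TARGET = "total_inside_sales"


def build_models(seed: int = 0) -> dict:
    """Candidate regressors; hyperparameters are untuned placeholders."""
    return {
        # SVR and the MLP are sensitive to feature scale, so standardize first.
        "svr": make_pipeline(StandardScaler(), SVR(C=100.0)),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "neural_net": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=seed),
        ),
    }


def chronological_holdout(daily: pd.DataFrame, cutoff: str, models: dict) -> dict:
    """Fit each model on days before `cutoff` and return its MAE on later days."""
    dates = pd.to_datetime(daily["business_date"])
    cutoff_ts = pd.Timestamp(cutoff)
    train, test = daily[dates < cutoff_ts], daily[dates >= cutoff_ts]
    scores = {}
    for name, model in models.items():
        model.fit(train[FEATURES], train[TARGET])
        scores[name] = mean_absolute_error(test[TARGET], model.predict(test[FEATURES]))
    return scores


if __name__ == "__main__":
    # Synthetic demo data only; it says nothing about Jackson's stores.
    rng = np.random.default_rng(0)
    n = 120
    demo = pd.DataFrame({
        "business_date": pd.date_range("2021-01-01", periods=n, freq="D"),
        "avg_fuel_price": rng.uniform(2.8, 3.6, n),
        "daily_gallons": rng.uniform(3000, 6000, n),
    })
    demo[TARGET] = 500 * demo["avg_fuel_price"] + 0.2 * demo["daily_gallons"] + rng.normal(0, 50, n)
    results = chronological_holdout(demo, "2021-04-01", build_models())
    for name, mae in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: MAE = {mae:.1f}")
```

Splitting by date rather than shuffling keeps information from later days out of training, which a random split of daily data would allow.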
Jackson's gave us three types of CSV files: transaction headers, transaction details, and franchise details. The first step of our process is a pre-processing pipeline that removes, from each of these tables, the features that have no impact on inside sales. We then merged the tables into a single data frame for our feature experiments and predictions, and organized it by day and by store, which was the primary perspective Jackson's wanted us to take on the data set. Each of us then built models and trained them on the relevant data. We ran feature experiments to narrow the data down to its essential features; for example, one data frame contains only Daily Gallons Sold, Franchise ID, Business Date, and Total Inside Sales.
After running our experiments, we identified the most valuable findings from each team member and developed scripts that take a data set structured the same way as the data given to us, select a model, and output predictions. Jackson's Business Intelligence team can use these scripts to apply our models to future data sets and check how accurate the predictions are.
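The pre-processing steps can be sketched in pandas as follows. Every column name here (`transaction_id`, `fuel_gallons`, `inside_sales_amount`, `region`, and so on) is a hypothetical stand-in for the actual Jackson's CSV schema, and the sketch assumes transactions without detail rows had no inside sales. The fuel price is a simple unweighted mean per store-day; a gallon-weighted average may be more appropriate.

```python
import pandas as pd

# Hypothetical column names; the real Jackson's CSV exports will differ.
HEADER_COLS = ["transaction_id", "franchise_id", "business_date", "fuel_gallons", "fuel_price"]
DETAIL_COLS = ["transaction_id", "category", "inside_sales_amount"]
FRANCHISE_COLS = ["franchise_id", "region"]


def build_daily_frame(headers: pd.DataFrame, details: pd.DataFrame,
                      franchises: pd.DataFrame) -> pd.DataFrame:
    """Clean, merge, and aggregate the three tables to one row per store per day."""
    # 1. Drop columns that play no role in inside sales.
    headers = headers[HEADER_COLS]
    details = details[DETAIL_COLS]
    franchises = franchises[FRANCHISE_COLS]

    # 2. Total inside sales per transaction, attached to its header row.
    inside = details.groupby("transaction_id", as_index=False)["inside_sales_amount"].sum()
    merged = headers.merge(inside, on="transaction_id", how="left")
    # Assumption: transactions with no detail rows had no inside sales.
    merged = merged.fillna({"inside_sales_amount": 0.0})

    # 3. One row per store per day.
    daily = merged.groupby(["franchise_id", "business_date"], as_index=False).agg(
        daily_gallons=("fuel_gallons", "sum"),
        avg_fuel_price=("fuel_price", "mean"),  # unweighted mean
        total_inside_sales=("inside_sales_amount", "sum"),
    )

    # 4. Attach store-level attributes from the franchise table.
    return daily.merge(franchises, on="franchise_id", how="left")


def gallons_subset(daily: pd.DataFrame) -> pd.DataFrame:
    """Feature-experiment frame: gallons sold, store, date, and total inside sales."""
    return daily[["franchise_id", "business_date", "daily_gallons", "total_inside_sales"]]
```

Aggregating before attaching the franchise table keeps the merge one row per store-day and avoids duplicating transaction rows.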
Feature Importance:
Predictions generated by Random Forest Classifier:
Here are the questions posed to us by Jackson's Business Intelligence team, along with graphs and calculations that display our findings: