Jackson's Machine Learning Fuel Predictions for Inside Sales


Team Members

Kyle Shultz

Adam Torek

Jonathan Porter


Abstract

Jackson's posed business intelligence questions about whether fuel sales, fuel types, and gallons of fuel sold correlate with a higher or lower volume of total inside sales, and whether these features affect some categories of inside sales more significantly than others. We were given six months of 2021 data on these features from a subset of Jackson's food stores. To answer these questions, we developed machine learning models that predict inside sales. Our team built a pipeline that cleans the raw data given to us by Jackson's by removing unnecessary columns and keeping only the data essential to the questions posed. We further reduced this data into separate data frames (or tables) that contained only the data needed to answer the questions, and grouped the data by store and day. The machine learning models were trained on the processed data to predict inside sales. We then compared our predictions against the remaining inside sales data, and against data from the month after the six-month subset, to measure how accurate the predictions were. Our results suggest a positive correlation between a high fuel price and inside sales. Additionally, gallons sold, (insert name for third fuel type), and (insert name for fourth fuel type) were the best predictors of inside sales.

Project Description

What we built

Our team focused on making predictions with five different models: Support Vector Regressor, Support Vector Classifier, XGBoost, Random Forest, and a Neural Network. We used a variety of features, including fuel types, fuel price, and gallons sold, to predict the volume of total inside sales. We also looked at predicting individual categories of inside sales, such as alcohol, snacks, and food services.
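To make the modeling step concrete, here is a minimal sketch of fitting one of these model families (a Random Forest regressor from scikit-learn) and scoring it on held-out rows. Everything in it is an assumption made for illustration: the data is synthetic and randomly generated, the feature names fuel_price and gallons_sold are stand-ins, and the hyperparameters are arbitrary. It does not reproduce our actual models or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT Jackson's data): one row per store-day.
rng = np.random.default_rng(0)
n_rows = 200
fuel_price = rng.uniform(2.5, 4.0, n_rows)      # hypothetical price per gallon
gallons_sold = rng.uniform(500, 5000, n_rows)   # hypothetical daily gallons

feature_names = ["fuel_price", "gallons_sold"]
X = np.column_stack([fuel_price, gallons_sold])
# Made-up relationship, used only so the example has something to learn.
y = 0.5 * gallons_sold + 100 * fuel_price + rng.normal(0, 50, n_rows)

# Hold out a quarter of the rows to compare predictions with known values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

preds = model.predict(X_test)
mae = mean_absolute_error(y_test, preds)

# Impurity-based importances; they sum to 1 across the features.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"Mean absolute error on held-out rows: {mae:.2f}")
print(importances)
```

The same fit/predict/score pattern applies to the other scikit-learn-style estimators; only the model constructor changes.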

How it works

Jackson's gave us three types of CSV files: transaction headers, transaction details, and franchise details. The first step of our process is a pre-processing pipeline that removes features with no impact on inside sales from each of these tables. We then merged the tables into one data frame for our feature experiments and predictions, and organized the data by day and by store, which was the primary perspective Jackson's wanted us to use when looking at the data set. Each of us then built models and trained them on the relevant data. We experimented with features to narrow the data down to only the essential ones; for example, one data frame contains just Daily Gallons Sold, Franchise ID, Business Date, and Total Inside Sales. After running our experiments, we determined which findings were most valuable and developed scripts that take a data set structured the same way as the data given to us, let the user select a model, and output predictions. Jackson's Business Intelligence team will use these scripts to run our models on a future data set and see how accurate they are.
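The merge-and-group step described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not our actual pipeline: the column names (transaction_id, franchise_id, business_date, category, gallons, amount), the "fuel" category label, and the helper build_daily_frame are all hypothetical, since the real CSV schemas are not reproduced in this report. The franchise details table is left out for brevity.

```python
import pandas as pd

# Tiny made-up tables standing in for the transaction header and detail CSVs.
# All column names here are hypothetical.
headers = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "franchise_id": [101, 101, 102, 102],
    "business_date": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"],
})
details = pd.DataFrame({
    "transaction_id": [1, 1, 2, 3, 4, 4],
    "category": ["fuel", "snacks", "fuel", "alcohol", "fuel", "snacks"],
    "gallons": [10.0, 0.0, 8.0, 0.0, 12.0, 0.0],
    "amount": [30.0, 5.0, 24.0, 12.0, 36.0, 4.0],
})


def build_daily_frame(headers: pd.DataFrame, details: pd.DataFrame) -> pd.DataFrame:
    """Merge line items with their headers and aggregate by store and day."""
    merged = details.merge(headers, on="transaction_id", how="inner")

    # Count only non-fuel line items toward inside sales.
    is_fuel = merged["category"].eq("fuel")
    merged["inside_amount"] = merged["amount"].where(~is_fuel, 0.0)

    return merged.groupby(["franchise_id", "business_date"], as_index=False).agg(
        daily_gallons=("gallons", "sum"),
        total_inside_sales=("inside_amount", "sum"),
    )


daily = build_daily_frame(headers, details)
print(daily)
```

The result has one row per store and day, which matches the shape of the reduced data frame mentioned above (Daily Gallons Sold, Franchise ID, Business Date, and Total Inside Sales).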

Screenshots

Feature Importance:

Predictions generated by Random Forest Classifier:

Here are the questions posed to us by Jackson's Business Intelligence team, along with graphs and calculations that display our findings:

Do fuel sales correlate to higher/lower inside sales?

Does the price of fuel correlate to higher/lower inside sales?

Does the price of fuel affect specific categories at a greater rate?

Is the fuel price an indicator for inside sales?

Are the gallons of fuel sold an indicator for inside sales?

Is one of the specific fuel types a better predictor/indicator of what is going on with inside sales?
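One simple calculation behind correlation questions like those above is the Pearson correlation coefficient between a fuel feature and inside sales. The sketch below computes it in plain Python. The helper name pearson and the four sample values are made up for illustration (the sample is deliberately an exact linear relationship), so the output says nothing about Jackson's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up store-day values; inside_sales is exactly 0.3 * gallons + 100.
daily_gallons = [1000.0, 1500.0, 2000.0, 2500.0]
inside_sales = [400.0, 550.0, 700.0, 850.0]

r = pearson(daily_gallons, inside_sales)
print(f"r = {r:.3f}")
```

A value near +1 indicates that the two quantities rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship. Correlation alone does not show that fuel activity causes a change in inside sales.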