Our project focuses on developing a machine learning model to forecast sales using store attributes,
enhancing ExtraMile’s decision-making capabilities. The team aims to compare each store’s actual sales figures with both ExtraMile’s internal projections and the machine-learning-generated projections. This analysis allows ExtraMile
leadership to identify stores with the largest variances, which can be targeted for intervention before the
end of the accounting period. This data-driven approach has the potential to optimize sales performance and
facilitate more effective resource allocation across stores.
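To make this comparison concrete, the sketch below ranks stores by the gap between actual sales and each projection. The file name and column names are illustrative assumptions, not ExtraMile’s actual schema.

```python
import pandas as pd

# Hypothetical input: one row per store with actual sales and both projections.
df = pd.read_csv("store_sales.csv")

# Absolute deviation of each projection from actual sales.
df["internal_variance"] = (df["actual_sales"] - df["internal_projection"]).abs()
df["model_variance"] = (df["actual_sales"] - df["model_projection"]).abs()

# Rank stores by the larger deviation so leadership can prioritize
# the biggest outliers for intervention.
df["max_variance"] = df[["internal_variance", "model_variance"]].max(axis=1)
print(df.sort_values("max_variance", ascending=False)[["store_id", "max_variance"]].head(10))
```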
ExtraMile supplied two datasets, which the team used to investigate various models and identify the one yielding the best results. Ultimately, a neural network emerged as the most effective model. Additionally, the team developed a clustering method that enables the company to predict which cluster a new store would belong to, further enhancing its strategic planning capabilities.
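A minimal sketch of such a clustering workflow follows, using scikit-learn’s k-means. The synthetic feature matrix and the choice of k = 4 are illustrative assumptions, not the project’s actual configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random data stands in for the real store-attribute features.
rng = np.random.default_rng(42)
store_features = rng.normal(size=(50, 8))  # 50 stores x 8 attributes

# Standardize attributes so no single feature dominates the distance metric.
scaler = StandardScaler()
X = scaler.fit_transform(store_features)

# Fit k-means; k=4 is an illustrative choice.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Predict the cluster a prospective new store would belong to.
new_store = scaler.transform(rng.normal(size=(1, 8)))
print("Predicted cluster:", kmeans.predict(new_store)[0])
```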
Our research concluded that the scarcity of the provided data impeded the attainment of strong results. Moreover, the problem was not strictly a time-series forecasting problem, which introduced additional complexity to the task of accurately predicting sales. Nevertheless, the insights garnered from this project may serve as a foundation for future research and model refinement.
The team utilized Jupyter Notebook as the primary platform for the development of all machine learning
models. Python was selected as the sole programming language, primarily due to the availability of extensive
libraries tailored for machine learning applications. Distinct notebooks were created to perform specific
tasks. For instance, one notebook served as a data pipeline, processing the provided data files and
generating new features. This refined data was subsequently utilized by three additional notebooks, each
dedicated to the development of an individual model, namely linear regression, decision tree, and neural
network. These notebooks encompassed code for data processing, model construction, and model evaluation.
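The sketch below mirrors that three-notebook comparison in compressed form, on synthetic data. The generated feature matrix stands in for the pipeline’s engineered features, and scikit-learn’s MLPRegressor stands in for the project’s neural network, whose actual architecture is not specified here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the engineered store features and sales target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "neural network": MLPRegressor(hidden_layer_sizes=(32, 16),
                                   max_iter=2000, random_state=0),
}

# Train each model and evaluate on the held-out split.
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.2f}")
```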
Throughout the model-building process, the team encountered numerous challenges, ranging from logistical issues to difficulty achieving high accuracy. Because Jupyter notebooks are stored as JSON, Git diffs and merges them poorly, which complicated merging code and resolving conflicts. The team overcame this obstacle by adopting nbdime, a tool designed specifically for diffing and merging notebooks; a typical setup is sketched below. As for the models’ limited accuracy, several factors contributed to this outcome. The most prominent cause was the limited volume of data provided, as the dataset consisted of information from only 50 stores over a two-year period. For more precise sales forecasting based on store attributes, a substantially larger sample size would be required.
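For reference, a typical nbdime setup resembles the following; the exact commands the team ran are not recorded in this report.

```bash
pip install nbdime
nbdime config-git --enable --global   # route notebook diffs/merges through nbdime
nbdiff before.ipynb after.ipynb       # content-aware diff of two notebooks
nbmerge base.ipynb local.ipynb remote.ipynb --out merged.ipynb  # three-way merge
```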