
How to measure the expected level of demand? - Part 2

In the previous part of this article, we investigated the role of data in forecasting demand and described our project to build a model that forecasts sales nearly a month ahead. In today's article, we summarise the lessons we learned and see whether it worked.


Creating the model

Drawing on our past experience, we decided to use a gradient boosting tree model, which had proven useful in several similar projects. Iterative feature engineering was also carried out on a model of this type, and the hyperparameters were optimised once the available data set was considered final. As a result of this exercise, four separate models were created to forecast demand on a daily basis for the next four weeks, for all stores and all products.

It is a legitimate question why exactly four models were used, as it may appear logical that more models (e.g. one per day or per store) would produce better results. Besides the fact that the results obtained did not justify more modelling, there were also practical reasons. In addition to accuracy, an important aspect of evaluating a machine learning process integrated into an enterprise environment is ease of operation and maintenance, which is simpler with fewer models.
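The "one model per forecast week" setup described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the synthetic data, feature matrix, and target names are hypothetical, and we use scikit-learn's `GradientBoostingRegressor` as a stand-in for whichever gradient boosting implementation was used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical training table: 500 rows, 5 numeric features.
X = rng.normal(size=(500, 5))

# One synthetic target per forecast horizon (week 1..4 ahead).
targets = {
    f"week_{h}": X @ rng.normal(size=5) + rng.normal(scale=0.1, size=500)
    for h in range(1, 5)
}

# Train one gradient boosting model per horizon.
models = {}
for horizon, y in targets.items():
    model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
    model.fit(X, y)
    models[horizon] = model

# Each model predicts daily demand for its own week ahead.
preds = {horizon: m.predict(X) for horizon, m in models.items()}
```

Keeping the horizon fixed per model, rather than training one model per store or per day, is what keeps the number of deployed artefacts at four, which matches the operational argument above.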

Testing the models

We obtained the predicted product sales by holding out the last 28 days of the available data as a test set, and then running the trained models on this test set. In each iteration cycle, we tracked a fixed metric, the Root Mean Squared Error (RMSE). This metric was used to measure the performance of the given dataset, of new descriptors (features), and of model versions.

The success of each successive iteration was determined by how much it improved the RMSE. We developed the features and the model on this basis until we arrived at a solution that gave reasonably good predictions.
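The evaluation loop above reduces to a simple pattern: hold out the last 28 days, predict them, and score with RMSE. A minimal sketch, using a toy sales series and a naive mean predictor purely for illustration (the real models, of course, were the gradient boosting ones):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy daily sales series; in the project this was the real sales history.
sales = np.arange(100, dtype=float)

# Hold out the last 28 days for testing, as described above.
train, test = sales[:-28], sales[-28:]

# Placeholder forecast: predict the training mean for every test day.
naive_forecast = np.full(28, train.mean())

score = rmse(test, naive_forecast)
```

Each feature or model change is then judged by whether `score` goes down on the same held-out window, which keeps the comparisons between iterations fair.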

...and the results

As mentioned in the previous part of this article, the data came from a Kaggle competition, so unsurprisingly the metric we watched most closely throughout the project was the competition ranking, even though this was not one of our goals at all. It was worth tracking: our final result would have placed 27th, which made us quite happy under the circumstances. In any case, the key measure was the accuracy of the forecast, where we also did well, although in some cases it was severely limited by the available data:

  • The model performed well if the product was sold throughout the entire period, or a large part of it. For new products, the prediction worked only to a very limited extent.

  • The model performed well if the number of items sold was relatively large. For products with daily sales between 0 and 5 items, the model gave a visibly flat average.

  • The occasional, inexplicably large peaks that could not be linked to a specific event or season were not predicted accurately by the model, but the trend was nevertheless captured.

We treated the task as a regression problem, so we could also identify the features that influence the prediction most:

  • item_id: The product identifier encoded as a number;

  • week: which week of the year;

  • mday: which day of the month;

  • lag_35: number of sales 35 days ago;

  • rmax_28_7: maximum number of sales between 28 and 35 days ago.
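The two time-based features above, `lag_35` and `rmax_28_7`, are straightforward to construct with pandas. A minimal sketch on a toy single-product series; the frame and column names are hypothetical, and the reading of `rmax_28_7` as "7-day rolling maximum, shifted back 28 days" is our plausible interpretation of the name:

```python
import pandas as pd

# Toy daily sales for one (store, item) pair: 60 days, sales = 1..60.
df = pd.DataFrame({"sales": range(1, 61)})

# lag_35: the number of sales 35 days ago.
df["lag_35"] = df["sales"].shift(35)

# rmax_28_7: maximum of the sales values roughly 28-35 days ago,
# built as a 7-day rolling max over the series shifted back 28 days.
df["rmax_28_7"] = df["sales"].shift(28).rolling(7).max()
```

In the real pipeline these transformations would be applied per `(store, item)` group (e.g. via `groupby(...).shift(...)`) so that one product's history never leaks into another's features.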

Our pilot model project was ultimately successful, and this success has strengthened our confidence in recommending machine-learning-based demand forecasting to our manufacturing and commercial clients, provided that a sufficiently long time series is available and that product demand is not minimal from period to period.

If your business needs a similar service, don't hesitate to contact us!

