Abhay Kulkarni

Meaningful training dataset

Supervised learning needs a training dataset to learn from, but it is important to understand the nuances. A meaningful training dataset should represent the data that is expected at the time of prediction, so ensure that your filters are correct and representative. For example, if you do not expect a certain label to be predicted, it might not make sense to have records for that label in your dataset. A label, in this context, is any one of the values the predicted column can take.

Time frame is another common point of discussion. Select a time frame that gives you enough records for training your models, but also ensure that the data truly represents your business. If your business environment changed a few months ago, and the previous environment does not represent the current or future environment, you have to examine whether it is meaningful to train on that part of the past data.

Let’s assume you have set the correct filters and selected the correct time frame. If you have much more data than you need, you might want to reduce the dataset further. A huge training dataset opens up the possibility of heavier models that may not perform well at run time. Reducing the dataset might mean shortening the time frame or selecting samples. Whichever path you choose, do have a chat with your data scientist about the distribution of the data. The reduced dataset should have the same distribution as the original dataset, unless the data scientist advises otherwise.

Finally, ensure that the training dataset is of good quality. What is a good quality training dataset? That might be another blog soon. For now, follow the simple guidelines provided in this blog, and you will be one step closer to practical AI.

#abhayPracticalAI #artificialintelligence #AI
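
For readers who like to see the guidelines in code, here is a minimal sketch of the filtering and sampling steps described in this blog. It is only an illustration under assumptions that are mine, not part of any particular tool: each record is a dictionary with hypothetical keys "label" and "date", and the function names build_training_set and label_shares are made up for this example. It keeps only the labels you expect to predict, keeps only the representative time frame, and then samples each label in proportion to its share so that the reduced dataset has roughly the same distribution as the filtered one.

```python
import random
from collections import Counter, defaultdict


def build_training_set(records, expected_labels, start_date, max_rows, seed=0):
    """Filter records, then downsample while keeping label proportions.

    Each record is a dict with the hypothetical keys "label" and "date".
    """
    # Keep only labels we expect to predict, from the representative time frame.
    kept = [
        r for r in records
        if r["label"] in expected_labels and r["date"] >= start_date
    ]
    if len(kept) <= max_rows:
        return kept  # nothing to reduce

    # Group by label so each label can be sampled in proportion to its share.
    by_label = defaultdict(list)
    for r in kept:
        by_label[r["label"]].append(r)

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    fraction = max_rows / len(kept)
    sample = []
    for rows in by_label.values():
        # Rounding means the total can differ slightly from max_rows.
        n = max(1, round(len(rows) * fraction))  # at least one row per label
        sample.extend(rng.sample(rows, n))
    return sample


def label_shares(rows):
    """Return each label's share of the rows, for comparing distributions."""
    counts = Counter(r["label"] for r in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

Comparing label_shares on the filtered data and on the sample gives you something concrete to bring to the chat with your data scientist. If they advise a different distribution, for example oversampling a rare label, the proportional sampling here is the part to change.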
