Apr 15, 20202 min read

Meaningful training dataset

Supervised learning needs a training dataset to learn from. It is important to understand the nuances though. A meaningful training dataset should represent the data that is expected at the time of prediction. So ensure that the filters are correct and representative. For example, if you do not expect a certain label to be predicted, it might not make sense to have records for that label in your dataset. Label here is in context of the various values the predicted column can take. Time frame is another common point of discussion. Select a time frame that gives you enough records for training your models, but also ensure that the data truly represents your business. If your business environment has changed a few months back, and if the previous environment does not represent the current or future environments; one has to examine if it would be meaningful to train on that Part of past data. Let’s assume you set the correct filters and select the correct time frame. If you have much more data than you need, you might want to further reduce the data set. A huge training dataset opens the possibility of thicker models that do not perform well at run time. Reducing the dataset might need shortening the time frame or selecting samples. Whatever be your chosen path, do have a chat with your data scientist regarding the distribution of data. It should represent the original dataset, unless advised otherwise by the data scientist. Finally, ensure that the training dataset has good quality. What is a good quality training dataset? That might be another blog soon. For now, follow the simple guidelines provided in this blog, and you will be one step closer to practical AI. #abhayPracticalAI #artificialintelligence #AI

Postingpad - Abhay's portfolio and blogs

Meaningful training dataset

Recent Posts

Comentários