My previous post was about selecting a meaningful training dataset. Once selected, how do you know the dataset is good? Let me offer two considerations: quality and appropriateness.

Quality, in my opinion, relates to the correctness of the data, both the ground truth (labels) and the input features. For example, if you are building a model to predict product category, check whether the category field is being captured correctly. Maybe the category is set by an individual when the ticket first comes in; even if set wrongly, the ticket may still find its way to the correct customer agent. If that agent does not follow the practice of resetting the category to the correct one, you end up training your model on wrong ground truth.

The other aspect is appropriateness. Say there are two fields on your current lead scoring form: one filled in by business rules when the lead first comes in (perceived score), and another filled in at the time of closing the lead, based on what actually happened (actual score). If you are asked to build a lead score predictor, which is the appropriate ground truth: the score the business rules predicted, or the score that was finally assigned? The answer really depends on the business need.

Keeping the "goodness" of training data in mind will take you one step closer to practical AI.

#abhayPracticalAI #ArtificialIntelligence #AI