It’s simple. Prediction rules learn from past data. If your data is bad, your prediction rule will be too. What, then, makes data bad? Think about both data and metadata (metadata as understood by the database fraternity).

From the metadata perspective: are you pointing to the correct fields? For example, if you want to train a model for scoring leads, are you pointing to the correct field for ground truth? Maybe your training data contains both an expected lead score and a realized lead score. Which of the two would you select for training your model? There is no universal answer; it depends on what you want from the model. The message here is that you need to point your training data set to the correct metadata. The same is true for input variables (features) and filters. Talk to business users, folks intimate with the process, and daily users of the system before you decide on the metadata.

The second step (after selecting the right metadata) is ensuring good data. This problem can manifest in many forms. You could have bad data carried over from software upgrades, or simply from poor data governance practices. These are just a few causes. So ensure the sanctity of your data.

Then there is the aspect of suitability of data. If your data set contains both machine-generated and human-generated records, do you need all of it? Maybe you need only the human data. It depends on your use case.

The point here is to ensure that you have the right metadata, and good, suitable data. Delay the implementation if needed, in order to institute a data governance process that yields good training data. Once you have made the right metadata and data choices, you have taken one more step towards practical AI. #abhayPracticalAI #artificialintelligence #ai #machinelearning
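The lead-scoring example above can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the field names (`expected_lead_score`, `realized_lead_score`, `source`) and the sample records are hypothetical assumptions, chosen only to show the two decisions the text describes, filtering to suitable records and pointing the label at the right field.

```python
# Hypothetical lead records; all field names and values are
# illustrative assumptions, not a real schema.
records = [
    {"company_size": 200, "expected_lead_score": 70,
     "realized_lead_score": 55, "source": "human"},
    {"company_size": 50, "expected_lead_score": 40,
     "realized_lead_score": 48, "source": "machine"},
    {"company_size": 500, "expected_lead_score": 90,
     "realized_lead_score": 85, "source": "human"},
]

# Suitability filter: suppose this use case needs only
# human-generated records, so drop the machine-generated ones.
training = [r for r in records if r["source"] == "human"]

# Metadata choice: point the label (ground truth) at
# realized_lead_score rather than expected_lead_score --
# the field that matches what we want the model to learn.
X = [[r["company_size"]] for r in training]
y = [r["realized_lead_score"] for r in training]

print(len(training), y)  # 2 human records, labels [55, 85]
```

Swapping the label to `expected_lead_score`, or skipping the `source` filter, would train a different model on the same table, which is exactly why these choices deserve a conversation with business users before any training run.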
It’s all about good data!
Updated: Apr 4, 2020