Setting Einstein Prediction Builder On An Open Data Set

Jul 16, 2020

As soon as it was announced at Dreamforce 2019 last November that we would all be receiving a single Einstein Prediction, our minds began to tick over about the one vital business question we would like to ask of our data.

For those who haven’t come across Einstein Predictions before, Prediction Builder analyses your data on a single object in order to give you either:

a) likelihood of a given question being yes or no, or

b) a predictive value for something such as a currency field.

Asking the question is the simple part of setting up a prediction; choosing the data you ask it about can actually be trickier.

Closed Salesforce Data

If you’ve taken the Trailhead module on this subject, you will have been presented with a prediction scenario built around a nice closed data set, past invoices, and the question being asked is true or false: will an invoice be paid on time?

The closed data set is all the historical closed invoices. From that data set, Einstein looks at which invoices turned out true and which turned out false, before applying whatever it has learnt to the rest of the data: the open invoices.
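To make the split concrete, here is a minimal Python sketch of the idea rather than anything Einstein actually runs. The `Invoice` shape, its `paid_on_time` field and the invoice numbers are all made up for illustration: records with a known outcome are the ones to learn from, and records without one are the ones to predict.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Invoice:
    # Hypothetical record shape, not a real Salesforce object definition.
    number: str
    paid_on_time: Optional[bool]  # None means the invoice is still open


def split_invoices(invoices: List[Invoice]) -> Tuple[List[Invoice], List[Invoice]]:
    """Separate records with a known outcome from records still awaiting one."""
    closed = [inv for inv in invoices if inv.paid_on_time is not None]
    still_open = [inv for inv in invoices if inv.paid_on_time is None]
    return closed, still_open


# Made-up sample data: three closed invoices and two open ones.
invoices = [
    Invoice("INV-001", True),
    Invoice("INV-002", False),
    Invoice("INV-003", True),
    Invoice("INV-004", None),
    Invoice("INV-005", None),
]

closed_invoices, open_invoices = split_invoices(invoices)
print(len(closed_invoices), "closed invoices to learn from")
print(len(open_invoices), "open invoices to predict")
```

The point to notice is that the two lists never overlap. That is exactly what goes missing in the open data scenario.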

Einstein Predictions are really simple in this regard and there are plenty of simple and proven use cases — like Opportunity Scoring and Lead Scoring.

Open Data

However, what if you’re looking at something where your closed data set is also 100% of your data? Think about a situation like attrition: every record in your data set is either going to be a customer or a past customer. You cannot use this as it stands in Einstein Predictions, as you will not have any data left over to apply the prediction to.

Think of it as tasting your food as you cook it to see if it needs any seasoning — if you eat it all, you’ll know whether it tastes good or not but you won’t have anything left to serve!

In order to find a data set you can use, you need to be creative. Rather than finding a subset of closed data, you simply need to put a rule across all of your data. There are a few ways you could look at randomising a data set: you could pull every record which has a created day of ‘1’, or every account that has a name beginning with ‘U’. While valid, these rules do run the risk of introducing some bias into your prediction, so another option would be to put an auto-number on your object and filter the records with a criterion of “auto-number ends in 0”.
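As a rough illustration of the “auto-number ends in 0” rule, the Python sketch below applies it to a list of made-up auto-number values. The `REC-00001` display format and the `ends_in_zero` helper are assumptions for the example only; in Salesforce itself you would express the same rule as a filter on the auto-number field.

```python
from typing import List


def ends_in_zero(auto_number: str) -> bool:
    """Return True when the last character of the auto-number is '0'."""
    return auto_number.strip().endswith("0")


# Hypothetical auto-number values in a made-up "REC-00001" display format.
auto_numbers: List[str] = [f"REC-{n:05d}" for n in range(1, 1001)]

sampled = [a for a in auto_numbers if ends_in_zero(a)]
remaining = [a for a in auto_numbers if not ends_in_zero(a)]

print(len(sampled), "records picked by the rule")
print(len(remaining), "records left over")
```

Because an auto-number simply counts up as records are created, its last digit should have nothing to do with a record’s name or dates, so the rule picks out roughly one record in ten.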

You can then pose your question across all this randomised open data and see if you’ve managed to capture enough true and false results for the prediction to work, without using up too much of the data you want to apply it to.
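One way to sanity-check that balance before building anything is to count the outcomes inside the sampled records. The sketch below is a hedged Python example on invented data: the `summarise_sample` helper, the record layout and the `min_per_outcome` threshold are all hypothetical, so check the current Salesforce documentation for the real minimums.

```python
from typing import Dict, List, Tuple


def summarise_sample(
    records: List[Tuple[int, bool]], min_per_outcome: int
) -> Dict[str, object]:
    """Count true/false outcomes among records whose auto-number ends in 0."""
    sample = [outcome for number, outcome in records if number % 10 == 0]
    true_count = sum(1 for outcome in sample if outcome)
    false_count = len(sample) - true_count
    return {
        "true": true_count,
        "false": false_count,
        "remaining": len(records) - len(sample),
        "enough": true_count >= min_per_outcome and false_count >= min_per_outcome,
    }


# Made-up data: 2,000 records, every seventh one marked as a lost customer.
records = [(n, n % 7 == 0) for n in range(1, 2001)]

# min_per_outcome is an illustrative threshold, not an official Einstein limit.
summary = summarise_sample(records, min_per_outcome=25)
print(summary)
```

If either count falls short, you could loosen the rule (say, auto-number ends in 0 or 5) at the cost of leaving fewer records to apply the prediction to.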

Prediction Builder vs. Insights

There are some architectural considerations around the question you are posing. Attrition sounds like a natural fit; however, you have to remember that you can only apply Einstein Predictions to a single object. If you’re finding that you’re bending over backwards to roll up data into your object in order to have enough for Einstein to detect any trends, then you should be looking at Einstein Analytics Insights rather than Prediction Builder.


Written by Simon Whight