Facebook Lookalikes: Do They Look Like They Should?

In the 9 months that lookalike audiences have prevailed as one of the go-to targeting options for sophisticated Facebook advertisers, I’ve seen excellent performance by and large. However, I’ve also seen rather poor performance on a few occasions for reasons that have eluded me until only recently.

If you’re not familiar with lookalikes in Facebook, here’s how they work:

  1. First, you create a “custom audience” by securely uploading to Facebook a list of customer email addresses.* The more, the merrier.
  2. Second, select your new custom audience, click the “Create Similar Audience” button, and choose the “Similarity” option.**

* You can also upload phone numbers and Facebook User IDs, but 9 times out of 10 you’ll use emails. And they don’t necessarily need to be customer emails; they can be your newsletter signups, or cart abandons, etc. If you have a list of emails that you procured legitimately, you’re ready to play.

** The similarity option results in an audience size of about 2M and more closely resembles the original custom audience. The greater reach option results in an audience size of over 10M and should definitely be tested.

Once you have a lookalike audience ready to go, here’s the fun part. Create a new ad where the only targeting you’ve specified is to include your lookalike audience.

Now, play around by overlaying additional targeting such as gender, age, likes, broad categories and partner categories, and watch how the audience estimate changes.

A Real Scenario

For example, I generated a lookalike audience for one of my consumer electronics clients from a list of paying customers from the month of November. As you can see below, Facebook’s audience estimator shows I have an audience of 1.4M people:


Next, because I know these products are more likely to be purchased by men than women, let’s check the gender mix by overlaying each gender separately. It should show roughly 70/30, male/female:



Uh-oh! Why is my male/female mix reversed to 30/70? This goes against everything I know about this client. We’d better check the original custom audience of November purchasers for accuracy. We will do the same basic analysis, first getting the total, then overlaying male/female:




That’s more like it: 70/30 male/female is what we’d expect.

Let’s keep digging deeper by querying some age ranges for both lookalike and custom audiences. I expect the bulk of our actual customers to be 25-44 years old, as shown in the Custom column:


Once again, my lookalikes don’t look like my customers. My largest age group is now… teens?! What is happening? Could Facebook’s lookalike generator be so far off? Or is the audience estimator off?
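
The comparison above is easy to make repeatable. Here is a minimal sketch in Python, assuming you simply type in the counts you read off the audience estimator. It does not call any Facebook API, and every number in it is a hypothetical placeholder chosen only to mirror the roughly 70/30 and 30/70 splits described above, not the client’s real figures.

```python
def share(counts):
    """Convert raw estimator counts into fractions of the total."""
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def largest_gap(custom_counts, lookalike_counts):
    """Return (segment, gap) for the segment whose share differs most.

    gap = lookalike share minus custom share, so a negative value means
    the segment is under-represented in the lookalike audience.
    """
    custom = share(custom_counts)
    lookalike = share(lookalike_counts)
    gaps = {seg: lookalike.get(seg, 0.0) - custom[seg] for seg in custom}
    worst = max(gaps, key=lambda seg: abs(gaps[seg]))
    return worst, gaps[worst]


# Hypothetical placeholder counts, NOT real client data.
custom_audience = {"male": 7_000, "female": 3_000}
lookalike_audience = {"male": 420_000, "female": 980_000}

segment, gap = largest_gap(custom_audience, lookalike_audience)
print(f"{segment}: lookalike share differs from custom by {gap:+.0%}")
```

The same two functions work for age ranges: swap the dictionary keys for buckets such as "13-17" or "25-34" and the largest mismatch is flagged the same way.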

In this example, I might be willing to send traffic to this lookalike audience by removing teens and females, which leaves me with a paltry 220,000 out of an available 1.4M, but where’s the scale in that? Isn’t creating scale the whole point of lookalikes?

To shed some light, I asked Ben Savage, a software engineer at Facebook, for his insights into what might be happening here:

“When Facebook creates lookalike audiences from a custom audience, all kinds of features are considered. Age, sex, and location are factored in, but so are other things like likes, and interests. The automatic algorithm which creates the lookalike audience attempts to find common patterns among the audience, and age/sex distributions are not necessarily the strongest correlation. As such, there is no certainty that the lookalike audience and custom audience will have the same composition of demographics.”

What surprises me is how little age and sex can factor into the lookalike algorithm; I would think they should play a bigger role.

Gauging Impact

Let’s shift the conversation to how these anomalies impact performance.

For another client in the education space, I ran a test during the first fifteen days of August where I targeted July Lookalikes (lookalikes generated from customers acquired in July). I followed up with a second test during the first fifteen days of September, this time targeting August Lookalikes. For both tests, I used the same page post creative. To recap the test: July lookalikes vs. August lookalikes, identical creative, 15-day duration in each case.

Before sharing the results, let’s point out the lookalike anomaly: July showed 98% female; August showed 68% female, 32% male. That’s a big difference month over month for a client whose audience is almost entirely women, not men.



Remember, this client sells most successfully, by far, to moms. The July lookalike audience’s female/male ratio was consistent with the custom audience of actual paying customers, whereas the August lookalikes had a much higher share of males (inconsistent with the original August custom audience). As it turned out, CPA for August Lookalikes was 43% higher than July’s CPA, and CPCs weren’t the cause:
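
If CPCs really were flat, the arithmetic points at conversion rate. CPA is spend divided by conversions, which works out to CPC divided by conversion rate, so a higher CPA at the same CPC means fewer clicks turned into customers. The sketch below is a back-of-the-envelope check, not the test’s actual data: the function name is mine, and treating CPC as exactly unchanged (a ratio of 1.0) is an assumption.

```python
def implied_cvr_change(cpa_ratio, cpc_ratio=1.0):
    """Relative change in conversion rate implied by CPA and CPC changes.

    Since CPA = CPC / CVR, the CVR ratio equals cpc_ratio / cpa_ratio.
    Ratios are new/old, e.g. 1.43 means "43% higher".
    """
    return cpc_ratio / cpa_ratio - 1.0


# Assumption: CPC is exactly flat between the two tests (cpc_ratio = 1.0).
change = implied_cvr_change(cpa_ratio=1.43)
print(f"Implied conversion-rate change: {change:+.1%}")
```

Under that assumption, a 43% higher CPA corresponds to a conversion rate roughly 30% lower, which is consistent with showing the same creative to an audience that contains far more men than the client’s actual customers.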


If I were cynical, I might hypothesize that Facebook is pulling the wool over my eyes by including audience segments that are less in demand than the segments represented in our original custom audience. This would be akin to Google AdWords taking extreme liberties when matching broad-match keywords to user queries. I am not actually cynical, though, and I think the similar audience functionality may just be a little glitchy.

Regardless of your opinion as to the cause, at the end of the day it pays to know your lookalikes. They are not all created equal!

What do you think? Is there a glitch in the lookalike generator or a glitch in the audience estimator?