When Bots Attack

Someone bought 10K fake/bot followers for my Twitter account, and here’s what I learned about Twitter’s spam detection tools

Mar 28, 2018

Network graph representation of my Twitter followers (the large grey dot is me, Geoff Golberg!). Each dot is one of my Twitter followers. The interactive version is fantastic when viewed on a desktop (use a touchpad/mouse to move around and zoom, hover over dots to reveal accounts, and click to see how they interconnect). Graph credit: Max Galka

Story First, Data Second

Unless you’ve been living under a rock, chances are you’ve read The New York Times’ investigative report, “The Follower Factory.” The piece takes a deep dive into buying fake/bot Twitter followers. I, too, have some experience with fake/bot Twitter followers. I have a story. I have some data.

I suppose a good place to start would be this tweet from Stanford professor Johan Ugander. It is part of a tweetstorm connected to “The Follower Factory,” and I would encourage you to read the thread in its entirety:

Writing a post about fake/bot Twitter accounts is something I have put off for far too long. I can relate to the hypothetical scenario Johan describes, as I was targeted in this fashion a couple of years ago. Discovering that tweet was the nudge I needed to finally tackle this post (thanks, Johan!).

In Jan 2016 I noticed my Twitter account was gaining followers at a ridiculous rate:

I am certain that my account was “maliciously [targeted by] someone with bot followers to make [me] look bad.”

Let me explain.

Attack Of The Bots

A few days prior to the surge in Twitter followers, I tweeted the following:

MeVee came out of nowhere, launching a live video app in early 2016, in what was a very hot sector at the time. MeVee’s Twitter following (15K+) was an immediate red flag for me — considering they had just launched — so I decided to manually review the account’s most recent Twitter followers. After encountering what appeared to be mainly fake/bot accounts, I ran the account through TwitterAudit. TwitterAudit most certainly isn’t perfect, but I have been impressed with their accuracy (more on that later). As I suspected, TwitterAudit revealed that the majority (93%) of MeVee’s Twitter followers were fake.

Why would a recently launched app purchase Twitter followers? The answer is simple: social proof (i.e. having more Twitter followers can impact the perception of MeVee).

To their credit, MeVee replied to my tweet rather than ignoring it:

MeVee’s Twitter account has since changed its handle from @MeVeeApp to @buildwithcrane. The account is now associated with Crane AI, which has nothing to do with live video (MeVee no longer exists).

The response (“someone early on accidentally bought some followers”) gave me a good laugh. Several people named in The New York Times’ report similarly pointed the finger elsewhere (employees, family, agents, PR companies, friends) when confronted.

While I cannot definitively say it was someone with MeVee ties who targeted my account, what I can definitively say is that I did not purchase the followers. It very easily could have been a third party who came across our Twitter exchange and thought it would be funny to flood my account with fake/bot followers. In any case, the culprit’s identity isn’t central to the story/data I am sharing.

Ultimately, anyone can purchase fake/bot Twitter followers for your Twitter account. Did you know that?

Whereas a few days earlier I was recommending tools to facilitate MeVee’s removal of fake/bot Twitter followers, I now found myself in a position where my Twitter account was being “maliciously [targeted by] someone with bot followers to make [me] look bad.”

At the time, I was working as a live video content creator/consultant. It was work that I truly enjoyed, especially when it involved travel, as was the case when I partnered with Heineken during the Rio Olympic Games:

Working as an “influencer” (I much prefer “creator”), I took great pride in ensuring that my Twitter following was clean/legit. In other words, I didn’t want my account to be followed by fake/bot Twitter accounts, which has the potential to be bad for business.

Savvier marketers are vetting/analyzing the audiences of the creators with whom they partner beyond simply looking at reach (i.e. number of followers). Most marketers, however, aren’t employing sophisticated processes to ensure they stay clear of partners who knowingly game the system via purchasing social followers/engagements. My desire to maintain a clean/legit Twitter following was driven by necessity. I didn’t want to lose out on any work because of the appearance that I was falsely representing my reach/influence.

Ever since I joined Twitter (March 2009), in fact — and well before I got into live video — I would regularly review my followers to ensure they were real accounts. As a result, I can confidently say that I probably have more Twitter accounts blocked than you! (~3.3K accounts, to be precise)

Influencer marketing is utterly broken, by the way, but I’ll save that for a future post.

Well, Shit. This Sucks

It didn’t take long for me to realize my Twitter account was (as far as I am concerned) being attacked by fake/bot accounts.

Frustrated by the endless chore of blocking the fake/bot accounts that followed me, I opted to change my settings to “Protect [my] Tweets.” When you sign up for Twitter, your tweets are public by default and anyone can follow your account. When tweets are protected, people must request to follow. Switching my account to private would therefore spare me the burden of blocking accounts.

Protecting tweets isn’t an ideal solution, as tweets from private accounts can only be seen by followers. This limits visibility and, by extension, hinders engagement/interaction. Another downside to having a private account is that your tweets can no longer be retweeted. Twitter functions as a fantastic vehicle to amplify content (via discovery/retweets); however, private Twitter accounts cannot capture that value. Being limited to using Twitter within the framework of a private network vastly degrades both utility and user experience. Having a private account meant I could no longer leverage Twitter effectively.

Next, I pinged several Twitter employees to see if they could offer advice/solutions. The takeaway from those exchanges was this: don’t worry about the fake/bot followers, as Twitter regularly scrubs their ecosystem and fake/bot accounts will be removed, eventually. I also filed a ticket with Twitter’s Help Center, but didn’t receive a response (worth noting: I was given neither a ticket number nor a confirmation email).

Soon thereafter I switched my account back to public, giving Twitter the benefit of the doubt. Over the next several weeks my Twitter account grew from ~4.6K followers to ~11.7K “followers” (Jan 7th 2016 through Jan 29th 2016):

More than two years later, I am still waiting for thousands of fake/bot accounts to be removed from Twitter/my follower list.

Enough With The Story, Let’s Get To The Data

As I will illustrate — and by applying multiple approaches/tools — it is relatively easy to identify fake/bot Twitter accounts. Moreover, contrary to popular belief, Twitter is actually quite effective at identifying spam accounts.

According to Nick Bilton (author of “Hatching Twitter”): “Twitter knew about all its fake followers, and always has — eliminating just enough bots to make it seem like they care, but not enough that it would affect the perceived number of active users on the platform.”

Upon closer inspection of what takes place under Twitter’s hood, it becomes apparent that Nick’s assertion perfectly describes Twitter’s approach to dealing with fake/bot accounts.

1) Network Graph Representation

@geoffgolberg’s followers (as of May ’17)

Once again, this is a network graph representation of my Twitter followers. Each dot is one of my Twitter followers. Colors represent communities (determined by interconnectedness) and the size of each circle represents how central the follower/account is in the community.

Given this is my own Twitter network, it quickly becomes clear what each community represents. When doing similar analyses for other accounts — where one lacks the same level of familiarity — it can require a little more legwork.

The graphic above includes a legend to spare you the pain (personally, I enjoy this process!) of attempting to identify each community. What immediately stands out are the green clusters. Whereas the non-green clusters reflect significant connectivity between communities, the green clusters are, for the most part, disconnected from the rest of the communities. The fake/bot followers that were bought for my account comprise the vast majority of the green dots. A small portion of the green dots are real followers, simply accounts that aren’t connected to the rest of my Twitter followers (some high school friends, for example, show up in the green clusters).
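The post doesn’t say which community-detection method produced the colors (the graph is credited to Max Galka), only that communities are determined by interconnectedness. As a much cruder stand-in, the sketch below uses plain connected components instead: it flags followers that sit outside the largest connected component of the follower-to-follower graph, which roughly captures “disconnected from the rest of my communities.” It assumes you already have the follow edges among your followers (a hypothetical input), and it leaves your own account out of the node list, since every follower connects to it.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, List, Tuple

def connected_components(nodes: List[Hashable],
                         edges: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Group followers into connected components, treating follows as undirected links."""
    node_set = set(nodes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Drop edges that touch accounts outside the follower set (e.g. your own account).
        if a in node_set and b in node_set:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search from `start`
            current = queue.popleft()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def outside_main_component(nodes: List[Hashable],
                           edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Followers not in the largest component: a rough proxy for isolated clusters."""
    components = connected_components(nodes, edges)
    if not components:
        return []
    main = set(max(components, key=len))
    return [n for n in nodes if n not in main]
```

A community-detection method such as modularity optimization would also separate clusters that are only loosely tied to the rest of the network, which connected components cannot do, so treat this as a first pass.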

Head over to the interactive version to search for your account and explore!

2) TwitterAudit.com

TwitterAudit was founded in 2012. For free, anyone can audit their own Twitter account or someone else’s. TwitterAudit takes a random sample of (up to) 5K of an account’s followers and then scores each of those followers. Their algorithm evaluates a number of variables (including number of tweets, date of last tweet, and ratio of followers to following) and then determines whether each follower is real or fake. Their paid offering (PRO) can audit a larger number of an account’s followers, rather than being limited to the 5K sample of the free offering.
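TwitterAudit’s actual model isn’t public, so the following is only a sketch of the general approach described above: sample up to 5K followers, check each one against a few signals, and report the share judged real. The signals mirror the variables mentioned above, but the thresholds (fewer than 5 tweets, no tweet in 90 days, following more than ten times as many accounts as follow it, and at least two signals required) are arbitrary assumptions for illustration.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class Follower:
    screen_name: str
    statuses_count: int                # number of tweets posted
    last_tweet_at: Optional[datetime]  # None if the account never tweeted
    followers_count: int
    friends_count: int                 # number of accounts this follower follows

def looks_fake(f: Follower, now: datetime, max_idle_days: int = 90) -> bool:
    """Hypothetical rule of thumb, NOT TwitterAudit's real algorithm."""
    signals = 0
    if f.statuses_count < 5:
        signals += 1
    if f.last_tweet_at is None or (now - f.last_tweet_at).days > max_idle_days:
        signals += 1
    if f.friends_count > 10 * f.followers_count:  # follows many, followed by few
        signals += 1
    return signals >= 2

def audit(followers: List[Follower], now: datetime,
          sample_size: int = 5000, seed: int = 0) -> Tuple[int, int, float]:
    """Return (fake count, sample size, audit score as % judged real)."""
    if not followers:
        raise ValueError("no followers to audit")
    rng = random.Random(seed)
    sample = followers if len(followers) <= sample_size else rng.sample(followers, sample_size)
    fake = sum(looks_fake(f, now) for f in sample)
    return fake, len(sample), 100 * (len(sample) - fake) / len(sample)
```

Requiring two signals rather than one keeps a quiet but otherwise normal account (a lurker who rarely tweets) from being flagged on inactivity alone.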

An audit score of 98%, for example, means that TwitterAudit has determined 98% of the account’s followers to be real. Prior to commencing a round of blocking fake/bot followers in Sep 2015, I had a TwitterAudit score of 98% (78 fake followers out of ~4K followers):

In Feb 2018 — more than two years after my Twitter account was attacked by fake/bot accounts — TwitterAudit determined that ~4K of my followers were fake (70% audit score):

Here’s the growth of those fake followers, in relation to the growth of my overall Twitter followers:

@geoffgolberg’s followers (“fake” followers as determined by TwitterAudit, Feb ’18)
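The audit scores above are simply the share of followers judged real: 1 − 78/4,000 ≈ 98% in Sep 2015, while a 70% score across 13K+ followers implies roughly 0.30 × 13.5K ≈ 4K fakes. The growth curve and the “first fake follower” figure need a real/fake flag per follower. A minimal sketch, assuming you already have those flags (how you obtain them is up to you) and noting that Twitter’s follower endpoints list the most recent followers first, so the list must be reversed:

```python
from itertools import accumulate
from typing import List, Optional, Sequence

def first_fake_position(is_fake_oldest_first: Sequence[bool]) -> Optional[int]:
    """Follower number (oldest follower = 1) of the first account flagged as fake, or None."""
    for number, flagged in enumerate(is_fake_oldest_first, start=1):
        if flagged:
            return number
    return None

def cumulative_fakes(is_fake_oldest_first: Sequence[bool]) -> List[int]:
    """Running count of fake followers, to plot against follower number (total growth)."""
    return list(accumulate(int(flagged) for flagged in is_fake_oldest_first))

# Toy flags listed newest first (as the API orders followers), reversed to oldest first.
flags = list(reversed([True, False, True, False, False]))
print(first_fake_position(flags), cumulative_fakes(flags))  # → 3 [0, 0, 1, 1, 2]
```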

As previously mentioned, maintaining a clean/legit Twitter following has always been important to me. This is evidenced by the fact that the first fake follower identified by TwitterAudit was my 1,680th follower (in other words, I avoided fake followers during my first ~6 years as a Twitter user). Below is a table summarizing the TwitterAudit data:

3) Twitter API

The New York Times’ report employed a very clever tactic to identify fake/bot followers. Their approach involves plotting an account’s followers (first to most recent) against the date each follower’s account was created. The example below, courtesy of New York Times graphics editor Rich Harris, does a great job illustrating patterns that signal fake/bot followers:

Credit Columbia professor, Mark Hansen, with the fingerprint discovery
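The scatterplot itself only needs each follower’s position and account creation date. The sketch below builds those pairs from v1.1 user objects, whose `created_at` field uses a format like `Wed Oct 10 20:19:24 +0000 2018`, and then applies a hypothetical band detector: it flags any run of `window` consecutive followers whose accounts were all created within `max_span_days` of one another, the kind of tight band that batches of bought accounts tend to leave. The thresholds are assumptions, this is not the method the Times used, and plotting is left out to keep it dependency-free.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Format of the v1.1 `created_at` field; %a and %b assume an English/C locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def scatter_points(users_newest_first: List[Dict]) -> List[Tuple[int, datetime]]:
    """(follower number, account creation date) pairs, with the first follower = 1."""
    oldest_first = list(reversed(users_newest_first))
    return [
        (number, datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT))
        for number, user in enumerate(oldest_first, start=1)
    ]

def dense_bands(points: List[Tuple[int, datetime]], window: int = 50,
                max_span_days: int = 30) -> List[int]:
    """Hypothetical heuristic: flag runs of `window` consecutive followers whose
    accounts were all created within `max_span_days` of one another."""
    if window < 1:
        raise ValueError("window must be at least 1")
    flagged = set()
    for start in range(len(points) - window + 1):
        chunk = points[start:start + window]
        dates = [created for _, created in chunk]
        if (max(dates) - min(dates)).days <= max_span_days:
            flagged.update(number for number, _ in chunk)
    return sorted(flagged)
```

A genuine burst of attention also brings many followers at once, but their accounts tend to have widely varying creation dates, which is why the span of creation dates, rather than the follow rate, is the signal here.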

Shortly after reading “The Follower Factory,” I came across a post from Elaine Ou, where she applies the same analysis to her own Twitter account. Elaine also reviews the followers of New York Times columnist Paul Krugman (“for the sake of journalistic objectivity”) and of Eric Schneiderman, the Attorney General of New York (Schneiderman opened an investigation following The New York Times’ report). Elaine wrote a Python script to reproduce the New York Times style scatterplots and was kind enough to link to it at the end of her post.

Here are the results of running Elaine’s script for my Twitter account:

@geoffgolberg’s followers (Feb ’18; excludes “suspected spam accounts”)

This is where things get interesting.

Although my account had over 13K followers at the time, the script returned only ~9.4K followers (the fake/bot follower attack can be seen from ~3.6K to ~5.1K followers). I decided to search Twitter’s Help Center and came across the “My follower count is wrong” section. One sentence caught my attention: “To see the full list of your followers, including suspected spam accounts, turn off the quality filter in your settings.” Here’s more info on the quality filter (also from Twitter’s Help Center):

Every Twitter account has the quality filter (launched in Aug ’16) turned on by default. Translated: Twitter wants to hide accounts it has identified as “suspected spam accounts” from your list of followers. Viewing the full list of your followers requires turning off the quality filter in your settings. This finding prompted me to tweet the following questions:

With the quality filter now turned off, Elaine’s script still returned ~9.4K followers for my account. Her script uses the GET followers/list API call to obtain an account’s follower list. Alternatively, the list can be obtained with the GET followers/ids API call.

The latter returned my full follower list, matching the number displayed on my Twitter profile (over 13K).
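Both endpoints page through results using cursors. A minimal sketch of collecting the full list from GET followers/ids, assuming a hypothetical `fetch_page` callable that performs the authenticated HTTP request and returns the decoded JSON (authentication and rate-limit handling are deliberately left out):

```python
from typing import Any, Callable, Dict, List

# The v1.1 endpoint as documented when this post was written;
# a real `fetch_page` would send the authenticated GET request here.
FOLLOWER_IDS_URL = "https://api.twitter.com/1.1/followers/ids.json"

def all_follower_ids(fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
                     screen_name: str) -> List[int]:
    """Walk every page of GET followers/ids (most recent followers first)."""
    ids: List[int] = []
    cursor = -1  # -1 requests the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page({"screen_name": screen_name, "cursor": cursor, "count": 5000})
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids
```

The same loop works for followers/list by reading `page["users"]` instead of `page["ids"]`, but that endpoint returns at most 200 users per page versus up to 5,000 IDs, so it needs far more requests.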

Here is the New York Times style scatterplot (reproduced using Excel) for my full follower list:

@geoffgolberg’s followers (Feb ’18; includes “suspected spam accounts”)

The attack comprised ~1.5K accounts in the first scatterplot, whereas here — which reflects the actual follower count displayed by Twitter — the attack picks up an additional ~4K accounts (the fake/bot follower attack can now be seen from ~3.7K to ~9.2K followers). Those ~4K accounts have been identified by Twitter themselves as “suspected spam accounts” — yet, for some reason, the accounts are neither suspended nor removed.

Next, I decided to compare the partial list of my followers (GET followers/list API call) with my full follower list (GET followers/ids API call). Here are the “suspected spam accounts” (i.e. not returned by the GET followers/list API call) expressed as a percentage of my full follower list (i.e. accounts returned by the GET followers/ids API call):

Outside of the attack, Twitter’s API consistently flagged only ~1% of my followers as “suspected spam accounts” (each time period spans at least one year and thousands of followers). Of the accounts that followed me during Jan 2016, however, Twitter has flagged 73% as “suspected spam accounts” (again, more than 4K accounts/followers). In other words, more than 4K of the 13.6K followers reflected in my Twitter profile are “suspected spam accounts,” at least according to Twitter’s spam detection tools.
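A minimal sketch of the comparison described above, assuming you already have both ID lists: the full list from followers/ids, reordered oldest first, and the IDs of the users returned by followers/list. A follower counts as a “suspected spam account” when it appears in the first list but not the second. Because these endpoints return follower order rather than follow dates, the hypothetical `period_of` function that maps a follower number to a time period has to be supplied by you.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence

def suspected_spam_share(full_ids_oldest_first: Sequence[int],
                         filtered_ids: Iterable[int],
                         period_of: Callable[[int], str]) -> Dict[str, float]:
    """Percent of followers per period that followers/ids returns but followers/list omits."""
    filtered = set(filtered_ids)
    totals: Counter = Counter()
    hidden: Counter = Counter()
    for number, user_id in enumerate(full_ids_oldest_first, start=1):
        period = period_of(number)  # hypothetical mapping from follower number to period
        totals[period] += 1
        if user_id not in filtered:
            hidden[period] += 1
    return {period: 100 * hidden[period] / totals[period] for period in totals}
```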

This is neither a bug nor isolated to my account. Twitter’s entire platform is propped up by misleading/inflated follower/following counts, which include accounts Twitter themselves have identified as “suspected spam accounts” (and have been identified as such for years).

I will discuss why that matters later in the post — first we’ll take a closer look at the data from the various approaches/tools.

Comparing The Three Approaches

For this section the analysis will focus on the attack time period (Jan 7th 2016 through Jan 29th 2016).

In the context of the attack, the network graph approach is the most accurate at identifying fake/bot followers. There are certainly many green dots/followers that are real accounts; however, it’s much more likely that those accounts followed outside of the attack dates.

Translated: the high school friends mentioned earlier, for example, are part of the green clusters because they are disconnected from the rest of my communities, not because they are fake/bot accounts. They are much more likely to have followed me in the first time period (Mar 2009 through Dec 2014) than nearly seven years after I joined Twitter, during an attack that lasted just a few weeks in Jan 2016. Within the attack period, then, membership in the disconnected green clusters is a strong signal of a fake/bot account, rather than merely a sign that an account is disconnected from my other communities.

Before jumping into the data, here is a visualization of ~200 accounts which followed during the attack period (turn your audio on while viewing!). The first column is red when the network graph representation reflects being in the green clusters (i.e. fake/bot accounts). The second column is red when Twitter’s API reflects being a “suspected spam account.” The third column is red when TwitterAudit reflects being a fake account. Accounts that were suspended by Twitter (between June 2017 and Feb 2018) are orange, while accounts that were removed by Twitter (same time period) are grey:

Credit: Dollee Bhatia

Note the string of ~30 consecutive followers that all three approaches classify as real. This happened on Jan 19th, 2016, after I was nominated by the Shorty Awards for Periscoper of the Year (picking up those real followers, in succession, as a result):

The network graph approach identifies 97% of the accounts which followed during the attack (5,419 out of 5,583) to be fake/bot accounts (green dots). Between Jun 2017 and Feb 2018 (9 months), Twitter suspended just 50 of those accounts, while another 36 were removed.
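For scale, here are the network graph’s hit rate during the attack and the share of its flagged accounts that Twitter suspended or removed over those nine months:

```latex
\frac{5{,}419}{5{,}583} \approx 0.971 \qquad\qquad \frac{50 + 36}{5{,}419} = \frac{86}{5{,}419} \approx 0.016
```

That is, Twitter acted on roughly 1.6% of these accounts.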

Twitter’s API identifies 4,013 “suspected spam accounts” which followed during the attack. 98.7% of those accounts were also determined to be fake/bot accounts by the network graph. In other words, Twitter appears to apply the “suspected spam account” label only when it has a high level of confidence.

TwitterAudit identifies 3,903 fake accounts which followed during the attack. 98.8% of those accounts were also determined to be fake/bot accounts by the network graph. Similar to Twitter, when TwitterAudit identifies an account as spam/fake, there’s a high likelihood that it is, in fact, a fake/bot account.

Twitter and TwitterAudit each fail to identify ~1.5K of the fake/bot accounts flagged by the network graph.
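The ~1.5K figure can be checked against the overlaps reported in the two preceding paragraphs. Subtracting the graph-confirmed share of each tool’s flags from the 5,419 accounts the network graph flags (the percentages are rounded, so the counts are approximate):

```latex
\begin{aligned}
\text{missed by Twitter} &\approx 5{,}419 - 0.987 \times 4{,}013 \approx 5{,}419 - 3{,}961 = 1{,}458 \\
\text{missed by TwitterAudit} &\approx 5{,}419 - 0.988 \times 3{,}903 \approx 5{,}419 - 3{,}856 = 1{,}563
\end{aligned}
```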

When Twitter flagged an account as spam, there was a 76.6% chance the account was also identified as a fake/bot account by TwitterAudit. When TwitterAudit determined an account was fake, there was a 78.8% chance the account was also identified as a spam/fake account by Twitter themselves.

There were 3,049 accounts where all three approaches determined the account to be a fake/bot account.
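Every comparison in this section is a set intersection or a conditional percentage over account IDs. A minimal sketch, assuming each approach’s verdicts are available as a set of flagged IDs (the function and key names are illustrative):

```python
from typing import Dict, Set, Union

def overlap_report(graph: Set[int], twitter: Set[int],
                   audit: Set[int]) -> Dict[str, Union[int, float]]:
    """Agreement between three sets of account IDs flagged as fake/bot."""
    def pct(part: Set[int], whole: Set[int]) -> float:
        return round(100 * len(part) / len(whole), 1) if whole else 0.0

    return {
        "twitter_flags_also_in_graph_pct": pct(twitter & graph, twitter),
        "audit_flags_also_in_graph_pct": pct(audit & graph, audit),
        "twitter_flags_also_in_audit_pct": pct(twitter & audit, twitter),
        "audit_flags_also_in_twitter_pct": pct(audit & twitter, audit),
        "graph_flags_missed_by_twitter": len(graph - twitter),
        "graph_flags_missed_by_audit": len(graph - audit),
        "flagged_by_all_three": len(graph & twitter & audit),
    }
```

As a sanity check on the reported figures, 76.6% of Twitter’s 4,013 flags and 78.8% of TwitterAudit’s 3,903 flags both come to roughly 3,075 accounts flagged by both, so the two percentages describe the same overlap; 3,049 of those were also flagged by the network graph.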

Here’s another way to visualize the data set (accounts suspended by Twitter are yellow; accounts removed by Twitter are blue):

Making Sense Of It All

Earlier I made the following statement: “Twitter’s entire platform is propped up by misleading/inflated follower/following counts.”

The presence of fake/bot accounts shouldn’t be the key takeaway from this post. What’s noteworthy is that Twitter is actually pretty good at identifying spam accounts; it simply chooses to scrub only a fraction of these fake/bot accounts.

Why not remove them all?

Twitter is a publicly traded company. Every quarter, among other things, Twitter reports their MAUs (monthly active users). It’s a key metric that (potential) shareholders evaluate when making investment decisions. During Twitter’s 2017 Q3 earnings call, it was revealed that Twitter had been overstating their MAU count for the past three years.

In the case of my Twitter account, Twitter currently displays 13.5K followers:

@geoffgolberg’s follower count as of Mar 28th, 2018

This number, however, includes ~4K accounts that Twitter themselves have identified as “suspected spam accounts.” As a Twitter user, I would feel much more comfortable using the platform knowing that the followers/following counts being presented are a more accurate representation of reality. Moreover, users may treat these counts as signals when evaluating the credibility of the accounts with which they engage.

From the perspective of an advertiser, having followers/following counts reflect accounts which are actually being used by humans (i.e. excluding Twitter’s “suspected spam accounts”) is critically important. Most importantly, is Twitter filtering out engagements/actions that involve “suspected spam accounts” when determining which events/actions are billable to advertisers? If advertisers are not being billed in those cases, why are those accounts being reflected in followers/following counts?

OK, So What’s Your Point?

The implications of Twitter’s decision to remove only a fraction of fake/bot accounts are far wider reaching than a single user (myself, in this case) being annoyed.

Earlier this month while reviewing accounts which were following mine, I simply sorted my followers by the number of tweets each account had posted. After noticing that one of my followers had an alarmingly large number of tweets, I decided to do a bit more sleuthing, later sharing my findings in this thread:

Be sure to read the full thread!
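Sorting followers by tweet count is a one-liner once you have their user objects (for example, from followers/list). `statuses_count` and `created_at` are fields of the v1.1 user object; dividing by account age to get tweets per day is an illustrative addition here, a hypothetical way to make extreme accounts stand out.

```python
from datetime import datetime
from typing import Dict, List

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # v1.1 `created_at` format

def tweets_per_day(user: Dict, now: datetime) -> float:
    """Average tweets per day since the account was created (`now` must be timezone-aware)."""
    created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    age_days = max((now - created).days, 1)  # avoid dividing by zero for brand-new accounts
    return user["statuses_count"] / age_days

def most_prolific(users: List[Dict], now: datetime, top: int = 10) -> List[Dict]:
    """Followers sorted by total tweets, highest first, with tweets/day for context."""
    ranked = sorted(users, key=lambda u: u["statuses_count"], reverse=True)[:top]
    return [{"screen_name": u["screen_name"],
             "statuses_count": u["statuses_count"],
             "tweets_per_day": round(tweets_per_day(u, now), 1)} for u in ranked]
```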

The next day I noticed that Twitter had suspended the @nine_oh Twitter account. Had it not been brought to their attention, Twitter would likely have continued counting the account as a monthly active user, and that’s the most troubling part. It’s worth noting that Twitter had flagged the account as a “suspected spam account” before suspending it.

In this particular case, an account showing more than a million followers was being used to amplify Trump/conservative tweets. Presumably Twitter’s algorithm views retweets from accounts with large numbers of followers as a favorable signal, whether or not Twitter has determined many of those followers to be “suspected spam accounts.” Often, networks of accounts will retweet the same tweet/tweets within a short period of time. This is likely an attempt to game Twitter’s algorithm and give the tweet/tweets more visibility in users’ timelines. In other words, it’s a coordinated effort to influence the flow of information across Twitter’s platform:

The issue isn’t specific to Trump/conservative tweets. It happens across the political spectrum and spans many countries.
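One way to look for that kind of coordination, sketched under assumptions: given retweet events as (tweet ID, account, timestamp) records, flag any tweet that at least `min_accounts` distinct accounts retweeted within a short `window`. The function name and thresholds are hypothetical, and genuine users can trip such a rule around breaking news, so a hit is a lead for manual review rather than proof.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Hashable, Iterable, List, Tuple

RetweetEvent = Tuple[Hashable, str, datetime]  # (tweet_id, account, retweeted_at)

def coordinated_bursts(events: Iterable[RetweetEvent], min_accounts: int = 20,
                       window: timedelta = timedelta(minutes=10)) -> List[Hashable]:
    """Hypothetical heuristic: tweets retweeted by >= min_accounts distinct accounts within `window`."""
    by_tweet = defaultdict(list)
    for tweet_id, account, retweeted_at in events:
        by_tweet[tweet_id].append((retweeted_at, account))
    flagged = []
    for tweet_id, retweets in by_tweet.items():
        retweets.sort(key=lambda r: r[0])  # chronological order
        start = 0
        for end in range(len(retweets)):
            # Advance the left edge until the slice spans at most `window`.
            while retweets[end][0] - retweets[start][0] > window:
                start += 1
            accounts = {account for _, account in retweets[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged.append(tweet_id)
                break
    return flagged
```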

It boils down to this: Twitter has made a decision to put profitability ahead of democracy.

Accountability

Earlier this month, Twitter issued an RFP (request for proposal) seeking direction from the public to help them “define what health means for Twitter and how [they] should approach measuring it.”

Twitter’s health would be tremendously improved if Twitter would do one (seemingly simple) thing: remove 100% of the accounts they have identified as “suspected spam accounts.”

How can we as users, advertisers, and shareholders ensure that Twitter holds themselves “publicly accountable” to do so? How can we ensure that Twitter takes a more proactive approach to policing their ecosystem moving forward?