Scale Model, one of the newest companies to launch out of betaworks, helps identify, follow and reach communities on Twitter. While its visual dashboard shows what’s bubbling up from within communities, it is still hard to tell which items appear on a regular basis and which are unusual. For example, in the US Politics model, conservatives use the #WakeUpAmerica hashtag regularly, so it appears on the dashboard quite often. Wouldn’t it be great to know when activity around a certain hashtag is unusual, or, more specifically, when it deviates from the expected behavior? Since we can’t expect users to stay glued to our dashboard, it’d be great if we could send out notifications whenever something important happens.
In the following post, we detail work done by Rohit Jain, a Master’s student at Cornell Tech, who spent the summer with the betaworks data team. Rohit’s work lays the groundwork for a number of new features we hope to integrate into Scale Model.
Anomaly Detection: seasonal hybrid ESD approach
This problem is common to a variety of areas like DevOps, fault detection, and automatic monitoring. From existing research around Twitter trend analysis, here are some interesting methods we found:
The parametric approach in #1 requires labeled data, and the probabilistic approach proposed in #2 doesn’t seem to work well with periodic data. The Seasonal Hybrid ESD approach in #3, described in detail in Vallis et al. (2014), was primarily proposed for long-term anomaly detection, but seemed like a great candidate for the cyclical time series data we have. Below, we briefly describe this approach.
As a first step, we plotted the number of tweets containing a given hashtag in each 6-minute interval for the community discussing politics in the United States.
Three distinct types of trends are visible in the US Politics model:
- Some hashtags like #tcot exhibit a daily periodic pattern where the number of tweets containing that hashtag remains steady throughout the day, then falls dramatically during the night.
- Some hashtags, like #LoveWins, are triggered by a single event (here, the historic Supreme Court ruling in favor of marriage equality), generating a big, sudden spike.
- #CruzCrew has a weekly cycle, with usage spiking on the same day each week.
We’re certainly interested in identifying significant trends such as #LoveWins as early as possible, but we also want to know when periodic hashtags deviate from their expected behavior. Our assumption is that whenever the number of tweets containing a certain hashtag deviates from expected behavior, some event is responsible for it.
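The 6-minute frequency series described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the post: it assumes tweets arrive as hypothetical `(timestamp, text)` pairs, matches hashtags with simple whitespace tokenization, and fills bins that contain no matching tweets with zeros so the series stays evenly spaced.

```python
from collections import Counter
from datetime import datetime, timedelta

BIN = timedelta(minutes=6)


def bin_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 6-minute bin."""
    return ts.replace(minute=ts.minute - ts.minute % 6, second=0, microsecond=0)


def hashtag_series(tweets, hashtag):
    """Count tweets containing `hashtag` per 6-minute bin; empty bins get 0.

    tweets: iterable of (datetime, text) pairs (a hypothetical input format).
    """
    tag = hashtag.lower()
    counts = Counter(
        bin_start(ts)
        for ts, text in tweets
        # Case-insensitive match on whitespace-separated tokens, ignoring trailing punctuation.
        if tag in (word.lower().rstrip(".,!?") for word in text.split())
    )
    if not counts:
        return []
    start, end = min(counts), max(counts)
    series, t = [], start
    while t <= end:
        series.append((t, counts.get(t, 0)))
        t += BIN
    return series
```

Filling empty bins matters for the next step: a seasonal decomposition assumes one observation per time step, so a missing bin would shift the cycle.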
We begin by decomposing each hashtag’s time series of frequencies using STL, a decomposition procedure introduced in Cleveland et al. (1990). This process yields trend, seasonal, and residual components for each time series, such as those displayed below.
With these components in hand, we replace the trend component of each observed frequency cycle with the median of that cycle. We then remove the seasonal and trend components from the time series and apply the generalized extreme studentized deviate (ESD) test, introduced in Rosner (1983), to the residual component. The test yields the anomalies in the hashtag’s frequency time series, which we then filter by their statistical significance.
STL: A Seasonal-Trend Decomposition Procedure Based on Loess [Journal of Official Statistics, Vol. 6, No. 1, 1990]
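The residual step and Rosner’s generalized ESD test can be sketched like this, assuming NumPy and SciPy. `robust_residual` subtracts the seasonal component and then each cycle’s median of the raw observations in place of the STL trend; `generalized_esd` repeatedly removes the most extreme point, compares its studentized deviation R_i with the critical value λ_i at significance level `alpha`, and reports the largest number of points for which R_i > λ_i. The function names and toy inputs are illustrative, not taken from the original code.

```python
import numpy as np
from scipy import stats


def robust_residual(values, seasonal, period):
    """Subtract the seasonal component, then each cycle's median (used instead of the STL trend)."""
    values = np.asarray(values, dtype=float)
    deseasonalized = values - np.asarray(seasonal, dtype=float)
    resid = np.empty_like(values)
    for start in range(0, len(values), period):
        cycle = slice(start, start + period)
        resid[cycle] = deseasonalized[cycle] - np.median(values[cycle])
    return resid


def generalized_esd(x, max_anomalies, alpha=0.05):
    """Rosner's generalized ESD test; returns sorted indices of detected outliers."""
    values = np.asarray(x, dtype=float)
    n = len(values)
    max_anomalies = min(max_anomalies, n - 2)  # keeps the t-distribution's df >= 1
    remaining = list(range(n))
    removed = []
    num_anomalies = 0
    for i in range(1, max_anomalies + 1):
        sub = values[remaining]
        std = sub.std(ddof=1)
        if std == 0:
            break  # all remaining points are identical; nothing can deviate
        deviations = np.abs(sub - sub.mean())
        j = int(np.argmax(deviations))
        r_i = deviations[j] / std  # test statistic R_i
        # Critical value lambda_i for the i-th most extreme point.
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        removed.append(remaining.pop(j))
        if r_i > lam:
            num_anomalies = i  # largest i with R_i > lambda_i so far
    return sorted(removed[:num_anomalies])


# Toy series: a 4-step seasonal pattern around a level of 10, with one spike.
seasonal = np.tile([0.0, 5.0, 0.0, -5.0], 5)
values = seasonal + 10.0
values[13] += 40.0
print(generalized_esd(robust_residual(values, seasonal, period=4), max_anomalies=3))  # prints [13]
```

`max_anomalies` is the test’s upper bound on the number of outliers. In practice it is usually set as a fraction of the series length; the right value for hashtag data is an assumption left to the caller.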
We applied this approach to our data and added a number of heuristics to cluster anomalies based on their time of occurrence and the similarity of their tweets. In the graph below, red dots represent all the anomalies detected, and yellow dots represent the starting point of each cluster of anomalies. The text in the box is the text of the top tweet for the anomalous period.
Ranking the identified anomalies, and adding a representative label for each (in this case – a representative Tweet)
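The time-based part of these clustering heuristics can be sketched as grouping anomalies whose consecutive timestamps are close together, with the first timestamp of each group as its starting point (the yellow dots in the graph). The 30-minute gap is a hypothetical parameter, and this sketch leaves out the tweet-similarity part of the heuristics, which the post does not describe.

```python
from datetime import datetime, timedelta


def cluster_anomalies(times, max_gap=timedelta(minutes=30)):
    """Group anomaly timestamps whose consecutive gaps are at most `max_gap`.

    Returns a list of clusters in time order; cluster[0] is the cluster's starting point.
    """
    clusters = []
    for ts in sorted(times):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)  # close to the previous anomaly: treat as the same event
        else:
            clusters.append([ts])  # gap too large: start a new cluster
    return clusters


# Toy timestamps, not real anomalies.
anomalies = [
    datetime(2015, 1, 1, 21, 0),
    datetime(2015, 1, 1, 21, 6),
    datetime(2015, 1, 1, 21, 12),
    datetime(2015, 1, 2, 8, 0),
]
starts = [cluster[0] for cluster in cluster_anomalies(anomalies)]
```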
Finally, we added a ranking algorithm that scores these anomalies based on the element of surprise, in order to identify the anomalies that should trigger a user notification. Though the results look promising, we need a better way to evaluate and measure this (or any other) method’s performance. To do so, we need labeled data that is not readily available (e.g. “what is a good abnormal event in the Animal Lovers model…”).
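The post does not specify how the element of surprise is scored. As a stand-in, this sketch assumes a robust z-score: each anomaly’s residual measured in units of the scaled median absolute deviation (MAD) of the whole residual series. The `(label, index)` input format and the `top_k` parameter are hypothetical; with `top_k=5` the same ranking could feed a report of the top surprising hashtags.

```python
import statistics


def robust_z_scores(resid):
    """Distance of each residual from the median, in scaled-MAD units."""
    med = statistics.median(resid)
    mad = statistics.median(abs(r - med) for r in resid)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation for normal data
    if scale == 0:
        return [0.0] * len(resid)
    return [abs(r - med) / scale for r in resid]


def rank_anomalies(anomalies, resid, top_k=5):
    """Return the labels of the top_k anomalies, most surprising first.

    anomalies: list of (label, index into resid) pairs.
    """
    z = robust_z_scores(resid)
    scored = sorted(((z[i], label) for label, i in anomalies), reverse=True)
    return [label for _, label in scored[:top_k]]
```

The median and MAD are used instead of the mean and standard deviation so that the anomalies being ranked do not inflate the scale they are measured against.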
Our solution? We created a Slack bot that not only notifies us whenever an anomaly is detected, but also asks users to vote on whether the anomaly corresponds to an outlying trend on Twitter.
Just to give a few examples, these are the notifications we received from the US Politics community during the recent Republican debate on Fox News:
Slack Integration: gossip-chimp posts hashtags as soon as they’re identified as a trend, along with metadata and links to train our system (upvote, downvote)
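A notification like this can be sketched as a Slack incoming-webhook payload, whose JSON `text` field supports Slack’s `<url|label>` link syntax. The vote endpoints (`vote_base_url` with `/up/` and `/down/` paths) and the function names are hypothetical; gossip-chimp’s actual message format is not described beyond the caption.

```python
import json
import urllib.parse
import urllib.request


def slack_escape(text):
    """Escape the three control characters Slack reserves in message text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def build_trend_message(hashtag, community, top_tweet, vote_base_url):
    """Build an incoming-webhook payload with hypothetical upvote/downvote links."""
    tag = urllib.parse.quote(hashtag.lstrip("#"))
    up = f"{vote_base_url}/up/{tag}"
    down = f"{vote_base_url}/down/{tag}"
    text = (
        f"*{slack_escape(hashtag)}* is trending in {slack_escape(community)}\n"
        f"> {slack_escape(top_tweet)}\n"
        f"<{up}|Upvote> | <{down}|Downvote>"
    )
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming-webhook URL; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Tweet text passes through `slack_escape` because Slack treats `&`, `<` and `>` as control characters; the link markup itself is left unescaped so that it renders as links.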
For our (beloved) Animal Lovers community, we received the following notification about #WorldElephantDay at around 8 am EST.
Our system identified #WorldElephantDay as a trend for Scale Model’s Animal Lovers community early on, well before the trend peaked.
We also used this technique to analyze data over a longer period of time, with the aim of generating a monthly report that summarizes important events for several Scale Model communities. Below you can see the top 5 surprising hashtags, and a related tweet for each, in the US Politics and Animal Lovers communities from June 23 to July 15, 2015.
Top Trends for Pet Lovers Model
Top Trends for US Politics Model
The technique has yielded promising results so far, identifying relevant trends within several key communities in a timely manner. We also “push” these trends to the Scale Model team via Slack, so that event detection no longer requires frequent community monitoring. Further work will introduce smarter spam detection and evaluate alternative techniques using data we’ve collected so far. Ultimately, we would like to roll this feature out to all Scale Model communities and help our customers stay abreast of #NewTrends in communities in real time.
Rohit & Alex
A few days ago I published an in-depth analysis of Apple’s iTunes top free chart algorithm, covering boosting, rank manipulation and algorithmic glitching, on medium.com.
Here’s the overview:
On October 29th and December 18th, 2014, something very strange happened to the iTunes top apps chart. Like an earthquake shaking up a region, all app positions in the chart were massively rearranged, and some apps were booted off completely. On these two extremely volatile days, rank changes were orders of magnitude higher than the norm: lots of apps moving around, lots of uncertainty.
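As a rough illustration of how “rank changes orders of magnitude higher than the norm” could be quantified, this sketch assumes each day’s chart is a list of app IDs (best first). It scores a day by the mean absolute rank change of the previous day’s apps, counting apps that fall off the chart as moving just past the last position, and flags days whose score exceeds a hypothetical multiple (`factor`) of the median day. It is not the method used in the full Medium piece.

```python
import statistics


def chart_volatility(prev_chart, curr_chart):
    """Mean absolute rank change for apps on the previous day's chart."""
    curr_rank = {app: pos for pos, app in enumerate(curr_chart, start=1)}
    off_chart = len(curr_chart) + 1  # position assigned to apps that dropped off
    changes = [
        abs(curr_rank.get(app, off_chart) - pos)
        for pos, app in enumerate(prev_chart, start=1)
    ]
    return sum(changes) / len(changes)


def volatile_days(charts, factor=10.0):
    """Indices of days whose volatility exceeds `factor` times the median day's."""
    scores = [chart_volatility(a, b) for a, b in zip(charts, charts[1:])]
    baseline = statistics.median(scores)
    # scores[k] compares day k with day k + 1, so report day k + 1.
    return [k + 1 for k, s in enumerate(scores) if s > factor * baseline]
```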
If you build apps for iOS devices, you know that the success of your app is contingent on chart placement. If you use apps on iPhones and iPads, you should realize just how difficult it is for app developers to get you to download their app. Apple deploys an algorithm that identifies the Top Apps across various categories within its iTunes app store. This is effectively a black box. We don’t know exactly how it works, yet many have come to the conclusion that the dominant factor affecting chart placement is the number of downloads within a short period of time.
If a bunch of people suddenly download your app, you climb up the charts and, as a result, gain significant visibility, which leads to many more downloads. Some estimate that topping the charts may lead to tens of thousands of downloads per day.
Encoded within the iTunes app store algorithm is the power to make or break an app. If you get on its good side, you do really well, and if not, you lose.
If these volatile days are deliberate, shouldn’t we be informed? There are over 9 million registered developers who have shipped 1.2 million apps into iTunes. Algorithmic glitches on Wall Street can set off hundreds of millions of dollars in losses. What’s the dollar cost to entrepreneurs affected by these iTunes glitches? These are people who pour countless hours and resources into adding value to Apple’s ecosystem. Whether these are experiments or A/B tests, shouldn’t Apple show due respect by taking issues like this seriously?
While the app store’s ranking algorithm is opaque, there’s much to be learned by looking at its output over time. In his work on Algorithmic Accountability, Nick Diakopoulos highlights ways to investigate the inner workings of algorithmic systems by tracking inputs and outputs.
Analyzing this type of data gives us a way to hold systems of power accountable: in this case, Apple and its algorithm.
Perhaps Apple is not aware of these glitches? Or maybe my data is flawed? I’ll let you be the judge of that. I did manage to find another person complaining about abnormal chart rank fluctuation around the same time. If you’ve witnessed something similar, please add a note or get in touch.
Read the full piece here.
Last week the Digg editors came to us with an idea. Wouldn’t it be great to do a joint data post on March Madness that took a very different approach from the usual one (looking at the account with the most likes or retweets)? Their idea was to find the most disliked team on social media, and map that out.
We began by grabbing Twitter data using the public APIs, looking for tweets containing certain keywords, hashtags and Twitter handles connected to March Madness events and games. Our dataset grew rapidly to hundreds of thousands of tweets from different games, referencing teams, players and many other random facts. But surprisingly, we found very little hate, even using a number of sentiment analysis libraries and services. Querying the data for certain common words or phrases, to put together a training set for our own classifier, also yielded poor results: we just couldn’t find enough examples in the data.
Weirdly enough, we came across numerous users who tweeted about their “love/hate” relationship with March Madness, but not much anger or hate, and very little directed at specific teams. Perhaps this is a byproduct of being part of a networked public: one’s actions are seen not only by one’s peers but by the network at large, so one might be more cautious before displaying strong sentiment.
Constructing their Fan-base
Instead, we chose to focus on each team’s networked audience (“fan-base”). If we can identify users rooting for each team, we can pull out some potentially interesting facts about each cluster of university supporters and run some broader comparisons. For the Final Four teams, we chose a number of popular Twitter handles to represent each team. Then we collected all users who retweeted those accounts into sets, making the *very crude* assumption that a retweet mostly represents an endorsement. That is certainly not true in political discourse, where users often retweet content to expose it to their audience’s critique, but it was rare in our case. And even when it did happen, such users would mostly be outliers in our data (users not part of the connected component).
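Building the per-team sets amounts to mapping each retweeted handle back to its team. A minimal sketch, assuming retweets are available as hypothetical `(user_id, retweeted_handle)` pairs and that a small set of handles has been chosen per team (the handle names in any input are placeholders, not the accounts used in the post):

```python
from collections import defaultdict


def build_fan_bases(retweets, team_handles):
    """Map each team to the set of users who retweeted any of its handles.

    retweets: iterable of (user_id, retweeted_handle) pairs.
    team_handles: dict of team name -> set of handles chosen to represent it.
    """
    handle_to_team = {
        handle.lower(): team
        for team, handles in team_handles.items()
        for handle in handles
    }
    fans = defaultdict(set)
    for user, handle in retweets:
        team = handle_to_team.get(handle.lower())
        if team is not None:  # ignore retweets of unrelated accounts
            fans[team].add(user)
    return dict(fans)
```

A user who retweets two teams lands in both sets; whether to drop such users is a judgment call the post doesn’t address.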
Finally, for every user in one of the four sets, we grabbed information about the Twitter accounts they follow. The Twitter graph represents both friendships and interests. Using network analysis techniques, we were able to run comparisons across the different teams.
For the Digg post, the editors pulled out the following data points: devices used, user bios, and mainstream media preference (Fox News, CNN, MSNBC), as reflected by whom they follow. Some excerpts:
“Florida is uh, well, the only place where Fox News is popular. And it’s not even that popular. In fact, Florida is the only school where Fox News even shows up in the data. Obviously, this is some sort of conspiracy related to Benghazi.”
“Aaron Rodgers is more popular than sex. Well, actually we don’t know how popular sex is, but 50.25% of Wisconsin fans follow Aaron Rodgers and he didn’t even go to Wisconsin.”
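Figures like the share of a team’s fans who follow a given account reduce to a simple ratio over a fan set. A minimal sketch, assuming the follow lists are stored as a hypothetical mapping from user ID to the set of accounts that user follows:

```python
def share_following(fans, follows, account):
    """Percentage of a fan set that follows `account`.

    fans: set of user IDs; follows: dict of user ID -> set of followed accounts.
    """
    if not fans:
        return 0.0
    hits = sum(1 for user in fans if account in follows.get(user, set()))
    return 100.0 * hits / len(fans)
```

Users with no recorded follow list count as not following, which keeps the denominator equal to the full fan set.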
Analyzing the Networks
At the end of this process we were left with some very interesting graphs representing the different fan-bases. For example, if we take all the Kentucky Wildcats fans, and insert each Twitter user as a node, and connect them by who they follow on Twitter, a fairly dense cluster emerges. The larger a node, the higher its degree in the network – the more connected it is to other nodes. The average degree for this graph is 11.731. This means that on average, every user is following or is followed by a total of ~12 other users from this same network. The higher the number, the more densely connected the community. BTW – the median degree for this graph is 5, still quite high.
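These degree statistics can be computed by treating a follow in either direction as a single undirected edge, which matches reading degree as “following or followed by”. For an undirected graph the average degree is 2|E| / |V|, so an average of 11.731 corresponds to about 5.9 edges per user. The edge-list input format below is an assumption.

```python
import statistics


def degree_stats(nodes, edges):
    """Return (average degree, median degree) of the undirected follow graph."""
    node_set = set(nodes)
    # Collapse A->B and B->A into one undirected edge; drop self-loops and outsiders.
    undirected = {
        frozenset((a, b)) for a, b in edges
        if a != b and a in node_set and b in node_set
    }
    degree = dict.fromkeys(node_set, 0)
    for edge in undirected:
        for node in edge:
            degree[node] += 1
    values = list(degree.values())
    return sum(values) / len(values), statistics.median(values)
```

Isolated fans keep a degree of 0 and still count toward both statistics, which is why the median can sit well below the average in a graph with a few highly connected accounts.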
Wisconsin’s fan-base looks quite different. The number of retweets of Wisconsin team accounts is generally much higher than for the other three teams, and the Wisconsin user set is almost three times the size of Kentucky’s. While the graph is denser, there are fewer central nodes and many more tiny specks representing less connected users. Still, the average degree for nodes in this graph is 16.89, and the median is 8, both higher than in the Kentucky fan graph. So even though the Wisconsin fan-base is much larger in our sample, it is also much more interconnected.
If we drill down into this WI graph and run a community detection algorithm, we can start to identify regional friend networks and interest groups, outlined by similarly colored areas. For each of these clusters, we can see trends: users in the bottom purple region tend to have “Green Bay, WI” in their bios, while those in the red region have either La Crosse or Galesville, WI. Many of them are high school or college students on various sports teams, connected to their friends in the network. I was surprised to see many of them using Vine, and I’m curious whether that’s a general trend across student populations or simply a regional one.
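The post doesn’t name the community detection algorithm used. As one possible stand-in, this sketch uses NetworkX’s `greedy_modularity_communities` (Clauset-Newman-Moore modularity maximization) on a toy graph of two tightly knit friend groups joined by a single edge, then labels each community with its most common bio term. The `bios` mapping, the term list and the `top_bio_term` helper are hypothetical.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy follow graph: two 4-person cliques joined by one cross-group edge.
G = nx.Graph()
G.add_edges_from((a, b) for a in range(4) for b in range(a + 1, 4))
G.add_edges_from((a, b) for a in range(4, 8) for b in range(a + 1, 8))
G.add_edge(3, 4)

communities = greedy_modularity_communities(G)


def top_bio_term(community, bios, terms):
    """Most common of the given location terms across a community's bios."""
    counts = Counter(t for user in community for t in terms if t in bios.get(user, ""))
    return counts.most_common(1)[0][0] if counts else None


# Toy bios for a few users; users without a bio are simply skipped.
bios = {0: "Green Bay, WI", 1: "student, Green Bay, WI", 5: "La Crosse, WI"}
terms = ["Green Bay", "La Crosse"]
for c in communities:
    print(sorted(c), top_bio_term(c, bios, terms))
```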
This is a pretty standard analysis that only scratches the surface of the audience insight that can be reached. We’ll be highlighting much more of this type of work here, and getting deeper into methods and techniques.
Much has been written about the meteoric rise and abrupt demise of Flappy Bird, the highly addictive mobile game that seems to have captured the world over the past couple of weeks. Dong Nguyen, an independent game developer in Vietnam who launched the game to little notice last May, decided to take it down from all app stores after it reached heights previously touched only by major franchises like Candy Crush and Angry Birds. The move ended the frenzy around the frustratingly difficult game, while adding to the already heightened media spectacle.
What is it about Flappy Bird that made it so successful, and why did it take so long for the game to go “viral”? The app stores are littered with thousands of free casual games that use similarly addictive gameplay. What can we learn about the rise in uptake of this game specifically? And can we perhaps identify a tipping point where engagement around the game crossed a certain threshold, gaining momentum that was impossible to stop?
These were some of the questions that I set out to answer earlier this week. At betaworks, we have a unique longitudinal view of mainstream media and social media streams. Our services, at varied scales, span content from publishers and social networks, giving us the ability to analyze the attention given to events over time. Inspired by Zach Will’s analysis of the Flappy Bird phenomenon through scraped iTunes review data, I wanted to see what else we could learn about the massive adoption of this game, specifically through the lens of digital and social media streams.
My data shows two clear tipping points at which user adoption of the game rose significantly. The first, on January 22nd, came when the phrase ‘Flappy Bird’ started trending on Twitter across all major cities in the United States. It would continue trending for the next 6 days, driving increased visibility and further adoption. The second, on February 2nd, was the point at which media coverage of the game quadrupled on a daily basis. Most media outlets were clearly late to the game, covering the phenomenon only after it had topped all app store charts and was already a massive success.
We live in a networked world, where social streams drive massive spikes of attention that flock from one piece of content to another. While it is a chaotic system, difficult to fully predict or control, there are early warning signals we can heed as events unfold.
This plot shows, per day, the number of unique locations where the game was a trending topic (in red), the number of unique media sources covering the game (in purple), and the number of unique users sharing links related to the game in social streams (in green).
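Each series in the plot is a count of distinct entities per day (trending locations, media sources, or users). A minimal sketch, assuming hypothetical `(day, entity)` records and reading “quadrupled” as a day whose count is at least four times the previous recorded day’s:

```python
from collections import defaultdict
from datetime import date


def daily_unique(records):
    """Count distinct entities per day from (day, entity) records."""
    seen = defaultdict(set)
    for day, entity in records:
        seen[day].add(entity)
    return {day: len(entities) for day, entities in sorted(seen.items())}


def jump_days(daily_counts, ratio=4.0):
    """Days whose count is at least `ratio` times the previous recorded day's.

    Compares consecutive recorded days, so calendar days with no records are skipped.
    """
    days = sorted(daily_counts)
    return [
        curr for prev, curr in zip(days, days[1:])
        if daily_counts[prev] > 0 and daily_counts[curr] >= ratio * daily_counts[prev]
    ]


# Toy records: one source on each of two days, then four distinct sources.
records = [
    (date(2000, 1, 1), "s1"),
    (date(2000, 1, 2), "s1"),
    (date(2000, 1, 2), "s1"),
    (date(2000, 1, 3), "s1"),
    (date(2000, 1, 3), "s2"),
    (date(2000, 1, 3), "s3"),
    (date(2000, 1, 3), "s4"),
]
counts = daily_unique(records)
```

Counting distinct entities rather than raw mentions keeps a single prolific outlet or user from looking like a broad surge in attention.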
I made a tappable version of my DataGotham talk from Sept 9th, 2013. It goes through a number of ways in which we use graph analysis to find insight across different datasets.
Hope you enjoy!