Chirpsy's Inner Workings
Two years ago we set out to build a startup that would tweet for you when you don’t have the time. Last year we released it, and we have been maintaining it for well over a year now. The service is called Chirpsy, and the simple pitch is that you pay us a monthly subscription and we give you tweets on a daily basis. In this article I am going to go through how we form these tweets, since it isn’t a super straightforward process. The important thing to note is that Chirpsy forms these tweets with a mix of humans and automated processes.
Start: The Campaign Configuration Page
This is what Chirpsy collects from its customers: the customer website, a blurb of text, and a series of keywords that describe core topics for that customer. Consider this the starting point of the process. Once it is filled in, the following process gets kicked off.
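As a rough illustration, the three things collected on this page could be modeled like this. It is a minimal sketch, not Chirpsy's actual schema: the class name CampaignConfig, its field names, and the validate helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CampaignConfig:
    """Hypothetical shape of what the configuration page collects."""
    website: str
    blurb: str
    keywords: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config is usable."""
        problems = []
        if not self.website.strip():
            problems.append("website is required")
        if not self.blurb.strip():
            problems.append("blurb is required")
        # Keywords drive the article search, so at least one non-blank one is needed.
        if not any(k.strip() for k in self.keywords):
            problems.append("at least one keyword is required")
        return problems
```

Everything downstream keys off the keywords, which is why the sketch treats an empty keyword list as an error rather than a warning.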
Steps 1 & 2: Go find articles
The first thing Chirpsy has to do is find a bunch of articles related to the keywords the customer put in. To do this we use a series of Agents that go out and look in different places. We built an Agent architecture in which each news-finding Agent brings us articles based on keywords. We experimented with Bing, Google, and Twitter and found that Google was the best for now. Google has options that let us set up RSS feeds and pull from blogs or news.
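The Agent idea can be sketched as a small interface that every article source implements. This is only an illustration under assumptions: ArticleAgent, Article, StaticAgent, and gather are made-up names, and the stand-in agent searches an in-memory list instead of calling Google, Bing, or Twitter.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    url: str
    title: str


class ArticleAgent(ABC):
    """Hypothetical interface: one subclass per place we look for articles."""

    @abstractmethod
    def find_articles(self, keywords: list[str]) -> list[Article]:
        ...


class StaticAgent(ArticleAgent):
    """Stand-in agent that searches a fixed list instead of a real source."""

    def __init__(self, articles):
        self._articles = list(articles)

    def find_articles(self, keywords):
        wanted = [k.lower() for k in keywords]
        return [a for a in self._articles
                if any(k in a.title.lower() for k in wanted)]


def gather(agents, keywords):
    """Run every agent and drop duplicate URLs, keeping first-seen order."""
    seen = set()
    results = []
    for agent in agents:
        for article in agent.find_articles(keywords):
            if article.url not in seen:
                seen.add(article.url)
                results.append(article)
    return results
```

With this shape, trying out a new source means writing one more subclass; the code that collects and de-duplicates articles does not change.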
Step 3: Send the articles to Mechanical Turk
Now that we have the articles queued up, we send them off to Amazon’s Mechanical Turk (mTurk) service to be processed. The actual mechanism used here is SQS. Budgeting to get enough articles is always a concern. Sometimes we get too many articles in our backlog, so we can only take the top ones. These are sent off, and if we get enough good tweets for the customer then we stop until the next day for that account. But if we don’t get enough back then we have to send up another batch.
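The send-a-batch, check, send-another loop might look roughly like the following. It is a sketch under assumptions: fill_daily_quota, the batch size, and the max_batches budget cap are invented for illustration, and send_batch is a plain callable standing in for the real SQS and mTurk round trip.

```python
def fill_daily_quota(backlog, send_batch, tweets_needed,
                     batch_size=10, max_batches=3):
    """Send batches from the front of the backlog until enough tweets come back.

    backlog: articles already sorted best-first.
    send_batch: callable taking a list of articles, returning the tweets produced.
    max_batches: crude budget cap so a poor backlog cannot drain the account.
    Returns (tweets, batches_sent).
    """
    tweets = []
    batches_sent = 0
    position = 0
    while (len(tweets) < tweets_needed
           and batches_sent < max_batches
           and position < len(backlog)):
        batch = backlog[position:position + batch_size]
        position += len(batch)
        batches_sent += 1
        tweets.extend(send_batch(batch))
    # Never hand back more than the account asked for today.
    return tweets[:tweets_needed], batches_sent
```

The loop stops for any of three reasons: the quota is met, the budget cap is hit, or the backlog runs dry. The last two are the cases that need human attention.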
Steps 4 & 5: Judge and Write
Nothing in mTurk is what it seems. The most important lesson you will learn is that you are working with people, and these people are motivated by making money on these tasks. If you are not careful you will get garbage back from the workers and they will take your money. Working with process flows and psychology really is a black art. What we found works is to break the task down so that workers first decide if the article is relevant, then write the tweet. They get paid a small amount for deciding if the article is acceptable and a larger amount to read the article and write the tweet. We also batch the activities into groups of 10 because it helps a worker get into the zone and do several of the same tasks at once. The screen they see is essentially this.
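Separately from the worker screen, the batching and the two-tier pay can be sketched in a few lines. The function names are hypothetical and the pay rates (2 cents to judge, 15 cents to write) are invented purely for illustration; they are not Chirpsy's real rates.

```python
def batch_tasks(articles, size=10):
    """Split the article list into groups of `size` (the last may be smaller)."""
    return [articles[i:i + size] for i in range(0, len(articles), size)]


def estimate_cost_cents(judged, written, judge_cents=2, write_cents=15):
    """Total payout in cents: every article is judged, only some get a tweet.

    judge_cents and write_cents are made-up example rates.
    """
    return judged * judge_cents + written * write_cents
```

Working in whole cents avoids floating-point rounding when many small payments are added up.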
Step 6: Grade
Since workers can be very tricksy we need to take precautions, so we have other workers grade that work. They tell us if the work was done well or poorly. Based on this we may do several things. For instance, we might not even accept the work. If they did a good job they might get a small bonus. This step is key if you want quality. Realize that we don’t just grade the work once; it could be graded several times. We don’t know if the grader is telling the truth either, so we have to design the system in such a way that it isn’t worth gaming.
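One simple way to combine several grades is a vote, so that no single grader decides the outcome. The sketch below is an assumption, not the actual Chirpsy rules: the decide function, the minimum of three grades, and the thresholds are all hypothetical.

```python
def decide(grades, min_grades=3, bonus_share=1.0, accept_share=0.5):
    """Turn several independent grades into one decision.

    grades: list of booleans, True meaning "the work was done well".
    Thresholds are illustrative: unanimous approval earns a bonus,
    a strict majority is accepted, anything else is rejected.
    """
    if len(grades) < min_grades:
        return "needs_more_grades"
    share = sum(grades) / len(grades)
    if share >= bonus_share:
        return "accept_with_bonus"
    if share > accept_share:
        return "accept"
    return "reject"
```

Requiring several grades before acting means a single dishonest grader cannot, on their own, get good work rejected or bad work paid.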
Step 7: Back to Chirpsy
At this point we have the tweets so we put them back into SQS and send them over to Chirpsy. Chirpsy picks them up and puts them in the database. A customer would then see the tweets in a feed like this.
Step 8: Tweeting
Now that the tweets are ready, the customer can queue them up and Chirpsy will tweet them periodically. Additionally, we can send them to HootSuite if the user wants to do something more complex, like sending them to Facebook or any other social media that HootSuite integrates with.
After running the service for nearly a year we have had to innovate on both the service and the business. The following problems and solutions give you a bit of insight into what we have learned.
Problem 1. Constant flow of articles.
We always need more articles for the workers. Sometimes a customer’s keywords are just too esoteric for us to find anything, so we either have to ask them to add more keywords or we can’t fill that order. There are only so many articles written about knitting on a daily basis.
Problem 2. Too many articles
In some cases the keywords give us a glut of articles. We basically put them in a queue and then pull from that queue as we need more to fill the user’s account requirements. But sometimes we end up with so many that all we can do is send the most recent articles to the workers. The workers are paid to verify that each article is valid, so in the end we pay workers an awful lot just to look for good articles.
Problem 3. Articles get old inconsistently
We have a concept of “Shelf Life” for articles: the older an article is, the less we want to use it. Eventually we throw it out once it has expired. This doesn’t work the same way for all articles, though: different types of information go bad at different rates.
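A per-category shelf life could be expressed as a freshness score like the one below. This is a sketch with made-up numbers: the category names, the durations in SHELF_LIFE, and the linear decay are all assumptions chosen to illustrate the idea.

```python
from datetime import datetime, timedelta

# Hypothetical shelf lives; real values would need tuning per content type.
SHELF_LIFE = {
    "breaking_news": timedelta(days=2),
    "how_to": timedelta(days=90),
}
DEFAULT_SHELF_LIFE = timedelta(days=14)


def freshness(published, now, kind):
    """Score from 1.0 (just published) falling linearly to 0.0 (expired)."""
    life = SHELF_LIFE.get(kind, DEFAULT_SHELF_LIFE)
    age = now - published
    remaining = 1.0 - age / life  # timedelta / timedelta gives a float
    return max(0.0, min(1.0, remaining))


def is_expired(published, now, kind):
    """An article is thrown out once its freshness reaches zero."""
    return freshness(published, now, kind) == 0.0
```

The score can double as a sort key for the backlog, so that articles closest to expiring are not sent ahead of fresher ones.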
Solution 1. Pay attention to keywords
The most obvious solution to these problems is to pay attention to the keywords and the articles a customer is getting. For that reason we have built a series of tools that let us see how efficiently articles are turned into tweets. If we see that only 6% of the articles are being turned into tweets then we know we are pouring money into filtering.
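The efficiency number itself is just accepted tweets divided by articles sent. Here is a minimal sketch of flagging keyword sets that convert poorly; the function names, the shape of the stats dictionary, and the 10% threshold are assumptions for illustration.

```python
def conversion_rate(articles_sent, tweets_accepted):
    """Fraction of articles that became tweets, or None if nothing was sent."""
    if articles_sent == 0:
        return None
    return tweets_accepted / articles_sent


def flag_inefficient(stats, threshold=0.10):
    """Return keyword sets whose conversion rate is below `threshold`.

    stats: {keyword_set: (articles_sent, tweets_accepted)}
    Keyword sets with no articles sent yet are skipped, not flagged.
    """
    flagged = []
    for keyword_set, (sent, accepted) in stats.items():
        rate = conversion_rate(sent, accepted)
        if rate is not None and rate < threshold:
            flagged.append(keyword_set)
    return flagged
```

For example, 3 tweets from 50 articles is a rate of 0.06, the 6% case described above, and would be flagged under this threshold.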
Solution 2. Talk to customers
If a customer is having trouble with keywords then the only way to fix that is to talk to the customer and help them tune their keywords. Sometimes it is simply that they didn’t understand that they could add more than one set of keywords. Sometimes they just didn’t provide good enough instructions on what they were looking for, so the workers couldn’t fulfil their request.
Chirpsy can provide you with a never-ending set of unique tweets. It does this by using a complex system of man and machine. The machine finds the articles and real people at mTurk write the tweets. This is great for filling in the dead space between important announcements so your Twitter feed looks active. One way to use it is to give yourself a break from feeling like you must constantly send something out to your audience, the same way a dog sitter would watch your dog for you while you are in Hawaii. And while you are away you know all of these tweets are unique thanks to the workers at Amazon’s Mechanical Turk!