Category Archives: Gargl

Cracking Candy Crush

Update: King, the maker of Candy Crush, has request Candy Crush Cracker be removed from the Chrome Extension store. Due to this, it can no longer be installed directly via the Chrome Extension store, and must be downloaded and installed from its source instead. Links below have been updated to point at the new location, which contains the source and installation instructions.

After receiving a lot of interest in Trivia Cracker, a Chrome extension that lets you easily cheat in the popular game Trivia Crack, I decided it might be interesting to see if the same kinds of vulnerabilities existed in other popular games. Given its insane popularity, the first game I thought to investigate, of course, was Candy Crush.

For those of you living under a rock, Candy Crush Saga is a match-three puzzle game for Facebook, iPhone, and Android, released back in 2012. Even though it is essentially a re-skinned Bejeweled, Candy Crush has managed to ride the “most popular” app store charts unlike any game before it. Even now, three years after its release, it’s still going strong as a top app in both the iOS and Google Play app stores. And that’s not to mention the insane 75 million likes Candy Crush has racked up on Facebook.

Given its popularity, you’d think the developers of such a polished and successful game might have taken the time to implement it in a way that is secure from cheating. But, as it turns out, writing some code to cheat at Candy Crush is actually fairly simple. Just like with Trivia Crack, over the course of a weekend I was able to write and release a Chrome extension, Candy Crush Cracker, that converted me from a medicore-at-best Candy Crush player to a god-like crusher of candy. You can see Candy Crush Cracker in action below, where I use it to get extra lives and to beat levels with any score I want:

So what’s wrong with Candy Crush Saga’s implementation that allowed me to so easily build a tool that lets anyone cheat? In short – beating a level in Candy Crush is as easy as sending a request to the Candy Crush server, saying you beat the level. You can even send along a score – any score – to say you beat the level with that score. The details of the vulnerability, how I found it, and how I built a Chrome extension to take advantage of it are below.

 

1 – Finding the vulnerability

Many of my friends are Candy Crush fanatics, achieving scores and reaching levels I never would be able to naturally. But while my Candy-Crush-playing abilities have continually failed me, I figured maybe my reverse-engineering skills could take me to new candy-crushing heights. I suspected it might be possible to send my own requests to Candy Crush’s servers, or use some data in the responses sent to the client from Candy Crush’s servers, to gain an edge in the game. So, I started researching what kinds of data the Candy Crush client and server pass back and forth.

To inspect this data, I followed much the same process as with Trivia Crack. I played Candy Crush in my browser on Facebook, while recording and inspecting the requests and responses sent between Candy Crush’s client and server, using a tool I’d created previously called Gargl. Yes, I know I could have used Fiddler or Charles or Chrome’s Developer Tools to do the same. I decided to use Gargl instead because in addition to letting you view client/server requests/responses, Gargl also lets you modify and parameterize these requests, and then auto-generates modules in a programming language of your choice so you can make these same requests without writing a line of code. But more on that later.

Anyway, after telling Gargl to start recording and going to Candy Crush on Facebook in my browser, the first step was to figure out which of the many requests being sent on this Facebook page were related to Candy Crush, versus Facebook itself. Inspecting the HTML on the page showed that the Candy Crush flash content is embedded into Facebook via an iframe. The element right above this iframe was a form meant to post to a peculiar URL – https://candycrush.king.com/FacebookServlet/.

Using Chrome Dev Tools to inspect Candy Crush Cracker's HTML

I knew King is the company that creates Candy Crush Saga, so I suspected this is the domain where Candy Crush is hosted. The next step was just to start playing Candy Crush, and as I played to look at the requests Gargl finds that the page is making to any URL containing “king.com”:

Using Gargl to record actions in Candy Crush

As I beat levels in Candy Crush, I noticed a new request seemed to be issued for each level. The requests seemed to be issued right after I successfully completed a level:

Using Gargl to record actions in Candy Crush

So, it seemed, maybe the client tells the Candy Crush server when a game is over. This made me think maybe the client doesn’t just say the game is over, but also says whether the user beat the level or not, and if the level was beaten, with what score the user beat the level. I figured I had a lead, and dug into the details of this “gameEnd” request.

 

2 – The vulnerability in detail

Using Gargl to look at the “candycrush.king.com/api/gameEnd3″ request/response in detail, I was able to confirm that it does indeed tell the server when the game is over, and the score with which the user beat the level:

Using Gargl to record actions in Candy Crush - the vulnerability

As you can see above, the request sent to the server contains, as a query string parameter, a JSON object containing the score the level was beaten with, the id of the level that was beaten, as well as a bunch of other information.  The query string parameter’s name is a not-very-descriptive “arg0″ — maybe an attempt by the game’s creators to try to hide the fact that this parameter is the secret to making all your Candy Crush dreams come true!

The full value of the “arg0″parameter value looks like the below:

{
  "episodeId": 1,
  "timeLeftPercent": -1,
  "movesLeft": 0,
  "cs": "555b89",
  "seed": 1428911271508,
  "variant": 0,
  "score": 39800,
  "movesInit": 15,
  "reason": 0,
  "movesMade": 15,
  "levelId": 2
}

From some experimentation and watching this request as I finished multiple levels, I was able to discern what most of the fields in arg0 mean, and where they come from. EpisodeId and levelId are used to identify the level, and can be found in the request sent to the server when you start playing a level — https://candycrush.king.com/api/gameStart2. Seed can also be found in this “gameStart” request, and seems to represent a random seed for how the layout of the candy in the level should look. In addition, every API request made to Candy Crush must be sent with an “_session” query string parameter, to identify the current user session. This value can also be found in the gameStart request, and really in any request to Candy Crush for that matter.

Here’s what the https://candycrush.king.com/api/gameStart2 request looks like:

The Candy Crush Saga gameStart2 API request

Again, it looks like Candy Crush’s creators are either really bad at coming up with creative parameter names, or they’re trying to obfuscate this information to make it harder to manipulate their API. EpisodeId is sent via a query string parameter called “arg0,” levelId is sent as “arg1,” and seed is sent as “arg2″. For some reason, they did decide to use a fairly descriptive name for the session token though — “_session.”

Other than episodeId, levelId, score, and seed, the rest of the fields in the gameEnd request’s arg0 query string parameter are unimportant, and can be hard-coded like above. That is, except for cs. Cs in this case probably stands for checksum, because if you do not send the right value for it, the request will fail. It turns out constructing the value of the checksum field is not all that difficult either. To get the correct checksum, simply MD5 hash a specific string and use the first 6 characters of that string as the checksum. The string to hash matches the format:

<episodeId>:<levelId>:<score>:-1:<userId>:<seed>:BuFu6gBFv79BH9hk

UserId is the only piece of information we don’t already have that is needed to construct the above string. It is sent in the “gameInit” request that happens every time you load Candy Crush Saga – https://candycrush.king.com/api/gameInitLight. You can make this request at any time (passing _session as a query string parameter, of course) and the response will contain your userId:

{
  "currentUser": {
    "userId": 1754400119,
    "lives": 5,
    "timeToNextRegeneration": -1,
    "gold": 0,
    "unlockedBoosters": [
      
    ],
    "soundFx": false,
    "soundMusic": false,
    "maxLives": 5,
    "immortal": false,
    "mobileConnected": false,
    "currency": "USD",
    "altCurrency": "USD",
    "preAuth": false
  },
  ...
}

Great, we now have everything we need to make the gameEnd request! Let’s try popping this information into Fiddler’s composer, targeting the first level of the game, and see what happens when we enter a score of 100,000, calculate the checksum, make the gameEnd request, and then reload Candy Crush Saga:

Getting a high score in Candy Crush Saga

Well, my friends, it appears we’ve successfully cracked Candy Crush!

While Candy Crush Saga did take some defensive measures, allowing a single request to complete the level, with any score, is in direct conflict with the “Defensive Programming” practice of programming – particularly the “never trust the client” web programming principle. Since the server has no control over how the client acts, it can’t assume the client will not act in a malicious way, and so must protect itself. A better way of implementing “completing levels” would be to make the client send every move the user makes in the level to the server, and having the server determine if those moves successfully earn a score high enough to complete the level. While this method also isn’t perfect, it at least means the client, whether through manual user action or via some automated method, has to play the level instead of just telling the server “I win.”

However, Candy Crush did not do this, and instead trusts the client. Now it was just a matter of creating a malicious client to take advantage of the fact that the client can just tell the server it won any arbitrary level. Ideally, one that would be easy for non-technical users to install and use. Hmm…how about a Chrome extension that just adds a button to the Candy Crush game, when played on Facebook, that when clicked beats the current level automatically??

 

3 – Taking advantage of the vulnerability

As I mentioned above, Gargl allows you to take the requests you had it record, modify and parameterize them as needed, and then auto-generate modules in a programming language of your choice to make these same requests. I’m not going to go into the details of that process since you can look at one of my Gargl blog posts to find that info, but essentially I generated a Gargl template file for Candy Crush‘s various API requests, using the Gargl Chrome extension, and then used a Gargl generator to turn that template file into a Candy Crush JavaScript library. Using Gargl for this allowed me to create a JavaScript library that talks to Candy Crush’s servers, without writing a line of code to do so, and also to have a template file around for the future in case I want to do the same thing later with another programming language.

Once I had this Candy Crush JavaScript library, it was a simple matter of building a Chrome extension in JavaScript that runs on the domain loaded in the Candy Crush Facebook game page’s iframe (candycrush.king.com), adds a button to the HTML for the game, and when that button is clicked asks the user for a score, does the above steps to find episodeId, levelId, seed, _session, and userId, and then issues the gameEnd request to beat the current level.

Candy Crush Cracker in action

And just like that, Candy Crush Cracker was born! Curious about the exact details of how Candy Crush Cracker works? Check out the source code on GitHub.

Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.

Cracking Trivia Crack

Update 2: After writing this post, I also discovered another vulnerability in Trivia Crack that allowed for giving users unlimited lives/spins. At that time, I added this as another feature to Trivia Cracker. Yesterday (1/29/2016), Trivia Cracker started preventing this feature (automatic question answering continues to work — as described in this post). But what’s funny is, in addition to preventing it, the Trivia Crack server also calls me out specifically!

Trivia Crack error

 

Update 1: Using similar techniques to the below, I’ve also released a cheating Chrome extension for Candy Crush. Get it here.

 

As the holiday season began a few weeks ago, I started to notice that a new app, Trivia Crack, was becoming popular with my friends.  For those of you who don’t know, Trivia Crack is a new game for Android, iPhone, and Facebook. The premise of the game is simple, and very similar to another popular app from last year, QuizUp. Essentially, you answer trivia questions of various categories, competing against your friends for bragging rights. Despite the simplicity (or maybe due to it), Trivia Crack has become very popular as of late – racking up 5.3 million likes on Facebook and becoming the top free app in the Google Play (Android) store.

You’d think the developers of such a polished and successful game might have taken the time to implement it in a way that is secure from cheating, but it turns out writing a program to cheat at Trivia Crack is actually fairly simple. Over the course of a weekend, I was able to write and release a Chrome extension, Trivia Cracker, that turned me from a medicore-at-best Trivia Crack player to a seemingly genius demigod. You can see Trivia Cracker in action below:

So what’s wrong with Trivia Crack’s implementation that allowed me to so easily build a tool that lets anyone cheat? In short – when the Trivia Crack client requests from the Trivia Crack server the next question to ask the user, the server responds not just with the question and the possible answers, but also sends which answer is the correct answer. The details of the vulnerability, how I found it, and how I built a Chrome extension to take advantage of it are below.

 

1 – Finding the vulnerability

After losing to my friends in Trivia Crack one too many times, I decided I wanted to see if I could win in my own, special way. While I’m not so good at random trivia, I am pretty good at reverse-engineering. I suspected I might be able to take advantage of sending my own requests to Trivia Crack’s servers, or using some data in the responses from Trivia Crack’s servers, to gain an edge in the game. So I started by researching what kinds of data the Trivia Crack client and server pass back and forth.

To inspect this data, I played Trivia Crack in my browser on Facebook, while recording and inspecting the requests and responses sent between Trivia Crack’s client and server using a tool I’d created previously called Gargl. Yes, I know I could have used Fiddler or Chrome’s Developer Tools to do the same. I decided to use Gargl instead because in addition to letting you view client/server requests/responses, Gargl also lets you modify and parameterize these requests, and then auto-generates modules in a programming language of your choice so you can make these same requests without writing a line of code. More on that later.

Anyway, after telling Gargl to start recording and going to Trivia Crack on Facebook in my browser, the first step was to figure out which of the many requests being sent on this Facebook page were related to Trivia Crack, versus Facebook itself. Inspecting the HTML on the page showed that the Trivia Crack content is embedded into Facebook via an iframe. The element right above this iframe was a form meant to post to a peculiar URL- https://preguntados.com/game.

Using Chrome Dev Tools to inspect Trivia Crack's HTML

My limited knowledge of Spanish reminded me that “pregunta” means “question,” which seems related to trivia, so I suspected this was the domain where Trivia Crack is hosted. Going to the https://preguntados.com/game URL confirmed that suspicion:

Playing Trivia Crack on Preguntados.com

The next step was just to start playing Trivia Crack, and as I played look at the requests Gargl finds that the page is making to any url containing “preguntados.com”:

Using Gargl to record actions in Trivia Crack

As I answered Trivia Crack questions, I noticed a new request seemed to be issued for each question. Actually, the request seemed to be issued before I even “spun the spinner” in Trivia Crack to determine what category the next question would be:

Using Gargl to record actions in Trivia Crack

This made me think maybe the questions were predetermined, and the “random spinner” in fact was predetermined to land on a certain category (and ask a certain question) based on the response from the server to the “api.preguntados.com/api/users/<userID>/games/<gameID>” request that sent before you click the “Spin” button. To me, this meant I could probably alert the user to the question they are about to be asked ahead of time, so they would have as much time as they wanted to think about it or look it up. This would be an advantage because in Trivia Crack, you only get 30 seconds to answer a question once the question is shown, to prevent the user from looking up answers. In addition, in some game modes its not just the number of questions you get correct, but also the amount of time you take to answer, that determines if you win. If I could show the user the question ahead of time, they could obviously answer it much more quickly once Trivia Crack itself shows the question.

So I figured I had a lead and dug into the details of this request. It turns out this request actually provided a much larger “cheater’s advantage” than I thought…

 

2 – The vulnerability in detail

Using Gargl to look at the “api.preguntados.com/api/users/<userID>/games/<gameID>” request/response in detail, I was able to confirm that it does indeed provide the question to be asked, as well as the possible answers, in the response. However, I was surprised to find that the correct answer to the question is also embedded in the response!

Using Gargl to record actions in Trivia Crack - the response vulnerability

As you can see above, the response to this request contains the question, the possible answers, and which index in the array of possible answers is the correct answer. So if this data is indeed about the next question, after I click “Spin,” I should be asked “Which of the following African countries doesn’t have a coast?” and the correct answer should be answers[3] from above, which is “All of them do”. Clicking “Spin” proved this hypothesis correct:

Playing Trivia Crack, before selecting an answer

Playing Trivia Crack, selecting the correct answer

I also used my browser to find that repeated GET requests to “api.preguntados.com/api/users/<userID>/games/<gameID>” always give the same response, until you answer the question it provides. This means I could request this URL from my own tool and still receive the same question as the Trivia Crack client on Facebook would receive.

A response containing the correct answer, when the goal is that the user must determine the answer themselves, is in direct conflict with the “Defensive Programming” practice of programming – particularly the “never trust the client” web programming principle. Since the server has no control over how the client acts, it can’t assume the client will not act in a malicious way, and so must protect itself. The correct way of implementing answer checking behavior is to do it on the server side. Don’t send the correct answer to the client at any point, and instead make the client send the user’s answer to the server, where the server will then check it for correctness and credit the user if correct (and of course, only allow one answer to be sent per user per question).

However, Trivia Crack did not do this, and instead trusts the client. Now it was just a matter of creating a malicious client to take advantage of the fact that the correct answer is sent in the response. Ideally, one that would be easy for non-technical users to install and use. Hmm…how about a Chrome extension that just adds a button to the Trivia Crack game, when played on Facebook, that when clicked answers the current question correctly automatically??

 

3 – Taking advantage of the vulnerability

As I mentioned above, Gargl allows you to take the requests you had it record, modify and parameterize them as needed, and then auto-generate modules in a programming language of your choice to make these same requests. I’m not going to go into the details of that process since you can look at one of my Gargl blog posts to find that info, but essentially I generated a Gargl template file for Trivia Crack‘s various API requests, using the Gargl Chrome extension, and then used a Gargl generator to turn that template file into a Trivia Crack JavaScript library. Using Gargl for this allowed me to create a JavaScript library that talks to Trivia Crack’s servers, without writing a line of code to do so, and also have a template file around for the future in case I want to do the same thing later for another programming language.

Once I had this Trivia Crack JavaScript library, it was a simple matter of building a Chrome extension in JavaScript that runs on the domain loaded in the Trivia Crack Facebook game page iframe (preguntados.com), adds a button to the HTML for the game, and when that button is clicked requests the “api.preguntados.com/api/users/<userID>/games/<gameID>” URL, parses the response for the correct answer, and then clicks the answer button on the page representing the correct answer.

Trivia Cracker running on Trivia Crack to answer questions for me

And just like that, Trivia Cracker was born! Curious about the exact details of how Trivia Cracker works? Check out the source code on GitHub.

Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.

Gargl vs Kimono

I’ve been getting a lot of questions about how Gargl differs from to Kimono. The truth is, they are pretty similar! While even I would agree that Kimono is more user friendly than Gargl, overall I think Gargl is better (granted I’m biased) for a number of reasons:
  • Gargl is free
  • Gargl is open source
  • Gargl is “component-tized” – you can write a Gargl recorder for another program like Fiddler or Wireshark, or another browser, and as long as it outputs valid .gtf files, it can be used with a generator (which again is component-tized; you can implement you own generator).
  • The APIs it creates and lets you edit can handle more than just GET requests, which is all Kimono supports. Ex: POST
  • As long as the generated modules keep track of cookies, you can use Gargl for API requests that require user authentication. Not supported with Kimono.
  • Kimono will only scrape at most every hour, and then the API you use gives you stale data. Gargl APIs are as live as you want them to be.
  • Kimono operates as a service, even if they “host” your API, you still need to write an API to integrate into their API for APIs! I’m sure they have this available as a client library for some languages, but not all. Gargl will output the module straight in any programming language it supports.
  • Kimono operates as a service, so if a site doesn’t want to be scraped by them anymore, it just has to ban Kimono’s IP address. Then your Kimono API would stop working. Gargl modules can be used server or client side. With Gargl, you could run into the same issue using a Gargl module on a server, but if you embed the module directly into a client app so that the module is talking to the website using the client device’s IP address, there’s no way for the site to stop access because requests are coming from each client IP, not from one server IP.
  • In my opinion, Kimono is in a dubious area legally. They scrape others’ websites, and expose that data over an API to any paying customer. If a customer does something the site being scraped didn’t want to happen, the site owners could sue Kimono (and very possibly win) for providing this information to its customer, who is doing something the site considers misappropriation of their data or software. See the beginning of my first post on Gargl, where I talk about the legal issues. If Kimono gets sued and shuts down, anyone who used their site to make APIs immediately loses those APIs. With Gargl, it is a decentralized, open source, tool. If someone uses Gargl to make a module and then uses that module to do something against a site that the site did not want to happen, they may get sued, but the tool to generate the module is not directly responsible unlike Kimono, which is actively scraping the website and even being paid by the “malicious” customer.
  • Using Kimono, any data you receive has to go through Kimono’s servers in order to get to the API caller. If the data is sensitive, you obviously don’t want it passing through a third party server that you have no control over. Gargl modules connect the API caller straight to the original website, so there’s no third party in the middle you have to trust.
I think the above reasons demonstrate some great benefits of Gargl that are worth sacrificing a bit of usability for. And of course, the usability of Gargl can be improved over time.
That’s my two cents, anyway.
Until next time,
Keep Calm and Gargl On
 Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.

Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough

Hopefully by now you’ve learned a bit about what Gargl is all about, and why you’d use it. If not, take a look at my other blog post on Gargl first, Introducing Gargl: Create an API for any website, without writing a line of code.

Now that you know what Gargl is, lets talk about how using the Gargl Recorder Chrome Extension lets you easily record websites and generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here.


Gargl in action – Create an API for Yahoo in 3 minutes flat

After installing the Gargl Chrome extension, you can start using it by going into Chrome and hitting F12 to bring up the Chrome developer tools. You should notice on the far right a Gargl tab.

gargl1

 

Click on the Gargl tab. You can see in the Gargl tab that you can load an existing template file, or start from scratch on a new one. Let’s use Yahoo’s autocomplete and search “functions” to demonstrate how the Gargl recorder works. Navigate in the browser to the Yahoo homepage.

gargl2

 

Click ‘Start from Scratch’ and enter the module name and module description for the template file we’ll be creating. Since we only want to record the requests and responses against Yahoo, enter “yahoo” in the filter requests textbox.

gargl3

Click “Start Recording” to tell Gargl to start monitoring requests. Now let’s use the search bar on Yahoo’s site like we normally do. Don’t actually search, just type in the search bar to generate autocomplete results. As we type and autocomplete results roll in, we can see a request appears in Gargl for every letter we type. This is because Yahoo is getting updated autocomplete results from the server every time we type another letter.

gargl4

 

Looking in Gargl, you should see a lot of “search.yahoo.com/sugg” requests. Each of these represents a request / response that Gargl logged when Yahoo’s search page asked the server for autocomplete results.

gargl5

 

Now hit the “Stop Recording” button. Clicking on the details button for any of these requests will show us the details for the request. As you can see, it shows us the URL, method, and query string parameters of the request. For post requests it will also show the post data, of course.

gargl6

 

Since all of these requests are the same function (autocomplete, just with different ‘term to complete’ data), hit remove on all but the last “search.yaho.com/sugg” entry. For that last entry, enter “Autocomplete” for the name of the function.

gargl7

 

Hit the Edit button for the autocomplete function. As you can see there is a lot of data about the request / response of this function that was recorded and you can adjust. Enter a description for the autocomplete function. You could also change the request url here if you wanted to parameterize it (more on that below), but since this URL doesn’t contain any parameters that we care about, let’s leave it as is.

gargl8

 

Similarly, you could adjust or parameterize the request headers sent to the server for this function if you wanted. We don’t need to do that for this function though.

gargl9

 

You can also change the query string parameters sent in the request (or post data if the request is a post). As you can see, the “command” parameter had a value of “hello kitty is” when the request was sent. This is because this is what I was typing in the search box, which Yahoo wanted autocomplete suggestions from the server for.

gargl10

 

We could change the value of “command” to some other static value, like”spaghetti and “, to make the function always request autocomplete suggestions for “spaghetti and “, but lets instead create a function that takes in a parameter which becomes the value of the command query string parameter, so that the invoker of this function can specify what term to get autocomplete suggestions for. Setting the value of a key in the request to a parameter rather than a static value is done through a process called “parameterization.”

Parameterization is very easy to do —  just replace the current static value for the key with the name of the parameter you want to be a parameter to the function, surrounded by “@” signs. As you can see below, I want the autocomplete function of my Yahoo module to take a parameter named “term”, whose value becomes the value of the “command” query string parameter of the request that is made.

gargl11

 

Now hit “Save” to return to the page showing all your functions. If you now hit the “Details” button on the autocomplete function, you can see it contains our updated “@[email protected]” parameter.

gargl12

Now let’s record our search function. Type a search query into Yahoo’s search bar (don’t hit Enter yet) and then click “Start Recording” in the Gargl tab.

gargl13

 

Click “Search” on the Yahoo page, or press Enter, to initiate the search request. You should now see a new request in the Gargl tab, with the URL “search.yahoo.com/search.” Name this request “Search”, it will become the Search function for our Yahoo module.

gargl14

 

Click the Edit button. You should see that the “p” query string parameter contains our search term.

gargl15

Change the search term to some parameterized input, like we did before for the autocomplete term. I chose for the function parameter to be called “query”.

gargl16

 

Unlike our autocomplete function’s response, which is formatted as JSON, our search function gets back HTML, since the page is displayed to the user. You can view the response of any request by clicking the “View Response” button on the function edit page. Clicking this for our search function gives us a big HTML file.

gargl17

 

Since we want our function to output some select data from the search results page, not a bunch of HTML, let’s figure out what the HTML elements that we care about in the returned HTML are. Let’s have our function return the titles of all the search results. Right click on a search result title on the Yahoo search results page, and go to “Inspect element.”

gargl18

 

As you can see from the HTML that gets highlighted in Chrome developer tools, the search result title is contained in an anchor tag, within an h3 tag. Clicking on other search result titles on this page reveals they fit the same pattern of an anchor tag within an h3 tag.

gargl19

 

Now let’s go back to our search function and make it grab this output instead of returning the entire HTML response it receives from the server. Click on the Gargl tab, then click Edit on the search function. Click “Add Response Field.” Here we can name every field we want the search function to output as part of the function’s output object. Enter a name for the field, and enter the CSS selector for anchor tags within an h3: “h3 a”.

gargl20

 

Hit “Test Selector” to have Gargl run the CSS selector you entered against the HTML response it recorded. As you can see below, it is correctly retrieving all the search result titles!

gargl21

gargl22

 

All done! Hit Save. We now have a Gargl module for Yahoo that contains functions for getting Yahoo search result titles for a term, and getting Yahoo autocomplete results for a term. In the Gargl tab, hit “Save As Gargl Template File” and then “Click to download.”

gargl23

 

This will download the Gargl template file (.gtf) for our module. I won’t go into the specifics of the schema of a Gargl template file here, but you can read up on it on GitHub.

gargl24

 

Great! Now we have our very own Gargl template! If we wanted to be a good Open Source Samaritan, we would add this Gargl template to the Gargl GitHub project here, so others could use a Gargl generator to turn it into a module in whatever programming language they want to integrate into Yahoo from. I’ve personally already uploaded this template to the Gargl templates directory, so we’ll skip that part.

Happy Gargling!

Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.

Introducing Gargl: Create an API for any website, in any programming language, without writing a line of code

Warning:
Integrating into a website / service without permission is a legally grey area. If a site does not have a documented API, its owners may be unhappy with you reverse-engineering their unofficial API and using it as part of an application, service, or other purpose. Legal cases have previously been fought and both won and lost by developers “misusing” an unofficial API for their own purposes.

Use caution and seek legal counsel before integrating another party’s unofficial API into any product, service, or tool you plan to expose to yourself or others. Things to worry about include Copyright InfringementTrespass to Chattels, and The Computer Fraud and Abuse Act. See Web Scraping – Legal Issues for more details.

 

What is Gargl?

Ever been unhappy to find that a website has no API, or makes you pay to use their API, even though the information you want access to is easily accessible manually use their website?

Because of the way websites work, if you can see or submit data using the website, it means the website does have some kind of API, even if the site owners haven’t documented it publicly. For example, if you do a search in Yahoo, you can see the search page sent to Yahoo’s servers has the following url (some url parameters omitted for readability):

https://search.yahoo.com/search?p=search+term

As you can see, the standard contract used to get a search page back from Yahoo is to send the server at search.yahoo.com the path “search,” and a query string parameter with key “p” and value being the term to search for, over HTTP.  All websites have “contracts” like this. These contracts are essentially undocumented APIs, meant for the browser to use!

Untitled

Using Fiddler to sniff Yahoo.com

From the above you can see that it is of course possible to use a website, “sniff” the underlying HTTP requests sent and data received back, and manually build a module in some programming language that pretends to be the browser, sends a request, and parses the returned HTML for the information needed. And many people do this today, using tools like WireShark and Fiddler, to allow individual websites to be accessible programmatically from individual programming languages.

But why should it be this way? Once one person figures out the undocumented API for a website, shouldn’t they be able to document that API somewhere, so others don’t have to do the work of figuring out the API for themselves? Better yet, shouldn’t some piece of software be able to parse this documentation, and generate a library that uses that API, in any programming language, so that developers don’t have to write code to integrate into that API at all?

Introducing Gargl.

Gargl (Generic API Recorder and Generator Lite, pronounced “Gargle”) is a project composed of multiple components meant to allow developers to easily figure out, document, and generate modules for the undocumented APIs that websites use to talk between client and server.

Gargl consists of three types of components:

  • A “template” declares the API for one or more functions of a website, using JSON. You can read up on the schema of templates here.
  • A “recorder” integrates into an existing software solution that is used to interact with websites, records requests made to servers and data received back, and allows the user to easily transform this data and create a template file.
  • A “generator” takes in a template file and a programming language, and spits out a module for that programming language that uses the API specified in the template file.

 


Gargl in action – Create an API for Yahoo in 3 minutes flat

Gargl is open source and hosted on GitHub. It currently contains a sample template for Yahoo search and autocomplete functions, as well as a recorder implemented as a Chrome extension, and a generator which supports generating modules in Java, PowerShell, and (Node.js, Browser, and WinJS compatible) JavaScript.

 

Gargl Recorder Chrome Extension

The Gargl Recorder Chrome Extension allows you to easily record websites to generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here. For a step by step guide on how to use the Gargl Recorder Chrome Extension to generate Gargl templates, take a look at my other post on Gargl, Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough.

 

Gargl Generator

A developer can generate a module from a Gargl template using the Gargl generator in the Gargl GitHub project.

To run the generator, execute the GarglJavaGenerator.jar file in the bld folder. Of course, you need to install Java on the computer you plan to run the Gargl generator from, since this generator is implemented in Java. Then run the following command, passing in the path to the .jar file, the path to the Gargl template, the output directory for the module, and the language to generate the module in:

java -jar bld\GarglJavaGenerator.jar -outdir some\output\location -input someGarglFile.gtf -lang someLanguage 

The allowed enums to -lang are “java,” “javascript,” and “powershell.”

As an example, I ran this Gargl generator against the Gargl template file we create in my blog post Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough, and specified JavaScript as the language. You can view the JavaScript module it spit out here.

 

Gargl Needs You

Hopefully now you have a good idea of what Gargl is, why it adds value, and how you can use it. Like any open source project, Gargl requires a great community to truly become great, itself. I encourage you to play around with and use Gargl to make some applications and services which integrate into websites with no public API, and contribute any Gargl templates you create doing so back to the Gargl open source repository. If you’re feeling extra creative you could even extend the existing Gargl generator to support another programming language, prettify the existing Gargl recorder, or create a Gargl recorder for some other program, like Fiddler.

I look forward to seeing how you take advantage of Gargl, and hope it proves useful to you.

Happy Gargling!

Update: Some folks have asked me how Gargl compares to a similar solution recently released called Kimono. I wrote a short post talking about why I think Gargl is better, you can read it here.

Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.