- Gargl is free
- Gargl is open source
- Gargl is componentized – you can write a Gargl recorder for another program like Fiddler or Wireshark, or for another browser, and as long as it outputs valid .gtf files, it can be used with a generator (which is also componentized; you can implement your own generator).
- The APIs it creates and lets you edit can handle more than just GET requests, which is all Kimono supports. Ex: POST
- As long as the generated modules keep track of cookies, you can use Gargl for API requests that require user authentication. Not supported with Kimono.
- Kimono will only scrape at most every hour, and then the API you use gives you stale data. Gargl APIs are as live as you want them to be.
- Kimono operates as a service, so even if they “host” your API, you still need to write an API to integrate into their API for APIs! I’m sure they have this available as a client library for some languages, but not all. Gargl will output the module straight in any programming language it supports.
- Kimono operates as a service, so if a site doesn’t want to be scraped by them anymore, it just has to ban Kimono’s IP address. Then your Kimono API would stop working. Gargl modules can be used server or client side. With Gargl, you could run into the same issue using a Gargl module on a server, but if you embed the module directly into a client app so that the module is talking to the website using the client device’s IP address, there’s no way for the site to stop access because requests are coming from each client IP, not from one server IP.
- In my opinion, Kimono is in a dubious area legally. They scrape others’ websites, and expose that data over an API to any paying customer. If a customer does something the site being scraped didn’t want to happen, the site owners could sue Kimono (and very possibly win) for providing this information to its customer, who is doing something the site considers misappropriation of their data or software. See the beginning of my first post on Gargl, where I talk about the legal issues. If Kimono gets sued and shuts down, anyone who used their site to make APIs immediately loses those APIs. Gargl, by contrast, is a decentralized, open source tool. If someone uses Gargl to make a module and then uses that module to do something against a site that the site did not want to happen, they may get sued, but the tool that generated the module is not directly responsible, unlike Kimono, which is actively scraping the website and even being paid by the “malicious” customer.
- Using Kimono, any data you receive has to go through Kimono’s servers in order to get to the API caller. If the data is sensitive, you obviously don’t want it passing through a third party server that you have no control over. Gargl modules connect the API caller straight to the original website, so there’s no third party in the middle you have to trust.
Monthly Archives: February 2014
Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough
Hopefully by now you’ve learned a bit about what Gargl is all about, and why you’d use it. If not, take a look at my other blog post on Gargl first, Introducing Gargl: Create an API for any website, without writing a line of code.
Now that you know what Gargl is, let’s talk about how the Gargl Recorder Chrome Extension lets you easily record websites and generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here.
Gargl in action – Create an API for Yahoo in 3 minutes flat
After installing the Gargl Chrome extension, you can start using it by going into Chrome and hitting F12 to bring up the Chrome developer tools. You should notice on the far right a Gargl tab.
Click on the Gargl tab. You can see in the Gargl tab that you can load an existing template file, or start from scratch on a new one. Let’s use Yahoo’s autocomplete and search “functions” to demonstrate how the Gargl recorder works. Navigate in the browser to the Yahoo homepage.
Click ‘Start from Scratch’ and enter the module name and module description for the template file we’ll be creating. Since we only want to record the requests and responses against Yahoo, enter “yahoo” in the filter requests textbox.
Click “Start Recording” to tell Gargl to start monitoring requests. Now let’s use the search bar on Yahoo’s site like we normally do. Don’t actually search, just type in the search bar to generate autocomplete results. As we type and autocomplete results roll in, we can see a request appears in Gargl for every letter we type. This is because Yahoo is getting updated autocomplete results from the server every time we type another letter.
Looking in Gargl, you should see a lot of “search.yahoo.com/sugg” requests. Each of these represents a request / response that Gargl logged when Yahoo’s search page asked the server for autocomplete results.
Now hit the “Stop Recording” button. Clicking on the details button for any of these requests will show us the details for the request. As you can see, it shows us the URL, method, and query string parameters of the request. For post requests it will also show the post data, of course.
Since all of these requests are the same function (autocomplete, just with different ‘term to complete’ data), hit remove on all but the last “search.yahoo.com/sugg” entry. For that last entry, enter “Autocomplete” for the name of the function.
Hit the Edit button for the autocomplete function. As you can see there is a lot of data about the request / response of this function that was recorded and you can adjust. Enter a description for the autocomplete function. You could also change the request url here if you wanted to parameterize it (more on that below), but since this URL doesn’t contain any parameters that we care about, let’s leave it as is.
Similarly, you could adjust or parameterize the request headers sent to the server for this function if you wanted. We don’t need to do that for this function though.
You can also change the query string parameters sent in the request (or post data if the request is a post). As you can see, the “command” parameter had a value of “hello kitty is” when the request was sent. This is because this is what I was typing in the search box, which Yahoo wanted autocomplete suggestions from the server for.
We could change the value of “command” to some other static value, like “spaghetti and ”, to make the function always request autocomplete suggestions for “spaghetti and ”, but let’s instead create a function that takes in a parameter which becomes the value of the command query string parameter, so that the invoker of this function can specify what term to get autocomplete suggestions for. Setting the value of a key in the request to a parameter rather than a static value is done through a process called “parameterization.”
Parameterization is very easy to do — just replace the current static value for the key with the name of the parameter you want to be a parameter to the function, surrounded by “@” signs. As you can see below, I want the autocomplete function of my Yahoo module to take a parameter named “term”, whose value becomes the value of the “command” query string parameter of the request that is made.
Now hit “Save” to return to the page showing all your functions. If you now hit the “Details” button on the autocomplete function, you can see it contains our updated “@term@” parameter.
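To make the idea concrete, here is a minimal sketch in Python of what parameterization amounts to once a generated module is called: every “@name@” placeholder in the recorded values is swapped for an argument supplied by the caller. This is my own illustration, not the code a Gargl generator actually emits; the function name fill_parameters and the extra “nresults” parameter are made up for the example.

```python
import re
from urllib.parse import urlencode

# Matches placeholders of the form @name@
PLACEHOLDER = re.compile(r"@(\w+)@")

def fill_parameters(recorded_params, arguments):
    """Replace every @name@ placeholder in the recorded values with an argument."""
    def substitute(value):
        # A missing argument raises KeyError, which surfaces a forgotten parameter early
        return PLACEHOLDER.sub(lambda m: str(arguments[m.group(1)]), value)
    return {key: substitute(value) for key, value in recorded_params.items()}

# "command" is parameterized; "nresults" is a made-up static parameter
recorded = {"command": "@term@", "nresults": "10"}
filled = fill_parameters(recorded, {"term": "spaghetti and "})
print(urlencode(filled))
```

Static values pass through untouched, so only the keys you parameterized in the recorder become function arguments.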
Now let’s record our search function. Type a search query into Yahoo’s search bar (don’t hit Enter yet) and then click “Start Recording” in the Gargl tab.
Click “Search” on the Yahoo page, or press Enter, to initiate the search request. You should now see a new request in the Gargl tab, with the URL “search.yahoo.com/search.” Name this request “Search”; it will become the Search function for our Yahoo module.
Click the Edit button. You should see that the “p” query string parameter contains our search term.
Change the search term to some parameterized input, like we did before for the autocomplete term. I chose for the function parameter to be called “query”.
Unlike our autocomplete function’s response, which is formatted as JSON, our search function gets back HTML, since the page is displayed to the user. You can view the response of any request by clicking the “View Response” button on the function edit page. Clicking this for our search function gives us a big HTML file.
Since we want our function to output some select data from the search results page, not a bunch of HTML, let’s figure out which elements in the returned HTML we care about. Let’s have our function return the titles of all the search results. Right click on a search result title on the Yahoo search results page, and go to “Inspect element.”
As you can see from the HTML that gets highlighted in Chrome developer tools, the search result title is contained in an anchor tag, within an h3 tag. Clicking on other search result titles on this page reveals they fit the same pattern of an anchor tag within an h3 tag.
Now let’s go back to our search function and make it grab this output instead of returning the entire HTML response it receives from the server. Click on the Gargl tab, then click Edit on the search function. Click “Add Response Field.” Here we can name every field we want the search function to output as part of the function’s output object. Enter a name for the field, and enter the CSS selector for anchor tags within an h3: “h3 a”.
Hit “Test Selector” to have Gargl run the CSS selector you entered against the HTML response it recorded. As you can see below, it is correctly retrieving all the search result titles!
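If you are curious what applying the “h3 a” selector involves, the sketch below approximates it in Python using only the standard library. It is an illustration under my own assumptions, not how Gargl or a generated module evaluates selectors: the TitleExtractor class and the sample HTML are made up, and it handles only this one descendant selector rather than CSS in general.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of every <a> nested inside an <h3> (the selector "h3 a")."""

    def __init__(self):
        super().__init__()
        self.h3_depth = 0       # number of <h3> elements currently open
        self.in_anchor = False  # True while inside a matching <a>
        self.titles = []
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.h3_depth += 1
        elif tag == "a" and self.h3_depth > 0:
            self.in_anchor = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor = False
            self.titles.append("".join(self._parts).strip())
        elif tag == "h3" and self.h3_depth > 0:
            self.h3_depth -= 1

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the title
        if self.in_anchor:
            self._parts.append(data)

# Made-up response fragment shaped like the pattern described above
sample_html = """
<div>
  <h3><a href="/1">First <b>result</b></a></h3>
  <p><a href="/x">Not a title</a></p>
  <h3><a href="/2">Second result</a></h3>
</div>
"""

extractor = TitleExtractor()
extractor.feed(sample_html)
print(extractor.titles)
```

The anchor inside the paragraph is skipped because no h3 is open when it starts, which is the behavior the descendant selector asks for.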
All done! Hit Save. We now have a Gargl module for Yahoo that contains functions for getting Yahoo search result titles for a term, and getting Yahoo autocomplete results for a term. In the Gargl tab, hit “Save As Gargl Template File” and then “Click to download.”
This will download the Gargl template file (.gtf) for our module. I won’t go into the specifics of the schema of a Gargl template file here, but you can read up on it on GitHub.
Great! Now we have our very own Gargl template! If we wanted to be a good Open Source Samaritan, we would add this Gargl template to the Gargl GitHub project here, so others could use a Gargl generator to turn it into a module in whatever programming language they want to integrate into Yahoo from. I’ve personally already uploaded this template to the Gargl templates directory, so we’ll skip that part.
Happy Gargling!
Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.
Introducing Gargl: Create an API for any website, in any programming language, without writing a line of code
Warning:
Integrating into a website / service without permission is a legally grey area. If a site does not have a documented API, its owners may be unhappy with you reverse-engineering their unofficial API and using it as part of an application, service, or other purpose. Legal cases have previously been fought and both won and lost by developers “misusing” an unofficial API for their own purposes.
Use caution and seek legal counsel before integrating another party’s unofficial API into any product, service, or tool you plan to expose to yourself or others. Things to worry about include Copyright Infringement, Trespass to Chattels, and The Computer Fraud and Abuse Act. See Web Scraping – Legal Issues for more details.
What is Gargl?
Ever been unhappy to find that a website has no API, or makes you pay to use their API, even though the information you want access to is easily accessible manually using their website?
Because of the way websites work, if you can see or submit data using the website, it means the website does have some kind of API, even if the site owners haven’t documented it publicly. For example, if you do a search in Yahoo, you can see the search page sent to Yahoo’s servers has the following url (some url parameters omitted for readability):
https://search.yahoo.com/search?p=search+term
As you can see, the standard contract used to get a search page back from Yahoo is to send the server at search.yahoo.com the path “search,” and a query string parameter with key “p” and value being the term to search for, over HTTP. All websites have “contracts” like this. These contracts are essentially undocumented APIs, meant for the browser to use!
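As a small illustration, the contract above can be reproduced in a few lines of Python. This is only a sketch: it assumes nothing beyond the host, path, and “p” parameter shown in the URL, the function name build_search_url is my own, and it builds the URL without sending any request.

```python
from urllib.parse import urlencode

def build_search_url(term):
    """Build the Yahoo search URL for a term, following the contract above."""
    # urlencode escapes the value and encodes spaces as "+"
    query = urlencode({"p": term})
    return "https://search.yahoo.com/search?" + query

print(build_search_url("search term"))
# prints https://search.yahoo.com/search?p=search+term
```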
From the above you can see that it is of course possible to use a website, “sniff” the underlying HTTP requests sent and data received back, and manually build a module in some programming language that pretends to be the browser, sends a request, and parses the returned HTML for the information needed. Many people do this today, using tools like Wireshark and Fiddler, to make individual websites accessible programmatically from individual programming languages.
But why should it be this way? Once one person figures out the undocumented API for a website, shouldn’t they be able to document that API somewhere, so others don’t have to do the work of figuring out the API for themselves? Better yet, shouldn’t some piece of software be able to parse this documentation, and generate a library that uses that API, in any programming language, so that developers don’t have to write code to integrate into that API at all?
Introducing Gargl.
Gargl (Generic API Recorder and Generator Lite, pronounced “Gargle”) is a project composed of multiple components meant to allow developers to easily figure out, document, and generate modules for the undocumented APIs that websites use to talk between client and server.
Gargl consists of three types of components:
- A “template” declares the API for one or more functions of a website, using JSON. You can read up on the schema of templates here.
- A “recorder” integrates into an existing software solution that is used to interact with websites, records requests made to servers and data received back, and allows the user to easily transform this data and create a template file.
- A “generator” takes in a template file and a programming language, and spits out a module for that programming language that uses the API specified in the template file.
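To give a feel for how a template and a generator fit together, here is a toy sketch in Python. Everything in it is hypothetical: the JSON field names are invented for the example and are not the real .gtf schema, Python is not one of the languages the actual generator targets, and the emitted function only builds a URL rather than sending a request.

```python
import json

# Hypothetical template: these field names are illustrative, not the real .gtf schema
TEMPLATE_JSON = """
{
  "moduleName": "Yahoo",
  "functions": [
    {
      "name": "search",
      "url": "https://search.yahoo.com/search",
      "params": {"p": "@query@"}
    }
  ]
}
"""

def generate_python_module(template):
    """Emit Python source with one URL-building function per template function."""
    lines = ["from urllib.parse import urlencode", ""]
    for function in template["functions"]:
        args = []     # names of the generated function's arguments
        entries = []  # source text for each query string entry
        for key, value in function["params"].items():
            if len(value) > 2 and value.startswith("@") and value.endswith("@"):
                # "@query@" becomes a function argument named query
                arg = value[1:-1]
                args.append(arg)
                entries.append(f"{key!r}: {arg}")
            else:
                # anything else is kept as a static value
                entries.append(f"{key!r}: {value!r}")
        lines.append(f"def {function['name']}({', '.join(args)}):")
        lines.append(f"    params = {{{', '.join(entries)}}}")
        lines.append(f"    return {function['url']!r} + '?' + urlencode(params)")
        lines.append("")
    return "\n".join(lines)

source = generate_python_module(json.loads(TEMPLATE_JSON))
print(source)
```

The point of the split is that the template stays language neutral: the same JSON could be fed to a different generator to produce the equivalent function in another language.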
Gargl in action – Create an API for Yahoo in 3 minutes flat
Gargl is open source and hosted on GitHub. It currently contains a sample template for Yahoo search and autocomplete functions, as well as a recorder implemented as a Chrome extension, and a generator which supports generating modules in Java, PowerShell, and (Node.js, Browser, and WinJS compatible) JavaScript.
Gargl Recorder Chrome Extension
The Gargl Recorder Chrome Extension allows you to easily record websites to generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here. For a step by step guide on how to use the Gargl Recorder Chrome Extension to generate Gargl templates, take a look at my other post on Gargl, Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough.
Gargl Generator
A developer can generate a module from a Gargl template using the Gargl generator in the Gargl GitHub project.
To run the generator, execute the GarglJavaGenerator.jar file in the bld folder. Of course, you need to install Java on the computer you plan to run the Gargl generator from, since this generator is implemented in Java. Then run the following command, passing in the path to the .jar file, the path to the Gargl template, the output directory for the module, and the language to generate the module in:
java -jar bld\GarglJavaGenerator.jar -outdir some\output\location -input someGarglFile.gtf -lang someLanguage
The allowed values for -lang are “java,” “javascript,” and “powershell.”
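If you want to call the generator from a script, a thin wrapper along these lines may help. It is a sketch, not part of Gargl: the function name build_generator_command is my own, the paths are placeholders, and it only assembles the flags shown in the command above, rejecting any -lang value outside the three listed. The line that actually launches Java is left commented out.

```python
import subprocess

ALLOWED_LANGUAGES = ("java", "javascript", "powershell")

def build_generator_command(jar_path, template_path, output_dir, language):
    """Return the argument list for the generator command shown above."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported -lang value: {language!r}")
    return ["java", "-jar", jar_path,
            "-outdir", output_dir,
            "-input", template_path,
            "-lang", language]

# Placeholder paths; adjust them to where the jar and template live on your machine
command = build_generator_command(
    "bld/GarglJavaGenerator.jar", "someGarglFile.gtf", "some/output/location", "javascript")
print(" ".join(command))

# To actually run the generator (requires Java and the jar to be present):
# subprocess.run(command, check=True)
```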
As an example, I ran this Gargl generator against the Gargl template file we created in my blog post Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough, and specified JavaScript as the language. You can view the JavaScript module it spit out here.
Gargl Needs You
Hopefully now you have a good idea of what Gargl is, why it adds value, and how you can use it. Like any open source project, Gargl requires a great community to truly become great, itself. I encourage you to play around with and use Gargl to make some applications and services which integrate into websites with no public API, and contribute any Gargl templates you create doing so back to the Gargl open source repository. If you’re feeling extra creative you could even extend the existing Gargl generator to support another programming language, prettify the existing Gargl recorder, or create a Gargl recorder for some other program, like Fiddler.
I look forward to seeing how you take advantage of Gargl, and hope it proves useful to you.
Happy Gargling!
Update: Some folks have asked me how Gargl compares to a similar solution recently released called Kimono. I wrote a short post talking about why I think Gargl is better; you can read it here.