Integrating into a website / service without permission is a legally grey area. If a site does not have a documented API, its owners may be unhappy with you reverse-engineering their unofficial API and using it as part of an application, service, or other purpose. Legal cases have previously been fought and both won and lost by developers “misusing” an unofficial API for their own purposes.
Use caution and seek legal counsel before integrating another party’s unofficial API into any product, service, or tool you plan to expose to yourself or others. Things to worry about include Copyright Infringement, Trespass to Chattels, and The Computer Fraud and Abuse Act. See Web Scraping – Legal Issues for more details.
What is Gargl?
Ever been unhappy to find that a website has no API, or makes you pay to use their API, even though the information you want access to is easily accessible manually use their website?
Because of the way websites work, if you can see or submit data using the website, it means the website does have some kind of API, even if the site owners haven’t documented it publicly. For example, if you do a search in Yahoo, you can see the search page sent to Yahoo’s servers has the following url (some url parameters omitted for readability):
As you can see, the standard contract used to get a search page back from Yahoo is to send the server at search.yahoo.com the path “search,” and a query string parameter with key “p” and value being the term to search for, over HTTP. All websites have “contracts” like this. These contracts are essentially undocumented APIs, meant for the browser to use!
From the above you can see that it is of course possible to use a website, “sniff” the underlying HTTP requests sent and data received back, and manually build a module in some programming language that pretends to be the browser, sends a request, and parses the returned HTML for the information needed. And many people do this today, using tools like WireShark and Fiddler, to allow individual websites to be accessible programmatically from individual programming languages.
But why should it be this way? Once one person figures out the undocumented API for a website, shouldn’t they be able to document that API somewhere, so others don’t have to do the work of figuring out the API for themselves? Better yet, shouldn’t some piece of software be able to parse this documentation, and generate a library that uses that API, in any programming language, so that developers don’t have to write code to integrate into that API at all?
Gargl (Generic API Recorder and Generator Lite, pronounced “Gargle”) is a project composed of multiple components meant to allow developers to easily figure out, document, and generate modules for the undocumented APIs that websites use to talk between client and server.
Gargl consists of three types of components:
- A “template” declares the API for one or more functions of a website, using JSON. You can read up on the schema of templates here.
- A “recorder” integrates into an existing software solution that is used to interact with websites, records requests made to servers and data received back, and allows the user to easily transform this data and create a template file.
- A “generator” takes in a template file and a programming language, and spits out a module for that programming language that uses the API specified in the template file.
Gargl in action – Create an API for Yahoo in 3 minutes flat
Gargl Recorder Chrome Extension
The Gargl Recorder Chrome Extension allows you to easily record websites to generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here. For a step by step guide on how to use the Gargl Recorder Chrome Extension to generate Gargl templates, take a look at my other post on Gargl, Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough.
A developer can generate a module from a Gargl template using the Gargl generator in the Gargl GitHub project.
To run the generator, execute the GarglJavaGenerator.jar file in the bld folder. Of course, you need to install Java on the computer you plan to run the Gargl generator from, since this generator is implemented in Java. Then run the following command, passing in the path to the .jar file, the path to the Gargl template, the output directory for the module, and the language to generate the module in:
java -jar bld\GarglJavaGenerator.jar -outdir some\output\location -input someGarglFile.gtf -lang someLanguage
Gargl Needs You
Hopefully now you have a good idea of what Gargl is, why it adds value, and how you can use it. Like any open source project, Gargl requires a great community to truly become great, itself. I encourage you to play around with and use Gargl to make some applications and services which integrate into websites with no public API, and contribute any Gargl templates you create doing so back to the Gargl open source repository. If you’re feeling extra creative you could even extend the existing Gargl generator to support another programming language, prettify the existing Gargl recorder, or create a Gargl recorder for some other program, like Fiddler.
I look forward to seeing how you take advantage of Gargl, and hope it proves useful to you.
Update: Some folks have asked me how Gargl compares to a similar solution recently released called Kimono. I wrote a short post talking about why I think Gargl is better, you can read it here.