Integrating with a website or service without permission is a legal grey area. If a site does not have a documented API, its owners may be unhappy with you reverse-engineering their unofficial API and using it as part of an application, a service, or for any other purpose. Developers have previously fought legal cases over “misusing” an unofficial API for their own purposes, and have both won and lost them.
Use caution and seek legal counsel before integrating another party’s unofficial API into any product, service, or tool you plan to use yourself or offer to others. Legal concerns include copyright infringement, trespass to chattels, and the Computer Fraud and Abuse Act. See Web Scraping – Legal Issues for more details.
What is Gargl?
Ever been unhappy to find that a website has no API, or makes you pay to use its API, even though the information you want is easily accessible manually using its website?
Because of the way websites work, if you can see or submit data using a website, that website does have some kind of API, even if its owners haven’t documented it publicly. For example, if you search on Yahoo, you can see that the request sent to Yahoo’s servers has the following URL (some URL parameters omitted for readability):
As you can see, the standard contract used to get a search page back from Yahoo is to send the server at search.yahoo.com the path “search,” and a query string parameter with key “p” and value being the term to search for, over HTTP. All websites have “contracts” like this. These contracts are essentially undocumented APIs, meant for the browser to use!
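To make that contract concrete, here is a minimal Java sketch that builds the search URL from a term. It follows only what is described above (host search.yahoo.com, path “search”, query string parameter “p”); the HTTPS scheme and the class and method names are assumptions, and the real page accepts more parameters than this.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the Yahoo search "contract": host search.yahoo.com, path "search",
// and the search term in the query string parameter "p".
// HTTPS is an assumption; the post only says the request goes over HTTP.
final class YahooSearchUrl {
    private static final String BASE = "https://search.yahoo.com/search";

    static String forQuery(String term) {
        // Form-encode the term so spaces and symbols survive the query string.
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return BASE + "?p=" + encoded;
    }
}
```

For example, YahooSearchUrl.forQuery("gargl api") returns https://search.yahoo.com/search?p=gargl+api, since URLEncoder encodes spaces as +.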
From the above, it follows that you can use a website, “sniff” the underlying HTTP requests sent and the data received back, and manually build a module in some programming language that pretends to be the browser: it sends a request and parses the returned HTML for the information needed. Many people do this today, using tools like Wireshark and Fiddler, to make individual websites accessible programmatically from individual programming languages.
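A hand-written module of that kind might look roughly like the following Java sketch (Java 11+ for java.net.http). The User-Agent value, the class name, and the <h3><a>…</a></h3> result markup it looks for are all hypothetical; a real site’s HTML has to be inspected to choose the pattern, and regex-based HTML parsing is brittle.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written "scraper" module: it pretends to be a browser when building the
// request, then pulls data out of the returned HTML.
// The User-Agent string and the <h3><a ...>title</a></h3> markup are hypothetical.
final class HandWrittenScraper {
    static final String BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Build the GET request a browser would send for this URL.
    static HttpRequest browserLikeRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", BROWSER_UA)
                .header("Accept", "text/html")
                .GET()
                .build();
    }

    // Matches the link text inside an <h3><a ...>...</a> pair (assumed markup).
    private static final Pattern TITLE =
            Pattern.compile("<h3[^>]*>\\s*<a[^>]*>(.*?)</a>", Pattern.DOTALL);

    // Extract result titles from the returned HTML.
    static List<String> parseTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }
}
```

Sending the request would be HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body(), whose result is then passed to parseTitles. This per-site, per-language code is the work that has to be repeated for every website and every language.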
But why should it be this way? Once one person figures out the undocumented API for a website, shouldn’t they be able to document that API somewhere, so others don’t have to do the work of figuring out the API for themselves? Better yet, shouldn’t some piece of software be able to parse this documentation, and generate a library that uses that API, in any programming language, so that developers don’t have to write code to integrate into that API at all?
Gargl (Generic API Recorder and Generator Lite, pronounced “Gargle”) is a project composed of multiple components meant to allow developers to easily figure out, document, and generate modules for the undocumented APIs that websites use to talk between client and server.
Gargl consists of three types of components:
- A “template” declares the API for one or more functions of a website, using JSON. You can read up on the schema of templates here.
- A “recorder” integrates into an existing program used to interact with websites, such as a web browser, records requests made to servers and the data received back, and lets the user easily transform this data into a template file.
- A “generator” takes in a template file and a programming language, and spits out a module for that programming language that uses the API specified in the template file.
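To illustrate the template-to-generator hand-off, here is a toy Java generator. It does not use the real Gargl template schema: the QueryParam and TemplateFunction records and the @name@ placeholder convention are hypothetical stand-ins for a JSON template, and it only emits a URL-building method rather than a full module that also sends the request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for a template's JSON (not the Gargl schema).
record QueryParam(String key, String value) {}
record TemplateFunction(String name, String baseUrl, List<QueryParam> query) {}

// Toy "generator": turns one template function into Java source text.
final class ToyGenerator {
    // Assumed convention: a value like "@query@" is supplied by the caller.
    static boolean isPlaceholder(String v) {
        return v.length() > 2 && v.startsWith("@") && v.endsWith("@");
    }

    static String generate(TemplateFunction f) {
        List<String> args = new ArrayList<>();
        StringBuilder url = new StringBuilder("\"" + f.baseUrl() + "?\"");
        for (int i = 0; i < f.query().size(); i++) {
            QueryParam p = f.query().get(i);
            String sep = i == 0 ? "" : "&";
            url.append(" + \"").append(sep).append(p.key()).append("=\" + ");
            if (isPlaceholder(p.value())) {
                // Placeholder becomes a method argument, URL-encoded at runtime.
                String arg = p.value().substring(1, p.value().length() - 1);
                args.add("String " + arg);
                url.append("URLEncoder.encode(").append(arg)
                   .append(", StandardCharsets.UTF_8)");
            } else {
                // Constant value copied as a string literal (not escaped here).
                url.append("\"").append(p.value()).append("\"");
            }
        }
        return "public static String " + f.name() + "Url("
                + String.join(", ", args) + ") {\n"
                + "    return " + url + ";\n}\n";
    }
}
```

For a function named search with base URL https://search.yahoo.com/search and one query parameter p = @query@, generate emits a searchUrl(String query) method whose body URL-encodes query into the p parameter. The emitted code needs java.net.URLEncoder and java.nio.charset.StandardCharsets imported.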
Gargl in action – Create an API for Yahoo in 3 minutes flat
Gargl Recorder Chrome Extension
The Gargl Recorder Chrome Extension allows you to easily record websites to generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here. For a step by step guide on how to use the Gargl Recorder Chrome Extension to generate Gargl templates, take a look at my other post on Gargl, Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough.
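Turning a recorded request into a template mostly means splitting it into a fixed part and caller-supplied parameters. The Java sketch below shows that step for a captured GET URL. It is not the extension’s code (which runs in the browser); the class name is hypothetical, and the @name@ placeholders are an assumed convention rather than the exact Gargl schema.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Splits one captured GET URL into base URL + query parameters, then lets the
// user mark which values are inputs. Marked values become "@name@" placeholders
// (an assumed convention, not necessarily the Gargl template format).
final class RecordedRequest {
    final String baseUrl;
    final Map<String, String> query = new LinkedHashMap<>(); // keeps recorded order

    RecordedRequest(String capturedUrl) {
        URI uri = URI.create(capturedUrl);
        String port = uri.getPort() == -1 ? "" : ":" + uri.getPort();
        this.baseUrl = uri.getScheme() + "://" + uri.getHost() + port + uri.getPath();
        String raw = uri.getRawQuery();
        if (raw != null) {
            for (String pair : raw.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                query.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                          URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
    }

    // Turn a recorded constant value into a caller-supplied parameter.
    RecordedRequest parameterize(String key, String paramName) {
        if (!query.containsKey(key)) {
            throw new IllegalArgumentException("no query parameter " + key);
        }
        query.put(key, "@" + paramName + "@");
        return this;
    }
}
```

For example, new RecordedRequest(capturedUrl).parameterize("p", "query") keeps every other recorded query value as a constant and turns the search term into a parameter named query.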
A developer can generate a module from a Gargl template using the Gargl generator in the Gargl GitHub project.
The generator is implemented in Java, so you need Java installed on the computer you plan to run it from. To run the generator, execute the GarglJavaGenerator.jar file in the bld folder with the following command, passing in the path to the .jar file, the path to the Gargl template, the output directory for the module, and the language to generate the module in:
java -jar bld\GarglJavaGenerator.jar -outdir some\output\location -input someGarglFile.gtf -lang someLanguage
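If you want to regenerate modules for several languages at once, the same command can be driven from Java with ProcessBuilder. The -outdir, -input, and -lang flags are taken from the command above; the template name, output paths, and class name are placeholders, and “java” as a -lang value is an assumption (the comments below show -lang powershell in use).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Builds the generator command with the flags from the post; paths are placeholders.
final class RunGenerator {
    static ProcessBuilder command(Path jar, Path template, Path outDir, String lang) {
        return new ProcessBuilder(List.of(
                "java", "-jar", jar.toString(),
                "-outdir", outDir.toString(),
                "-input", template.toString(),
                "-lang", lang))
            .inheritIO(); // forward the generator's console output to ours
    }

    // Runs the generator once per language; returns false if any run failed.
    static boolean generateAll(Path jar, Path template, Path outRoot, List<String> langs)
            throws IOException, InterruptedException {
        boolean ok = true;
        for (String lang : langs) {
            int exit = command(jar, template, outRoot.resolve(lang), lang).start().waitFor();
            ok &= exit == 0;
        }
        return ok;
    }
}
```

For example, generateAll(Path.of("bld", "GarglJavaGenerator.jar"), Path.of("yahoo.gtf"), Path.of("out"), List.of("powershell")) writes the PowerShell module under out/powershell.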
Gargl Needs You
Hopefully you now have a good idea of what Gargl is, why it adds value, and how you can use it. Like any open source project, Gargl needs a strong community to become truly great. I encourage you to experiment with Gargl, use it to build applications and services that integrate with websites that have no public API, and contribute any Gargl templates you create along the way back to the Gargl open source repository. If you’re feeling extra creative, you could even extend the existing Gargl generator to support another programming language, prettify the existing Gargl recorder, or create a Gargl recorder for some other program, like Fiddler.
I look forward to seeing how you take advantage of Gargl, and hope it proves useful to you.
Update: Some folks have asked me how Gargl compares to a similar solution recently released called Kimono. I wrote a short post explaining why I think Gargl is better; you can read it here.
Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.
20 thoughts on “Introducing Gargl: Create an API for any website, in any programming language, without writing a line of code”
Mr Levy, my hat is off to you. I thought of exactly this 10 years ago, as I was building yet another scraper. But thinking of it and doing it are 2 entirely different things! Well done. The architecture looks great too. I will help.
Yes, that is correct. Neither the PowerShell nor the Java Gargl module currently supports the response fields element of a Gargl template; they ignore it when generating the module. We definitely want this in the future, though. Feel free to make a pull request adding this functionality, or open an issue and maybe someone will pick it up.
Hello everyone, and many thanks Jo!
Gargl is such a great work!
A possible hint on how to achieve java response fields: http://stackoverflow.com/questions/2843258/tear-substring-from-within-html-tags-with-java
I know it’s probably a noob contribution, but I’m trying to help anyway.
Hi, what a great idea! I tried -lang powershell, and when executing a command in PowerShell (3.0), I got the error message “Invoke-WebRequest : The ‘User-Agent’ header must be modified using the appropriate property or method.” I’m trying to figure out how to modify x.psm1.
I have only tried PowerShell 4.0, but it has worked for me. Maybe it’s a PS 3.0 thing. Either way, the answer seems to be to set the “Referer” property of the request instead of just adding a referer header to the list of headers: http://stackoverflow.com/questions/239725/cannot-set-some-http-headers-when-using-system-net-webrequest
Wanna add this functionality to the PowerShell generator? Issue a pull request!
Is it possible to use Gargl to do the same kind of search, but on a protected page behind a login that requires cookies, etc.?
I managed to generate the JS file; it creates a lot of header data (“Accept-Language”, “Host”, “Origin”, “Accept”). I know it might be out of scope, but I get “Refused to set unsafe header” error messages related to the cross-domain request…
If you have a tip to get rid of them (I don’t have any control over the server to enable CORS), it would be cool!
There’s no way to successfully send these headers, as XMLHttpRequest simply won’t allow it for security reasons (which fields are allowed depends on the context, e.g. in a browser versus in Node.js). While these are non-terminating errors, I realize it can be annoying to see them show up. The easiest fix is to open the JS file you generated and remove the lines that add these headers to the XMLHttpRequest.
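An alternative to deleting those lines by hand would be to filter the headers before the JavaScript is generated. A minimal Java sketch, assuming the recorded headers are available as a name-to-value map: the set below is a subset of the forbidden request-header names in the WHATWG Fetch standard (plus its Proxy- and Sec- prefixes), and actual browser behavior may differ.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Drops headers that browsers refuse to let XMLHttpRequest set, so generated
// JavaScript never tries to send them. FORBIDDEN is a subset of the Fetch
// standard's forbidden request-header names, compared case-insensitively.
final class HeaderFilter {
    private static final Set<String> FORBIDDEN = Set.of(
            "accept-charset", "accept-encoding", "connection", "content-length",
            "cookie", "date", "host", "keep-alive", "origin", "referer",
            "te", "transfer-encoding", "upgrade", "via");

    static boolean isForbidden(String name) {
        String n = name.toLowerCase(Locale.ROOT);
        return FORBIDDEN.contains(n) || n.startsWith("proxy-") || n.startsWith("sec-");
    }

    // Returns a copy of the recorded headers without the forbidden ones,
    // preserving the original order.
    static Map<String, String> sendable(Map<String, String> recorded) {
        Map<String, String> out = new LinkedHashMap<>();
        recorded.forEach((k, v) -> {
            if (!isForbidden(k)) {
                out.put(k, v);
            }
        });
        return out;
    }
}
```

Note that Accept and Accept-Language are not on the Fetch standard’s forbidden list, so this filter keeps them.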
Amazing work! Exactly what I’ve been looking for for years!
Some other issues:
1. When you use the same parameter twice in the same function, the generator duplicates the parameter even though it already exists.
2. It’s impossible to reload a .gtf file in the Gargl extension. Not a real problem, but it would be cool if it worked.
It should be possible to do the response fields part in PowerShell, but it isn’t implemented yet. Feel free to contribute and add that part.
Yes, duplicate parameters aren’t handled by the generator yet. It should be possible to reload a .gtf file; I’m not sure why that isn’t working.
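One way a generator could avoid duplicate parameters is to collect placeholder names into an insertion-ordered set before emitting the function signature. A minimal Java sketch; the @name@ placeholder syntax and the class name are assumptions for illustration, not the generator’s actual code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects every placeholder used in a function's URL, headers, and body, keeping
// each name once, in order of first appearance. "@name@" syntax is assumed.
final class ParamCollector {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("@([A-Za-z_][A-Za-z0-9_]*)@");

    static List<String> uniqueParams(List<String> templateStrings) {
        Set<String> names = new LinkedHashSet<>(); // ignores repeats, keeps order
        for (String s : templateStrings) {
            Matcher m = PLACEHOLDER.matcher(s);
            while (m.find()) {
                names.add(m.group(1));
            }
        }
        return List.copyOf(names);
    }
}
```

Each name in the returned list then becomes exactly one argument of the generated function, however many times it appears in the template.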
Can you open an issue on Github for each of these issues? Easier to track that way, and then if someone else wants to contribute they can easily see some things to work on.
Is it possible to extend this to Candy Crush Soda Saga?
Do you have any plans to create a generator for PHP?
And perhaps several output types, like an HTML table or XML? Thank you.