Introducing Gargl: Create an API for any website, in any programming language, without writing a line of code

Warning:
Integrating into a website / service without permission is a legally grey area. If a site does not have a documented API, its owners may be unhappy with you reverse-engineering their unofficial API and using it as part of an application, service, or other purpose. Legal cases have previously been fought and both won and lost by developers “misusing” an unofficial API for their own purposes.

Use caution and seek legal counsel before integrating another party’s unofficial API into any product, service, or tool you plan to expose to yourself or others. Things to worry about include Copyright Infringement, Trespass to Chattels, and The Computer Fraud and Abuse Act. See Web Scraping – Legal Issues for more details.

 

What is Gargl?

Ever been unhappy to find that a website has no API, or makes you pay to use its API, even though the information you want is easily accessible by manually using the website?

Because of the way websites work, if you can see or submit data using the website, it means the website does have some kind of API, even if the site owners haven’t documented it publicly. For example, if you do a search in Yahoo, you can see that the search request sent to Yahoo’s servers has the following URL (some URL parameters omitted for readability):

https://search.yahoo.com/search?p=search+term

As you can see, the standard contract used to get a search page back from Yahoo is to send the server at search.yahoo.com the path “search,” and a query string parameter with key “p” and value being the term to search for, over HTTP.  All websites have “contracts” like this. These contracts are essentially undocumented APIs, meant for the browser to use!
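As a minimal sketch of that contract in code, the snippet below builds the same URL from a search term. Only the host, the path, and the “p” parameter come from the example above; the function name buildSearchUrl is made up for illustration, and the snippet only constructs the URL without sending anything.

```javascript
// Build the search URL described above. Only the host, path, and "p"
// parameter come from the article; the function name is illustrative.
function buildSearchUrl(term) {
  const url = new URL("https://search.yahoo.com/search");
  // searchParams handles escaping; spaces are serialized as "+".
  url.searchParams.set("p", term);
  return url.toString();
}

console.log(buildSearchUrl("search term"));
// prints "https://search.yahoo.com/search?p=search+term"
```

Letting the URL object do the escaping avoids hand-building query strings, which is where most mistakes in this kind of code tend to creep in.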

[Screenshot: Using Fiddler to sniff Yahoo.com]

From the above you can see that it is of course possible to use a website, “sniff” the underlying HTTP requests sent and data received back, and manually build a module in some programming language that pretends to be the browser, sends a request, and parses the returned HTML for the information needed. Many people do this today, using tools like Wireshark and Fiddler, to make individual websites accessible programmatically from individual programming languages.
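To give a feel for the “parse the returned HTML” half of that manual work, here is a rough sketch that pulls link titles and URLs out of headings. The markup in sampleHtml is made up and is not Yahoo’s real page structure, extractResultLinks is a hypothetical name, and the regular expression is a dependency-free shortcut that a real module would likely replace with a proper HTML parser.

```javascript
// Extract { title, href } pairs from links nested directly in <h3> headings.
// A regular expression keeps the sketch dependency-free, but it is fragile;
// a real module would more likely use a proper HTML parser.
function extractResultLinks(html) {
  const pattern = /<h3[^>]*>\s*<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    // Strip any nested tags (such as <b>) from the link text.
    const title = match[2].replace(/<[^>]+>/g, "").trim();
    results.push({ title, href: match[1] });
  }
  return results;
}

// Made-up markup standing in for a downloaded results page.
const sampleHtml = `
  <h3 class="title"><a href="https://example.com/one">First <b>result</b></a></h3>
  <p>snippet</p>
  <h3><a class="x" href="https://example.com/two">Second result</a></h3>`;

console.log(extractResultLinks(sampleHtml));
```

Every site needs its own version of this parsing code, which is exactly the repeated effort the rest of this post is about.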

But why should it be this way? Once one person figures out the undocumented API for a website, shouldn’t they be able to document that API somewhere, so others don’t have to do the work of figuring out the API for themselves? Better yet, shouldn’t some piece of software be able to parse this documentation, and generate a library that uses that API, in any programming language, so that developers don’t have to write code to integrate into that API at all?

Introducing Gargl.

Gargl (Generic API Recorder and Generator Lite, pronounced “Gargle”) is a project composed of multiple components meant to allow developers to easily figure out, document, and generate modules for the undocumented APIs that websites use to talk between client and server.

Gargl consists of three types of components:

  • A “template” declares the API for one or more functions of a website, using JSON. You can read up on the schema of templates here.
  • A “recorder” integrates into an existing software solution that is used to interact with websites, records requests made to servers and data received back, and allows the user to easily transform this data and create a template file.
  • A “generator” takes in a template file and a programming language, and spits out a module for that programming language that uses the API specified in the template file.
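The relationship between a template and a generator can be sketched in a few lines. Everything here is an assumption made for illustration: the template shape (moduleName, functions, query) is hypothetical and is not the real Gargl schema, and generateModule builds functions at runtime, whereas the real generator writes out source code for the chosen language. The @term@ placeholder style mirrors the @variable@ parameters used in Gargl templates.

```javascript
// HYPOTHETICAL template shape for illustration; not the real Gargl schema.
const exampleTemplate = {
  moduleName: "YahooSearch",
  functions: [
    {
      name: "search",
      method: "GET",
      url: "https://search.yahoo.com/search",
      query: { p: "@term@" } // @term@ marks a caller-supplied value
    }
  ]
};

// Replace each @name@ placeholder with the matching argument.
function fillPlaceholders(text, args) {
  return text.replace(/@(\w+)@/g, (whole, name) => {
    if (!(name in args)) throw new Error(`Missing argument: ${name}`);
    return String(args[name]);
  });
}

// "Generate" a module: one function per template entry. Each function
// returns a request description rather than sending anything.
function generateModule(template) {
  const generated = {};
  for (const fn of template.functions) {
    generated[fn.name] = (args = {}) => {
      const url = new URL(fn.url);
      for (const [key, value] of Object.entries(fn.query || {})) {
        url.searchParams.set(key, fillPlaceholders(value, args));
      }
      return { method: fn.method, url: url.toString() };
    };
  }
  return generated;
}

const yahoo = generateModule(exampleTemplate);
console.log(yahoo.search({ term: "search term" }));
```

The point of the split is that the template is plain data, so the same file can drive generators for several languages without anyone re-sniffing the site.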

 


Gargl in action – Create an API for Yahoo in 3 minutes flat

Gargl is open source and hosted on GitHub. It currently contains a sample template for Yahoo search and autocomplete functions, as well as a recorder implemented as a Chrome extension, and a generator which supports generating modules in Java, PowerShell, and (Node.js, Browser, and WinJS compatible) JavaScript.

 

Gargl Recorder Chrome Extension

The Gargl Recorder Chrome Extension allows you to easily record websites to generate Gargl templates. You can get the extension in the Chrome Web Store or add it to Chrome yourself from its source code here. For a step by step guide on how to use the Gargl Recorder Chrome Extension to generate Gargl templates, take a look at my other post on Gargl, Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough.

 

Gargl Generator

A developer can generate a module from a Gargl template using the Gargl generator in the Gargl GitHub project.

To run the generator, execute the GarglJavaGenerator.jar file in the bld folder. The generator is implemented in Java, so Java must be installed on the computer you plan to run it from. Run the following command, passing in the path to the .jar file, the path to the Gargl template, the output directory for the module, and the language to generate the module in:

java -jar bld\GarglJavaGenerator.jar -outdir some\output\location -input someGarglFile.gtf -lang someLanguage 

The allowed values for -lang are “java,” “javascript,” and “powershell.”

As an example, I ran this Gargl generator against the Gargl template file created in my blog post Using the Gargl Recorder Chrome Extension – A Step by Step Walkthrough, and specified JavaScript as the language. You can view the JavaScript module it spit out here.

 

Gargl Needs You

Hopefully now you have a good idea of what Gargl is, why it adds value, and how you can use it. Like any open source project, Gargl requires a great community to truly become great, itself. I encourage you to play around with and use Gargl to make some applications and services which integrate into websites with no public API, and contribute any Gargl templates you create doing so back to the Gargl open source repository. If you’re feeling extra creative you could even extend the existing Gargl generator to support another programming language, prettify the existing Gargl recorder, or create a Gargl recorder for some other program, like Fiddler.

I look forward to seeing how you take advantage of Gargl, and hope it proves useful to you.

Happy Gargling!

Update: Some folks have asked me how Gargl compares to a similar, recently released solution called Kimono. I wrote a short post about why I think Gargl is better; you can read it here.

Interested in hearing about other side projects like this one? Subscribe to my blog and follow me on Twitter. I’ll let you know when I think of something fun.

20 thoughts on “Introducing Gargl: Create an API for any website, in any programming language, without writing a line of code”

  1. Pingback: Tools | Pearltrees
  2. Mr Levy, my hat is off to you. I thought of exactly this 10 years ago, as I was building yet another scraper. But thinking of it and doing it are 2 entirely different things! Well done. The architecture looks great too. I will help.

    Cheers
    Dennis

  3. Hi Joe, thanks for the articles and for creating Gargl and making it available as open source. I followed the step by step tutorial for using the Gargl recorder in Chrome (http://jodoglevy.com/jobloglevy/?p=85) and then ran the generator from the command line as outlined in this article. I ran the generator twice — once for -lang java and once for -lang javascript.

    It seems like only the javascript code that was generated supports the Response Fields (in the example, the CSS selector was h3 a), but the generated java code does not, is that correct? And do you have any plans in the future to add that functionality for java code generation?

    Regards,
    Jessica

    1. Hi Jessica,

      Yes, that is correct. Both the PowerShell and Java Gargl modules currently do not support the response fields element of a Gargl template. They don’t take it into account when generating the module right now. We definitely want this for the future though. Feel free to make a pull request adding this functionality, or open an issue and maybe someone will pick it up.

  4. Hi what a great idea! I tried -lang powershell and when executing a command in PowerShell (3.0), I got error message “Invoke-WebRequest : The ‘User-Agent’ header must be modified using the appropriate property or method.”. Trying to figure how to modify x.psm1.

  5. Impressive!
    Is it possible to use Gargl to do the same kind of search but on a protected page behind a login/password page requiring cookies, etc.?

    Thanks
    Jl

    1. Yes, as long as the underlying generated module keeps track of cookies received in responses and sends those cookies on subsequent API calls, just like a browser would, it should work fine for “normal” websites that use regular cookies to remember whether the user is logged in. Gargl modules generated as PowerShell, or as JavaScript (and used in a WinJS project), do this “cookie remembering” today. It is also possible for users to remember the cookie themselves in their own code (after getting the raw response from the API call), and then pass that cookie into any subsequent API calls manually.

  6. thanks,
    I managed to generate the js file, it creates a lot of header data
    “Accept-Language”,”Host”,”Origin”,”Accept”, I know that it might be out of scope but I get error messages “Refused to set unsafe header ” related to the cross domain request…
    If you have a tip to get rid of it ( I don’t have any control on the server to enable cors ), it would be cool !
    jl

    1. There’s no way to successfully send these headers, as XMLHttpRequest simply won’t allow it for security reasons (different headers are allowed depending on the context, e.g. in a browser vs. in Node.js). While these are non-terminating errors, I realize it can be annoying to see them showing up. The easiest fix is to open up the js file you generated and remove the lines that add these headers to the XMLHttpRequest.

  7. Hi,

    Amazing work ! Exactly what I was looking for years !

    By the way, I have some trouble when I try to generate code in PowerShell: the response part is missing in the PowerShell code, while in JavaScript it’s present (refer to “h3 a”). Is this an oversight, or is it just not possible to do easily in PowerShell?

    Some others troubles:

    1. When you use the same parameter (@variable@) twice in the same function, the generator duplicates the parameter even if it already exists.
    2. It is impossible to reload a gtf in the Gargl extension. Not a real problem, but it would be cool if it worked.

    1. It should be possible to do the response fields part in PowerShell, but it isn’t implemented yet. Feel free to contribute and add that part :)

      Yes, duplicate parameters aren’t handled by the generator yet. It should be possible to reload a gtf; I’m not sure why that isn’t working.

      Can you open an issue on GitHub for each of these? They’re easier to track that way, and then if someone else wants to contribute they can easily see some things to work on.
