This is the home of the Metadata Labs API documentation. Here is where you will find detailed information about our data APIs and how to use them.
The current API consists of the following:
The above forms a small subset of the entire API that we will release over time. To see an example of what will be possible when the full API is available, take a look at our debsnews site. That site currently uses parts of the API that are not yet publicly available but will be in the future.
If you have any questions about the Metadata Labs API or how to use it in your programs, visit metadatalabs on Get Satisfaction.
The Metadata Labs Firehose is a continually updated stream of links to recently posted online video, with metadata, including a reputation scoring system that allows you to differentiate between videos from known good sources and those from sources that are known spammers or that have no reputation yet. The video sources include high-profile sites such as YouTube and Vimeo, as well as major online news sites and disparate pages from all over the Web. Our focused crawlers find, extract metadata for, and publish links to more than 40 hours of media every minute. The Firehose allows you to tap into this data.
Currently, the Firehose is updated once per minute and can be polled through a simple Web API that you can access from any tool that allows Web access, including JavaScript, wget, and curl. In the near future, there will be a streaming Firehose to which you can subscribe and get updates immediately, without polling. The API returns JSON-formatted data, and supports JSON-P so that JavaScript applications can access it.
The Firehose API consists of a single HTTP GET request to which you can pass different parameters to specify the format of the data that you want returned. The data is returned in chronological order (that is, in the order items were found by our crawlers). The basic form of the request is:
http://api.metadatalabs.com/firehose.json?callback=name
The parameters are:
| Parameter | Description |
|---|---|
| callback | For JSON-P, the name of the function to be called with the data. This parameter is always optional. |
Currently, we support only JSON-formatted data.
To get the current firehose data, the request would be:
http://api.metadatalabs.com/firehose.json
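As a small illustration of the two request forms, the sketch below builds the plain JSON URL and the JSON-P URL. The helper name `buildFirehoseUrl` and the callback name `handleFirehose` are hypothetical and chosen only for this example; the base URL and the `callback` parameter are the ones described above.

```javascript
// Base URL taken from the request examples above.
var FIREHOSE_URL = "http://api.metadatalabs.com/firehose.json";

// buildFirehoseUrl is a hypothetical helper, not part of any official library.
// With no argument it returns the plain JSON URL; with a callback name it
// returns the JSON-P form of the request.
function buildFirehoseUrl(callbackName) {
    if (!callbackName) {
        return FIREHOSE_URL;
    }
    return FIREHOSE_URL + "?callback=" + encodeURIComponent(callbackName);
}

console.log(buildFirehoseUrl());
console.log(buildFirehoseUrl("handleFirehose"));
```

With JSON-P, the response is normally a script that calls the named function with the data, so a page would define `handleFirehose` before loading the second URL in a script element.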
We make the Firehose available to the Internet community free of charge so that all may benefit. However, our bandwidth is limited, so we ask that you request no more data than you need. The Firehose data is updated once per minute and averages about 300 kilobytes, so downloading the full data set more often than once per minute only wastes bandwidth. To avoid wasting bandwidth and potentially being banned from using the service, structure your code so that it performs a conditional GET request approximately once every 30 seconds. Doing so ensures that you won't miss any data or receive duplicate data, and that you won't be blocked for querying the server too often.
The code samples below show how to perform conditional GET requests in C# and PHP. JavaScript programmers should refer to the JavaScript libraries.
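For code that runs outside the supplied JavaScript libraries, the general idea of a conditional GET can be sketched as follows. This is only an illustration under stated assumptions: it assumes the server sends a `Last-Modified` header and answers an `If-Modified-Since` request with `304 Not Modified` when nothing has changed, which this documentation does not confirm. `createFirehosePoller` is a hypothetical name, and `fetchFn` stands for any function compatible with the standard `fetch` API.

```javascript
// Sketch of a conditional GET poller. Assumes (not confirmed by this
// documentation) that the server sends a Last-Modified header and replies
// to If-Modified-Since with 304 Not Modified when nothing has changed.
function createFirehosePoller(fetchFn, url) {
    var lastModified = null; // validator remembered from the last 200 response

    // Performs one conditional GET.
    // Resolves to the parsed data, or to null when the server answers 304.
    return function pollOnce() {
        var headers = {};
        if (lastModified) {
            headers["If-Modified-Since"] = lastModified;
        }
        return fetchFn(url, { headers: headers }).then(function (response) {
            if (response.status === 304) {
                return null; // nothing new since the last poll
            }
            if (response.status !== 200) {
                throw new Error("Firehose request failed: " + response.status);
            }
            lastModified = response.headers.get("Last-Modified");
            return response.json();
        });
    };
}
```

A caller might create the poller once with `createFirehosePoller(fetch, "http://api.metadatalabs.com/firehose.json")` and invoke the returned function roughly every 30 seconds, processing the result only when it is not `null`.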
If you are unable to perform conditional GET requests as shown in the examples above, then please limit your requests to no more than once every 60 seconds.
Sites that continually query the Firehose too frequently will be throttled with a warning. Repeated infractions will result in a ban of your IP address.
The Firehose data is a very simple structure consisting of a brief header and a list of media items. Below is an abbreviated example that has been formatted for readability.
{
    "pubDate":"Tue, 27 Jul 2010 15:01:31 GMT",
    "totalItems":588,
    "count":2,
    "items":
    [
        {
            "title":"\u00cf\u00f2\u00e8\u00f6\u00fb \u00e1\u00e5\u00eb\u00fb\u00e5",
            "mediaUrl":"http:\/\/m3.spaces.ru\/mmmm\/803304144101\/128\/7068088\/spaces_ru_7068088.mp3",
            "score":"0.0302",
            "author":"www!http:\/\/spaces.ru",
            "pubDate":"Tue, 27 Jul 2010 14:57:11 GMT",
            "content_loc":"http:\/\/m3.spaces.ru\/mmmm\/803304144101\/128\/7068088\/spaces_ru_7068088.mp3",
            "fileSize":"3325440",
            "type":"audio\/mpeg",
            "medium":"audio",
            "adult":"0",
            "category":"\u00c8\u00ed\u00f1\u00f2\u00f0\u00f3\u00ec\u00e5\u00ed\u00f2\u00e0\u00eb"
        },
        {
            "title":"Untitled",
            "mediaUrl":"http:\/\/vmusik.ru\/download\/aHR0cDovL2NzNDc4MC52a29udGFrdGUucnUvdTMxNzc0MTk0L2F1ZGlvLzMwMTlmY2I3ZDg3OC5tcDM=\/Zanuda_CAO_D_masta_Def_Joint_Dum_Legendu_pro_Cmoki_Mo_Def_Joint_Fon_Remix_by_Likey.mp3",
            "score":"0.001",
            "author":"www!http:\/\/vmusik.ru",
            "pubDate":"Tue, 27 Jul 2010 14:59:52 GMT",
            "content_loc":"http:\/\/vmusik.ru\/download\/aHR0cDovL2NzNDc4MC52a29udGFrdGUucnUvdTMxNzc0MTk0L2F1ZGlvLzMwMTlmY2I3ZDg3OC5tcDM=\/Zanuda_CAO_D_masta_Def_Joint_Dum_Legendu_pro_Cmoki_Mo_Def_Joint_Fon_Remix_by_Likey.mp3",
            "fileSize":"12767676",
            "type":"audio\/mpeg",
            "medium":"audio",
            "adult":"0"
        }
    ]
}
Not all records will contain all fields. Because many media files do not have metadata, or contain just some of the metadata fields that we support, only non-blank fields are reported in the Firehose. Some fields are calculated and reported for all records. The possible fields and their meanings are shown below.
Firehose fields

| Field name | Description |
|---|---|
| pubDate | The date and time, in UTC, that this group of links was published in the Firehose. |
| totalItems | The total number of items in the current Firehose data set. |
| count | The number of items present in this response data. |
| items | An array [count] of item entries, describing the media. |
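
The header fields in the table above could be read along the following lines. This is a minimal sketch rather than official client code: `readFirehoseHeader` is a hypothetical helper name, and the sample string is a shortened version of the example response shown earlier.

```javascript
// Parses a Firehose response body and returns the header fields in typed
// form. readFirehoseHeader is a hypothetical helper name.
function readFirehoseHeader(jsonText) {
    var data = JSON.parse(jsonText);
    var items = data.items || [];
    return {
        published: new Date(data.pubDate), // pubDate is given in UTC (GMT)
        totalItems: data.totalItems,
        count: data.count,
        // Sanity check: count should equal the length of the items array.
        countMatches: data.count === items.length
    };
}

var sample = '{"pubDate":"Tue, 27 Jul 2010 15:01:31 GMT","totalItems":588,' +
             '"count":1,"items":[{"title":"Untitled"}]}';
var header = readFirehoseHeader(sample);
console.log(header.published.toISOString()); // 2010-07-27T15:01:31.000Z
console.log(header.countMatches);            // true
```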

Media item fields

| Field name | Description |
|---|---|
| title | The title assigned by the author of the media. If no title is found in the metadata, this field will contain the string, "Untitled". |
| mediaUrl | The fully-qualified URL to the authority page for the media. This can differ from the content_loc field. See Note 1, below. |
| score | A quality score, ranging from 0.001 to 1.000, assigned to the media. See the discussion of Quality Score below for more information. |
| author | A site-specific name identifying the media's author. Note that this is based on the location where the media was found rather than on metadata found in the media file itself. For known sites such as YouTube, the generated name is of the form site!username. For example: "YouTube!jnuscgek". For media found on the Web in general, the form is www!domain. For example: "www!http://podly.tv". |
| artist | The artist who performed the work. This field typically will be found only in MP3 records. |
| description | Descriptive information about the media, as found in the media metadata. For video hosting sites, this will be the description entered on the authority page. When media is found through an RSS feed, this will be the description of the media as found in that RSS feed. This field is a combination of a number of different metadata fields found in the different media formats supported. |
| pubDate | The date the media was published, as reported by the Web site or by the media's authority page or RSS feed. |
| content_loc | The URL to the actual media file. This can differ from the mediaUrl field. See Note 1, below. |
| fileSize | The size of the media file, as reported by the Web site that hosts the media. This field is typically not supplied by hosting sites such as YouTube and Vimeo, but exists for most free-range video and audio files. |
| type | The media's MIME type. We use extended MIME types (for example, "video/youtube") for hosting sites that do not provide direct access to their media files. These extended MIME types are not official and are likely not used by any other organization. |
| medium | A simple way to distinguish among the different media types. This will be "audio" or "video", as taken from the MIME type. |
| duration | The duration of the media, in seconds. This is supplied only if it exists in the media's metadata. |
| adult | A simple flag to indicate whether the media contains adult material. This will be "1" if we detect adult material. This determination is made based on information contained in the metadata, and on some simple heuristics performed on some metadata fields. Because this field depends on reliable metadata, it cannot block 100% of content that others may find objectionable. |
| keywords | Key words or tags as supplied by the media's author. |
| thumbnail | Link to the thumbnail image, if one is supplied by the media creator. |
| copyright | Copyright information from the media's metadata. If no copyright was supplied in the metadata, this field will not exist. |
Note 1: The mediaUrl field is the URL to the media's authority page. This is the "official" location of the media item, and will often be an HTML page that hosts the video. The content_loc field, on the other hand, always refers to a media file. content_loc is the value that you would pass to a player in order to have it play the media.
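Because only non-blank fields are reported, and because the example response encodes values such as score and adult as strings, client code usually needs a small normalization step. The sketch below shows one possible approach; `parseAuthor` and `normalizeItem` are hypothetical helper names, the output property names are invented for this example, and the sample URLs are placeholders.

```javascript
// Splits the author field ("site!username" or "www!domain") at the first "!".
function parseAuthor(author) {
    var bang = author.indexOf("!");
    if (bang < 0) {
        return { site: null, name: author }; // unexpected form; keep it whole
    }
    return { site: author.slice(0, bang), name: author.slice(bang + 1) };
}

// Converts the string-valued fields of one media item into typed values and
// fills absent optional fields with null. Hypothetical helper.
function normalizeItem(item) {
    return {
        title: item.title,
        author: item.author ? parseAuthor(item.author) : null,
        score: parseFloat(item.score),
        isAdult: item.adult === "1",
        durationSeconds: "duration" in item ? Number(item.duration) : null,
        fileSize: "fileSize" in item ? Number(item.fileSize) : null,
        pageUrl: item.mediaUrl,   // authority page (see Note 1)
        playUrl: item.content_loc // value to pass to a player (see Note 1)
    };
}

var normalized = normalizeItem({
    title: "Untitled",
    mediaUrl: "http://example.com/page",      // placeholder URL
    content_loc: "http://example.com/a.mp3",  // placeholder URL
    score: "0.001",
    author: "www!http://vmusik.ru",
    fileSize: "12767676",
    adult: "0"
});
console.log(normalized.author.site);     // www
console.log(normalized.author.name);     // http://vmusik.ru
console.log(normalized.durationSeconds); // null
```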
The quality score is a rough estimate of the video's quality, based on the content of the metadata and on the computed reputation of the media's author.
A score of 0.001 typically means that we have no information about the author, and so cannot assign a score with any reliability.
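One way to use the score on the client is to drop items below a chosen threshold, as in the sketch below. The helper name `filterByScore` and the threshold of 0.01 are arbitrary illustrations, not recommended values; the sketch also skips items flagged as adult.

```javascript
// Keeps items whose score is at least minScore and that are not flagged as
// adult. Scores arrive as strings in the example response, so they are
// parsed before comparison. filterByScore is a hypothetical helper.
function filterByScore(items, minScore) {
    return items.filter(function (item) {
        return parseFloat(item.score) >= minScore && item.adult !== "1";
    });
}

var sampleItems = [
    { title: "A", score: "0.0302", adult: "0" },
    { title: "B", score: "0.001", adult: "0" },
    { title: "C", score: "0.5", adult: "1" }
];
var kept = filterByScore(sampleItems, 0.01);
console.log(kept.map(function (item) { return item.title; })); // [ 'A' ]
```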
A big stumbling block with media on the Web today is supporting the wide variety of formats on multiple platforms. The number of combinations that you must support is intimidating, but limiting yourself to a single format usually means going with the biggest provider and ignoring the rest. Rather than go that route, we chose to support a wider range of formats. That gives us more sources of video, but makes playing them seamlessly a bit of a problem. To simplify our coding and allow the addition of new video types, we developed the Thebe media player, which will play many different types of audio and video files.
Thebe is a JavaScript wrapper around several different players. The wrapper provides a single unified API that will allow you to play YouTube and Vimeo videos, MP4 and Flash videos from any source, and MP4 and MP3 audio files. New formats such as WebM can be added without changing the programming interface.
Thebe is written as a jQuery plugin, and the standard controls use the jQuery UI library. As a result, the player relies heavily on jQuery. If you do not include the jQuery scripts in your HTML file, the Thebe initialization code will include them.
To play a video with Thebe, include the thebe.js file in your script or HTML file, create a container for the player, and call the thebe function on that jQuery object. In the example below the videoPane div will contain the player. First, the HTML code:
<div id="videoPane" style="height:510px; width:640px"> <!-- Thebe player will be inserted here --> </div>
Then, in your JavaScript:
var videoToPlay = {
    "title": "My Devilish Tribute",
    "mediaUrl": "http://www.youtube.com/watch?v=42a-moL8NB0",
    "score": "0.001",
    "author": "YouTube!Tomkaulitzluver52799",
    "description": "My Devilish Tribute! My Love comes from the inspiration you given me! Enjoy!",
    "pubDate": "Wed, 15 Dec 2010 19:25:25 GMT",
    "content_loc": "http://www.youtube.com/watch?v=42a-moL8NB0",
    "type": "video/youtube",
    "medium": "video",
    "duration": "230",
    "adult": "0",
    "keywords": "Videos, made, by, Tomkaulitzluver52799, and, tokiohotelvevo",
    "thumbnail": "http://i.ytimg.com/vi/42a-moL8NB0/3.jpg",
    "category": "Entertainment"
};
$("#videoPane").thebe(videoToPlay);
The video parameter to Thebe is an object that contains the video information. The Firehose provides information in that format.
Video objects must always have a mediaUrl field and a type field. Many video types also require content_loc and thumbnail_loc fields. Currently, all other fields are optional. The best results are obtained by passing information directly from the Firehose to the player.
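If you build video objects yourself rather than passing Firehose items straight through, a quick check for the two required fields can catch mistakes before the player is called. The helper below is a hypothetical sketch and not part of Thebe; it checks only for mediaUrl and type, since the text above does not say which video types need the additional fields.

```javascript
// Returns the names of required fields that are missing or empty.
// An empty result means the video object has both required fields.
// missingVideoFields is a hypothetical helper, not part of Thebe.
function missingVideoFields(video) {
    var required = ["mediaUrl", "type"];
    return required.filter(function (field) {
        return typeof video[field] !== "string" || video[field] === "";
    });
}

var candidate = {
    "mediaUrl": "http://www.youtube.com/watch?v=42a-moL8NB0",
    "title": "My Devilish Tribute"
};
console.log(missingVideoFields(candidate)); // [ 'type' ]

// In a page, the player would be called only when nothing is missing:
// if (missingVideoFields(candidate).length === 0) {
//     $("#videoPane").thebe(candidate);
// }
```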
The thebe function accepts a second parameter with configuration options. Those options describe how you want the player to start. The supported options are:
Thebe options

| Option | Description |
|---|---|
| autoplay | boolean - Set to true if you want the video to begin playing immediately. |
| preload | boolean - Set to true to begin loading video data immediately. Do not use with autoplay. |
| controls | boolean - Set to true to show thebe's built-in video controls. |
| volume | number - The initial volume, expressed as a number between 0.0 and 1.0. |
| muted | boolean - Set to true if you want the player to start with the volume muted. |
| start | number - The video starting position, in seconds. |
| quality | integer - A value between 0 and 10 indicating the relative quality desired. Lower numbers give a lower-quality video. Although this option is accepted for all videos, it will only change behavior if multiple versions of the video are available. |
So, if you want to start a video playing immediately, with the volume at 50%, you would write:
$("#videoPane").thebe(videoToPlay, { autoplay:true, volume:0.5 });
thebe() is the only method added to the jQuery object. You can get information about the current playback state, set state, or control the player by passing commands to the thebe() function. The commands available are:
thebe commands

| Command | Description |
|---|---|
| thebe("paused", true) / thebe("paused", false) | Pause a video that's playing, or unpause a video that has been paused or merely queued. |
| thebe("position") / thebe("position", pos) | Get or set the time index in the current video. |
| thebe("muted", true) / thebe("muted", false) | Mute or un-mute the audio on the current video. |
| thebe("volume") / thebe("volume", vol) | Get or set the volume for the current video. Note that, although "volume" and "muted" are distinct commands, setting the volume will un-mute the player's audio. |
| thebe("buffered") | Get the loaded portion of the video as an array of two numbers, which are seconds from the beginning. The first number, result[0], is the starting position of the loaded portion, and the second, result[1], is the ending position of the loaded portion. |
| thebe("unrender") | Stop the video and remove it from the DOM. The current version of thebe requires you to call this in order to remove the video. In the future, thebe will support jQuery's remove() method. However, calling remove in the current version will leak resources. |
Assuming that you called thebe to start a video as shown in the example above, then you would set the volume by writing:
$("#videoPane").thebe("volume", 0.5);
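The getter forms of the commands return values that you can process like any other data. As one illustration, the sketch below turns the two-element array returned by the "buffered" command into a loaded fraction. `bufferedFraction` is a hypothetical helper, and the duration is assumed to come from the video object's duration field.

```javascript
// Computes what fraction of a video is loaded, given the two-element array
// returned by thebe("buffered") and the duration in seconds.
// bufferedFraction is a hypothetical helper, not part of Thebe.
function bufferedFraction(buffered, durationSeconds) {
    if (!buffered || !(durationSeconds > 0)) {
        return 0; // no buffer information or no usable duration
    }
    var loaded = buffered[1] - buffered[0];
    return Math.max(0, Math.min(1, loaded / durationSeconds));
}

// In a page: var buffered = $("#videoPane").thebe("buffered");
console.log(bufferedFraction([0, 57.5], 230)); // 0.25
```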
Thebe raises a number of different custom events on the containing element. These events describe changes in the state of video playback. Events are raised whenever state changes, however the change occurred: programmatically, by operating thebe's controls, or by operating native controls. You subscribe to those events by binding a function to the event name. The events that thebe will raise are:
thebe events

| Event | Description |
|---|---|
| play | The play event is raised whenever playback commences, either because a video has just started playing after loading, or because the video has been unpaused. After playback has commenced, the paused property of a thebe instance will be false. |
| pause | The pause event is raised whenever video playback becomes possible but is not commenced, or when playback ceases but can be continued immediately. In the described state and after this event, the paused property of a thebe instance will be true. |
| embed | The embed event is raised whenever a player implementation has determined that it is ready and working. After this event, any calls to thebe that are supported by the player implementation will have immediate effect. If a video fails to play after this event is raised, it is likely related to the particular video and not the platform. For instance, it is not because Adobe Flash or Apple QuickTime is not installed. |
| metadata | The metadata event is raised when the player has received data from the content source specific to the video about to be played. Other than providing a duration when available, thebe does not provide robust access to video metadata. This event is generally useful only when exploiting specific knowledge of player implementations. |
| seeking | The seeking event is raised when the player is seeking to a new position in the video, usually as a result of a position command or because the user moved ahead in the video. This event will not be raised during normal playback. |
| seeked | The seeked event is raised after the playback position has been changed in response to a position command or user interaction. This event signals that playback is now continuing or, in the case of a paused video, that it is possible to resume playback. This event will not be raised during normal playback. |
| volumechange | The volumechange event is raised whenever the player's volume is changed. |
| mute | The mute event is raised whenever the player's audio is muted. |
| unmute | The unmute event is raised whenever the player's audio is un-muted. |
| error | The
|
Thebe uses multiple player implementations, not all of which support all functionality. Although the API is consistent for functions supported, not all devices and content types can deliver all events. For example, Thebe is unable to detect when a YouTube video playing on the iPad or iPhone has ended. As a result, you won't get an "ended" event for those videos on that platform. An API for determining the capabilities supported for a particular video on a particular platform is under development.
Subscribing to events is a simple matter of binding a function to the event name. For example, to subscribe to the "ended" event, you would write:
$("#videoPane").bind({ ended:videoEnded });
function videoEnded(event, misc)
{
    // handle event here
}
Or, you can supply the function in-line, like this:
$("#videoPane").bind({ ended:function(event, misc) { /* handle event here */ } });
All events pass a JavaScript event object as the first parameter, and a second miscellaneous object. With the exception of the error event, the fields in the miscellaneous object are not documented and you should not depend on any values there being useful to you. If you do not need the event object, you can write your function to take no parameters, as in function videoEnded().
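Several events can be bound in one call by passing an object that maps event names to handlers. The sketch below uses the play and pause events to keep a small state object up to date. `trackPlayback` is a hypothetical helper, and its argument is expected to be the jQuery object for the player's container, such as `$("#videoPane")`.

```javascript
// Binds handlers for the play and pause events and keeps a small state
// object up to date. $pane is expected to be the jQuery object for the
// player's container. trackPlayback is a hypothetical helper.
function trackPlayback($pane) {
    var state = { playing: false, plays: 0 };
    $pane.bind({
        play: function () {
            state.playing = true;
            state.plays += 1; // counts both initial starts and unpauses
        },
        pause: function () {
            state.playing = false;
        }
    });
    return state;
}

// In a page: var playback = trackPlayback($("#videoPane"));
```

The handlers take no parameters because neither the event object nor the miscellaneous object is needed here.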
JavaScript applications that access the Firehose API and use the supplied data face certain challenges, chief among them polling the API for updates and presenting a consistent video player user interface for all of the different media types. That's in addition to the well-known difficulties of working with the DOM. To make things easier on ourselves (after all, we use the data, too) and outside developers, we have developed some JavaScript libraries and made use of some publicly available libraries.
The standard functionality for JavaScript applications that interact with the Firehose API is encapsulated in a JavaScript file called firehose.js, which must be included in any page that accesses the firehose. firehose.js contains a class that, when called, sets up a polling loop for the API and notifies the client whenever new data is downloaded. We strongly recommend that you use the supplied object as outlined in the sample programs.
The other support library is, of course, the Thebe Media player.
See the sample applications to learn how these libraries are incorporated into Web pages.
We have created the Videoroulette and Videodeck sample applications in order to show what is possible with the Firehose API, and also as examples that show how to build interactive JavaScript applications that use the data and the Thebe media player. These applications are very similar and share much code. They are the "reference implementations" for JavaScript applications that use the Firehose API.
Videoroulette illustrates the sheer volume of new video that is being uploaded to the Internet every minute, and allows you to view videos as they're found.
Videodeck lets you enter query terms and display only those videos that match the query terms. The interface allows you to filter videos by a computed quality score, and of course you can click on any matching video in order to view it in the integrated player.
Each application consists of a single HTML file with inline CSS, and a supporting JavaScript file that provides all of the application logic. In addition, these applications rely on the JavaScript libraries that query the Firehose API and provide the universal video/audio player.
You are free to download the applications and study them to see how they work. Right-click below to download any of the files.
To duplicate one of the sample applications, download the index.html and corresponding JavaScript file above. Place those files in a directory on your Web server. You will also need to place the default video and audio images in that directory. With those files in place, you should be good to go.