g0blin Bringing together art & technology

16 Aug 2010

Reading data from Twitter's API stream

Thanks to Twitter's Stream API, retrieving large sets of live, real data to analyze in a social context is now very easy. At our disposal we have a never-ending stream of Tweets from people all around the globe. In this article you will learn how to tap into this stream using the PHP scripting language.

Twitter allows you to stream a few different types of Tweet feeds, from an overall sample of Tweets to a filtered result set (either by keyword or location). To keep things straightforward, we will be using the plain overall sample stream. In order to receive this stream you will need an active Twitter account.

First things first, we need to be able to continuously read and make use of data from an HTTP stream. While we can read an HTTP response in its entirety using PHP's cURL library, we need to read this data as it arrives, not as a whole. To do this we make use of a PHP feature called Stream Wrappers. This feature allows us to define our own functions that deal with incoming file streams (in this case, a file stream used as the output target by cURL).

To do this we must write a PHP class that contains the functions we'll be using to deal with the incoming data from the Twitter stream. The class receives data from the Twitter stream as it is written, reads the complete lines from its buffer, loops through them and then outputs data from the Tweets.
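A minimal sketch of such a class might look like the following. The class name `TweetStreamWrapper` is hypothetical, and the sketch assumes the stream delivers one JSON-encoded Tweet per line, with a `text` field and a nested `user` / `screen_name` field. Treat it as an illustration of the approach rather than the exact code this post was written with.

```php
<?php
// Hypothetical class name; a sketch of a write-only stream wrapper.
class TweetStreamWrapper
{
    /** @var resource|null Stream context; PHP assigns this when the stream is opened. */
    public $context;

    /** @var string Received bytes that do not yet form a complete line. */
    private $buffer = '';

    // Called by PHP when fopen() is used with our protocol.
    public function stream_open($path, $mode, $options, &$opened_path)
    {
        return true; // accept every path and mode
    }

    // Called by PHP whenever data is written to the stream.
    public function stream_write($data)
    {
        $this->buffer .= $data;

        // Everything before the last newline is complete; keep the remainder.
        $lines = explode("\n", $this->buffer);
        $this->buffer = array_pop($lines);

        foreach ($lines as $line) {
            $line = trim($line);
            if ($line === '') {
                continue; // blank line, nothing to decode
            }
            $tweet = json_decode($line, true);
            // Assumed Tweet shape: 'text' plus a nested 'user' => 'screen_name'.
            if (is_array($tweet) && isset($tweet['text'])) {
                $name = $tweet['user']['screen_name'] ?? 'unknown';
                echo $name . ': ' . $tweet['text'] . PHP_EOL;
            }
        }

        // Report every byte as consumed so the writer does not retry.
        return strlen($data);
    }
}
```

Keeping the unfinished tail of the data in `$buffer` matters because a write can end in the middle of a Tweet; the remainder is completed by the next call to `stream_write`.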

Now that we have the class we'll be using for our stream wrapper defined, we can go ahead and register it with PHP's stream_wrapper_register function, which takes the name of our new protocol and the name of our class.

We have now registered our own custom protocol that we can use with PHP's fopen function. You must now open a file handle using this new protocol.
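The registration and the open call can be sketched together as follows. The protocol name `tweetstream`, the path `sample` and the stand-in class `EchoLengthWrapper` are all hypothetical; the stand-in only reports how many bytes it received so that the snippet runs on its own, and in the real script you would register your Tweet-parsing class instead.

```php
<?php
// Compact stand-in wrapper (hypothetical name) so this snippet is self-contained.
class EchoLengthWrapper
{
    /** @var resource|null Stream context; PHP assigns this when the stream is opened. */
    public $context;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        return true; // accept every path and mode
    }

    public function stream_write($data)
    {
        echo 'received ' . strlen($data) . " bytes\n";
        return strlen($data); // tell PHP that all bytes were consumed
    }
}

// 'tweetstream' is an assumed protocol name; any unused scheme name would do.
if (!stream_wrapper_register('tweetstream', 'EchoLengthWrapper')) {
    exit("Failed to register protocol\n");
}

// Opening a URL with the new scheme creates an instance of the wrapper class.
$fp = fopen('tweetstream://sample', 'w');

// Anything written to $fp is now handed to stream_write().
fwrite($fp, "hello\n");
```

The handle returned by `fopen` behaves like any other writable file handle, which is what lets us hand it to cURL as an output target in the next step.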

Now that we have an instance of our custom stream handler, we can go ahead and begin to sample Tweets from Twitter's API stream using the cURL library. To do so we set up a cURL instance, define the URL as the Twitter JSON stream, and tell cURL to write the incoming data to our custom stream. Note the value of CURLOPT_TIMEOUT: it is there to ensure our connection persists. For this to be effective, you may also need to increase the timeout limit in your php.ini configuration file, usually found in /etc/ on Linux.
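A hedged sketch of that cURL setup is shown here. The function name `createStreamRequest`, the URL and the credentials in the usage comment are placeholders, the timeout value is only illustrative, and the use of HTTP Basic credentials through CURLOPT_USERPWD is an assumption about how the account is supplied. Substitute the real stream URL and your own account details.

```php
<?php
/**
 * Build a cURL handle that streams a response body into $sink.
 * Hypothetical helper; every value passed to it in this sketch is a placeholder.
 */
function createStreamRequest($url, $credentials, $sink)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Assumed: HTTP Basic credentials in "username:password" form.
    curl_setopt($ch, CURLOPT_USERPWD, $credentials);
    // Write the response body to $sink instead of standard output.
    curl_setopt($ch, CURLOPT_FILE, $sink);
    // Illustrative large value (one day, in seconds) so cURL keeps the connection open.
    curl_setopt($ch, CURLOPT_TIMEOUT, 86400);
    return $ch;
}

// Usage, left commented out because it opens a long-lived network connection:
// $fp = fopen('tweetstream://sample', 'w');   // handle on the custom protocol
// $ch = createStreamRequest('https://stream.example.com/sample.json', 'username:password', $fp);
// curl_exec($ch);                             // blocks while data keeps arriving
// curl_close($ch);
```

Because `curl_exec` does not return until the connection ends, all of the Tweet handling happens inside the stream wrapper's write method while the request is still running.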

Once you have put all of this code together and run it, you should, all being well, receive a live stream of Tweets as they come in. Note that without filtering you will receive a massive amount of data.

Under the default access level Twitter provides, you will receive approximately 1% of all the Tweets worldwide, which, trust me, is quite a lot! If you require a larger amount of data you can request 'GardenHose' access, which provides approximately 5% of Tweets worldwide.

This covers receiving and processing Tweets from Twitter's stream API in PHP. Next we'll touch on how to use this data to display a real-time stream of a specific range of Tweets using a mixture of JavaScript, PHP and MySQL.

Note: Twitter changed their stream API to work over HTTPS only. The post has been modified to reflect this.

Comments (8) Trackbacks (2)
  1. Thanks for this. I was looking for an example of how PHP’s string wrappers worked, for another HTTP stream, and this came in handy.

    I think your CMS may have changed some of the characters in the PHP code though. Line #3 does not look correct.

  2. Second note. The PHP code is definitely at the very least HTML escaped. The class operator “dash” “greater than” is definitely being translated into html escape codes.

    Easy to fix now :)

  3. Hey Michael, thanks for your comments. Yeah, you’re right. It does look like my code-highlighter has gone a bit barmy! I’ll look into that tonight. Cheers :)

  4. Hey, do you need to give the option “CURLOPT_HTTPAUTH” when you need to authenticate?
    By the way, I used your script with another stream server, but I didn’t receive messages as they came in, only when the maximum length was reached :/

  5. Sorry, not when the max length is reached, but only on timeout.

  6. Hey florent, as I recall the above code worked as-is. Whether or not Twitter have changed their API (you might want to try HTTPS, I do recall them changing something a while ago) is another matter.

    I’ll give this code another go here, and see if I can get it to work.

  7. Ok, that was the problem. As I suspected, they have changed the stream to HTTPS only. The above code will work as-is, with ‘http://’ changed to ‘https://’. I’ve edited the post also.

    Let me know if you have any other issues :)

  8. This is a good solution and I am using it, but there is an issue: curl seems to buffer the data into 8192-byte buffers, so in periods of low-volume traffic this causes latency on the stream. I close the stream periodically using a lower timeout and that flushes the stream, but this has its own issues, as it requires me to run overlapping streams, which in essence turns the streaming into polling. I have tried curl_setopt($ch, CURLOPT_BUFFERSIZE, 256); but this does not seem to be honored (the documentation says it can be ignored). Any suggestions?

    Thanks in advance

