Visualize Real Time Twitter Trends With Kibana

Kibana offers a simple way to visualize logs and data coming from Logstash. With Twitter as one of the standard inputs of Logstash, it is very easy to visualize Twitter trends in Kibana. In this post, I will demonstrate how you can pull Twitter data for a few search terms, index it in Elasticsearch, and customize a Kibana dashboard to watch the tweet trends change in real time. I’m using Ubuntu 14.04 on AWS. If you haven’t used AWS before, you might want to look at my previous posts on getting started with AWS.

I assume that you have launched an EC2 instance (Ubuntu 14.04) and logged into it. The combination of Logstash, Elasticsearch and Kibana is so popular for tracking and managing logs that it is known as the ELK stack. As this tutorial is for learning purposes, we will set up everything on a single machine. Let’s get started -

    1. Install Java -
      sudo add-apt-repository -y ppa:webupd8team/java
      sudo apt-get update
      sudo apt-get -y install oracle-java7-installer
      
    2. Install Elasticsearch -
      wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
      echo 'deb http://packages.elasticsearch.org/elasticsearch/1.1/debian stable main' | sudo tee /etc/apt/sources.list.d/elasticsearch.list
      sudo apt-get update
      sudo apt-get -y install elasticsearch=1.1.1
      
    3. Edit the Elasticsearch config file – Open the file “/etc/elasticsearch/elasticsearch.yml” and add the line “script.disable_dynamic: true” at the end of the file. Search for the line with “network.host:” and change it to “network.host: localhost”. Now, restart Elasticsearch using -
      sudo service elasticsearch restart
    4. Install Kibana -
      cd ~; wget https://download.elasticsearch.org/kibana/kibana/kibana-3.0.1.tar.gz
      tar xvf kibana-3.0.1.tar.gz
      
    5. Edit the Kibana config file “kibana-3.0.1/config.js” and change the line containing the keyword “elasticsearch” to the following (note the straight quotes) -
      elasticsearch: "http://"+window.location.hostname+":80",
    Now move the files to the proper location -
      sudo mkdir -p /var/www/kibana3
      sudo cp -R kibana-3.0.1/* /var/www/kibana3/
      
    6. Install nginx -
      sudo apt-get install nginx
      cd ~; wget https://gist.githubusercontent.com/thisismitch/2205786838a6a5d61f55/raw/f91e06198a7c455925f6e3099e3ea7c186d0b263/nginx.conf
      
    7. Now edit this config file (nginx.conf): change the “server_name” value to the Elastic IP of your node and the “root” line to “root /var/www/kibana3;”. Then copy the file to the right location. Provide a username in place of <username> below (and a proper password, when asked) -
      sudo cp nginx.conf /etc/nginx/sites-available/default
      sudo apt-get install apache2-utils
      sudo htpasswd -c /etc/nginx/conf.d/kibana.myhost.org.htpasswd <username>
      sudo service nginx restart
      
    8. Install Logstash -
      echo 'deb http://packages.elasticsearch.org/logstash/1.4/debian stable main' | sudo tee /etc/apt/sources.list.d/logstash.list
      sudo apt-get update
      sudo apt-get install logstash=1.4.2-1-2c0f5a1
      
    9. Let’s configure Logstash now. Create a file “logstash.conf” in your home directory and put the following content in it. “<term1>” (e.g. “modi”) is any term you want to search for in tweets, and “<term2>” (e.g. “obama”) is any other term you want to compare the results with. “tweets1” (e.g. “moditweets”) and “tweets2” (e.g. “obamatweets”) can be anything; give them meaningful names, as we will refer to the two kinds of tweets by these names in Kibana. The values of “consumer_key”, “consumer_secret”, “oauth_token” and “oauth_token_secret” should be taken from a Twitter app, which you need to create using your Twitter developer account -
      input {
        twitter {
          consumer_key => "<proper_value>"
          consumer_secret => "<proper_value>"
          keywords => ["<term1>"]
          oauth_token => "<proper_value>"
          oauth_token_secret => "<proper_value>"
          type => "tweets1"
        }
        twitter {
          consumer_key => "<proper_value>"
          consumer_secret => "<proper_value>"
          keywords => ["<term2>"]
          oauth_token => "<proper_value>"
          oauth_token_secret => "<proper_value>"
          type => "tweets2"
        }
      }
      
      output {
        elasticsearch { host => localhost }
        stdout { codec => rubydebug }
      }
      
    10. Once this is done, you should see the Kibana dashboard if you point your browser to the Elastic IP address of the EC2 node.
    11. Configuring Kibana – To visualize the tweets in real time, we need to make a few changes -
      1. Add two queries in the “QUERY” section. Click + to add the second query. Enter “tweets1” in one query and “tweets2” in the other.
      2. In the top right corner, click on “Configure dashboard”. Click on Timepicker and change “Relative time options” to “1m” and “Auto-refresh options” to “1s”.
      3. Now go to “FILTERING” and close all filters. In the top right section, there is a drop-down to select time filtering. Click it and select “Last 1m”. Click it again and select “Auto-Refresh -> Every 1s”.
      4. You can configure the main graph area the way you want. For example, I converted bars to lines. There are many options; pick the ones best suited to your needs.
    12. Now is the time to see the magic. Let’s start Logstash and wait for some time. We should start observing the trends for the two kinds of tweets in the Kibana dashboard. To start Logstash, run -
      sudo /opt/logstash/bin/logstash -f ~/logstash.conf
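
    To sanity-check the setup before leaving Logstash running, you can verify that Elasticsearch is answering locally and ask Logstash to validate the config file. Assuming the default Elasticsearch port and the Logstash 1.4 command line flags, that would be -
      curl http://localhost:9200
      sudo /opt/logstash/bin/logstash -f ~/logstash.conf --configtest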
      


Quick Guide to Using Python’s Virtualenv

This is a quick guide to knowing and using Python’s virtualenv, an awesome utility for managing multiple isolated Python environments.


Why is virtualenv needed?

Case 1: We have a Python application A which uses version 1.0 of package X, but is incompatible with higher versions of it. On the other hand, we have an application B which uses version 1.1 of package X, but is incompatible with lower versions of it. We want to use both applications on the same machine.

Case 2: A package Y has recently been released in beta and we want to try it out, but we don’t want to install it in the global site-packages directory as it might break something.

In the above two cases, virtualenv is very helpful. In the first case, we can create two virtualenvs and then install version 1.0 of package X in one and version 1.1 in the other. The two virtualenvs are isolated from each other, so the two installations don’t overwrite each other. In the second case, we can create a virtualenv and install package Y in it, which installs Y in the virtualenv’s site-packages directory instead of the global site-packages.
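
As a sketch of the Case 1 workflow (the package name X and the environment names here are placeholders) -

virtualenv env_a
source env_a/bin/activate
pip install X==1.0
deactivate

virtualenv env_b
source env_b/bin/activate
pip install X==1.1
deactivate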

How to install virtualenv?

virtualenv can be installed directly using pip.

pip install virtualenv

How to use virtualenv?

First, create a virtualenv using -

virtualenv env1

“env1” is your environment name, change it accordingly. This creates a virtualenv named env1; wherever you run this command, a directory env1 is created there. To activate and start using this newly created virtualenv, go to env1/bin and run -

source activate
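
To confirm that the environment is active, you can check which Python is now first on your PATH; it should point inside env1/bin -

which python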

Now you are inside the virtualenv. Any package that you install now gets installed to env1/lib/pythonx.x/site-packages instead of the global site-packages, so it doesn’t affect your global packages or other virtualenvs. To exit this virtualenv, run -

deactivate

To remove a virtualenv, simply delete the corresponding directory.


Basic Git Commands for Open Source Contribution

In my last post I discussed how one can get started with contributing to open source. You need to know some basic Git commands to work flawlessly with open source projects.


Let’s discuss them one by one, taking an example project, Titanic -

  1. Set up Git as given here. This is a pretty quick tutorial on setting up Git.
  2. You need to fork the repository you want to work on; in this case it is Titanic. Forking brings that project’s code into our workspace and hence allows us to make changes to it. Log in to Github, go to the repository you want to fork and click on “Fork” on the upper right hand side of the page. Once forked, you can see the repository in your Github profile, and below it you will see “forked from <original_repository>”.
  3. Now we need to copy this code to our machine so that we can make changes to it. For that we “clone” the forked repository. To clone the repository to your machine, run the following in any directory -
    git clone https://github.com/theharshest/titanic
  4. We have the whole source code of this project now. This is the time to make changes, i.e. to fix the bug in the concerned file or files. After you are done making changes, we need to “stage” the files. For that, run the following from the repository directory -
    git add --all
  5. At any point of time you can run git status to see the status of your work. Now, we need to “commit” the changes we just made using -
    git commit -m "Adding a bugfix"
  6. After committing the changes, we would push the changes to our forked Github repository. To do that, run -
    git push origin master
  7. Now our forked repository has the changes, but the main repository we forked from doesn’t know anything about the changes we made. We, of course, can’t make changes to it directly, as we are not the owner of the repository. Instead, we request the owner of the repository to look at the changes we made; if they feel the bugfix is correct, they can approve and merge those changes into the main repository. To achieve this, we create a “pull request” -
    1. Go to your forked repository and in the right sidebar, click on “Pull Requests”.
    2. Now click on “New Pull Request”, provide a description and create the pull request.
  8. After this, the author of the original repository will see your pull request, and if they think you’ve made the correct changes, they will go ahead and merge and close the pull request. Now the main repository has the changes that you’ve made.
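
A useful follow-up if you plan to keep contributing (this is general Git practice, and the URL below is a placeholder for the original repository’s clone URL): add the original repository as an “upstream” remote, so you can pull in its new commits and keep your fork current -

    git remote add upstream https://github.com/<original_owner>/titanic
    git fetch upstream
    git merge upstream/master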

This is a very high level overview of Git which doesn’t go into the meaning of the commands. I would suggest you watch a screencast on Git basics to get hold of the fundamentals.


Getting Started with Open Source Contribution, The Easy Way!

I recently made my first open source contribution and would like to share my experience here. This post should also serve as a tutorial for others who want to contribute to open source but couldn’t find the right direction to start. I made a bugfix to the Mozilla project called Titanic; you can see my username (theharshest) in the contributors list of the project. Let me put down a step-by-step guideline on how I did that -


1) To start, you need to first create a Bugzilla account.

2) Now, let’s search for some easy bugs relevant to our interest. For that go to Bugs Ahoy.

3) Scroll down, and on the left side under “Display only” select both options: “Bugs with no owner”, to make sure the bug we pick up is not assigned to someone else, and “Simple bugs”, to make sure we pick up bugs tagged with the [good first bug] tag, which are easy bugs.

4) Now, in the “Do you know” section, select the languages you are good at. Wait for some time and the list of bugs matching the selected filters will be populated.

5) Select a bug from the list and log in with the Bugzilla account created in Step 1. You will see a “Mentors” field in the bug details. This person will be your point of contact while you work on the bug.

6) Now, open an IRC client, like XChat, and connect to the Mozilla network as per the instructions given here. Join #ateam (or the channel corresponding to the team whose product’s bug you are working on). Find your mentor there and start talking to them regarding everything you need to fix the bug.

This is the basic workflow that should be followed. You need to have basic Git skills to get it done. My next post covers the basic Git commands you would need to accomplish this task.


Getting Started With Amazon Web Services – Part 2


In Part 1 of this series, I showed you how to create a user and grant it privileges in AWS. Now we move ahead and try creating EC2 instances. To create an EC2 instance, we first need a “Key Pair”, which will enable us to ssh into the instance. To create a Key Pair -

1) Go to AWS console -> EC2 -> Key Pairs -> Create Key Pair.

2) Provide a name, create Key Pair and download the pem file.

3) Change the permissions of the pem file to 400.
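
On Linux or OS X, that is -

chmod 400 <path_to_pem_file>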

Now, create a security group -

1) Go to AWS console -> EC2 -> Security Groups -> Create Security Group.

2) Provide a name and description.

3) Click on Add Rule. In Type, select SSH. In Source, select My IP if you have a static IP, else select Anywhere. Click Create.

Now, let’s create the EC2 instance -

1) Go to AWS console -> EC2 -> Instances -> Launch Instance.

2) Select “Ubuntu Server 14.04 LTS”. Keep the default settings till Step 4.

3) In Step 5, provide a name to the instance, so that we can identify this instance from the list of EC2 instances in the console.

4) In Step 6, choose “Select an existing security group” and select the security group created above. Click Review and Launch.

5) Now click Launch and select the Key Pair which we have created before.

Now go to AWS console -> EC2 -> Instances. Select the instance which you just created and copy its Public IP from the panel below.

Assuming that you have followed everything from Part 1 of this tutorial series, go to command line and do -

ssh -i <path_to_pem_file> ubuntu@<public_ip_of_ec2_instance>

You should be logged into the EC2 instance.


Getting Started With Amazon Web Services – Part 1


AWS offers enough free stuff to try out, which makes it one of the best cloud service providers for personal use. Today I’m going to show you how to get started with AWS quickly, in a few easy steps -

1) Register for AWS as shown here.

2) Now, go to the AWS console and create an IAM user by going to Services -> IAM -> Users -> Create New Users.

3) Provide a username and create the user. Go ahead and click “Download Credentials” and save the file in a secure place. The user is now created, but it doesn’t have any permissions yet.

4) In IAM, click on Groups -> Create New Group. Provide a group name and in policy templates select “Administrator Access”. Go ahead and complete the group creation.

5) Now again go to Users in IAM and select the user we created before. Click on User Actions -> Add User to Groups. Now select the group we created in the previous step. Now, this user has administrator privileges.

6) Now, we will use the Python SDK for AWS (boto) to manage AWS from the command line. To try it out, clone a sample project using -

git clone https://github.com/awslabs/aws-python-sample.git

7) Install the SDK -

pip install boto

8) Now create a file “~/.aws/credentials” and put the following content in it, replacing the values with those from the credentials file downloaded in step 3 -

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

9) Now go to the cloned directory (from step 6) and run the following -

python s3_sample.py

10) If everything goes fine, the above command creates an S3 bucket and an object, and you will see the results as output. If you get any error related to credentials, try setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, as shown below.
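
In a bash shell, for example (boto also reads credentials from these environment variables) -

export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY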

Move ahead with Part 2 of AWS tutorial.


How to Install Python Package When You Don’t Have Root Access?

Many times we don’t have root access on a machine. This mostly happens with machines that are shared between several users, as in a university.


Pip is the easiest way to install Python packages. By default it installs packages to “/usr/local/”, which needs root privileges to access. You can override this default installation directory easily by using the following -

pip install <package_name> --user

This installs the package in “/home/<user_name>/.local/lib/python2.7/site-packages” (2.7 is the version number; it can be different in your case) instead of “/usr/local/”. Now, to use this package in a Python script, you can’t use “import <package_name>” without telling Python the location of the directory where this package is installed. To make sure this new path is included, use the following in the Python script -

import sys
sys.path.append('/home/<user_name>/.local/lib/python2.7/site-packages')
import <package_name>

Python should accept this package now.
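
An alternative to patching sys.path in every script, assuming a bash-like shell, is to add the directory to PYTHONPATH once, e.g. in your ~/.bashrc -

export PYTHONPATH=$PYTHONPATH:/home/<user_name>/.local/lib/python2.7/site-packages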


Using Unicode Properly while using MySql with Python

I was facing an issue related to character sets while downloading Facebook posts’ data. I was getting the data using the Facebook Graph API and dumping it into a MySQL database.


When I tried to insert a post’s content directly into the database, I was getting -

sql = sql.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position

I tried encoding the content to Unicode explicitly, but then a few characters were malformed in the database. I had already changed the default character set of the MySQL database to ‘utf-8’. When I tried to use encode/decode, I was getting errors like -

return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1'

To resolve this, I first changed the default character set (ASCII) in the Python script to Unicode as follows -

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Then, I made sure that while connecting to the database, I set the parameters ‘use_unicode’ and ‘charset’ properly as follows -

conn = pymysql.connect(host=xx, user=xx, passwd=xx, db=xx, use_unicode=True, charset='utf8')

After doing this, I could see the special characters properly in the database.


How to get likes count on posts and comments from Facebook Graph API


Though Facebook mentions in their Graph API documentation that we can get the count of likes on a post as follows -

GET /{object-id}/likes HTTP/1.1

with total_count as a field, it doesn’t work as expected. Instead, it returns the list of users (name and id) in a page-wise fashion. So to get the total number of likes you need to traverse the result page-wise and count the users who liked the post, as follows -

def get_likes_count(id, graph):
    count = 0
    feed = graph.get_object('/' + id + '/likes', limit=99)
    while 'data' in feed and len(feed['data']) > 0:
        count = count + len(feed['data'])
        if 'paging' in feed and 'next' in feed['paging']:
            url = feed['paging']['next']
            id = url.split('graph.facebook.com')[1].split('/likes')[0]
            after_val = feed['paging']['cursors']['after']
            feed = graph.get_object(id + '/likes', limit=99, after=after_val)
        else:
            break
    return count

I have used this in a Python script which uses the Facebook Python SDK.
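
For example, assuming a valid Graph API access token, the function above can be called like this (the access token and post ID are placeholders) -

import facebook

graph = facebook.GraphAPI('<access_token>')
print get_likes_count('<post_id>', graph)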


Movie Rating Prediction Using Twitter’s Public Feed and Naive Bayes Classifier

With thousands of tweets being posted around the time a movie is released, we can certainly find out from those tweets the sentiments of the people watching it. So why not try to predict the movie’s rating through sentiment analysis of those tweets? Here I’m going to use the Naive Bayes classifier from Python’s NLTK library to classify tweets as positive (if people liked the movie) or negative (if they hated it).


For training the classifier I used the Stanford AI lab’s “Large Movie Review Dataset”. If you open this dataset, you will see two directories of movie reviews, marked as negative and positive, with one review per file. I trained the classifier using this dataset and saved it using pickle. Once the classifier was trained, I loaded the pickled classifier in my main Python file to use it.

One of the major steps in training a classifier is feature extraction. I did a simple feature extraction here. First I converted every word in the document to lower case, then removed all stop words, which I grabbed from NLTK’s corpus of English stop words. The Naive Bayes classifier in NLTK expects features as input in the form of a dictionary, so I used the following function to get the desired result:

def get_features(tweet):
    global stop
    words = [w for w in tweet if w not in stop]
    f = {}
    for word in words:
        f[word] = word
    return f

Training the classifier takes a labelled training set built with the feature-extraction function described above: we take the reviews from the directory marked positive, extract features from each, and make a tuple of the feature dictionary and the review label. We do the same for the directory marked negative.
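
Here is a minimal sketch of that training step, assuming the reviews have already been read into two lists of token lists, pos_reviews and neg_reviews (hypothetical names) -

import pickle
import nltk

train_set = ([(get_features(review), 'pos') for review in pos_reviews] +
             [(get_features(review), 'neg') for review in neg_reviews])
classifier = nltk.NaiveBayesClassifier.train(train_set)

# save the trained classifier so the main script can load it later
with open('classifier.pickle', 'wb') as f:
    pickle.dump(classifier, f)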

Now, once we are done training the classifier, we grab the 100 latest public tweets matching the user’s query using Tweepy, extract features the same way as we did before, and feed them to the trained classifier.

def get_tweets(movie):
    auth = tweepy.OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    api = tweepy.API(auth)
    tweets = []
    i = 0
    for tweet in tweepy.Cursor(api.search, q=movie.lower(), count=100,
                               result_type='recent', include_entities=True,
                               lang='en').items():
        tweets.append(tweet.text)
        i += 1
        if i == 100:
            break
    return tweets

The classifier marks each tweet as positive or negative. We can then predict the movie rating based on the ratio of positive tweets among the 100 tweets extracted. You can see the full code on my Github profile, which should be pretty self-explanatory now.
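
The last step can be as simple as scaling the positive ratio to five stars; here is a sketch using the functions above (the linear mapping to stars is my simplification) -

tweets = get_tweets('<movie_name>')
positive = sum(1 for t in tweets
               if classifier.classify(get_features(t.lower().split())) == 'pos')
print 5.0 * positive / len(tweets)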
