How To Remove Recommended Videos from YouTube

If you go to YouTube and log in, you are taken to your homepage. There you will see three tabs at the top – “What to Watch”, “My Subscriptions” and “Music” – and below them a bunch of recommended videos. Most of the time I find these recommendations annoying and irrelevant. I want my homepage to be free of them and show only the videos from my subscriptions. Let’s see how to do that.


Recommended videos are generated from your watch history, which stores all the videos you have watched before. It is enabled by default. If we disable it and erase the existing history, YouTube will not show any recommendations. Here’s how to do it:

  1. Go to the YouTube homepage and click History in the left sidebar.
  2. Under the “Watch History” tab, click “Pause watch history” to disable it.
  3. Now click “Clear all watch history” to erase the existing history.

Navigate to some other page and come back to it to see the effect. Now you are good to go.


Amazon Machine Learning for Kaggle

Amazon recently released Machine Learning as a service on AWS. Here I attempt to use it to make a submission to Kaggle, taking the Titanic: Machine Learning from Disaster problem. Read the problem statement and download the train and test data from Kaggle.
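Before uploading anything, it is worth taking a quick look at the files locally. A minimal sketch, assuming pandas is installed and train.csv sits in the current directory:

import pandas as pd

# Peek at the Kaggle training data before uploading it to S3
train = pd.read_csv('train.csv')
print train.shape
print train.head()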


The goal is to get a sufficiently good model as quickly as possible. If you are not familiar with AWS or have not set up your account yet, you might want to look at my previous post on AWS. Let’s get started:

Uploading data:

  1. Log in to the AWS console and go to S3.
  2. Click “Create Bucket”, give it a name, and click Create.
  3. Now, in the bucket list, click the newly created bucket, then Actions -> Upload -> Add Files and select the train.csv you downloaded from Kaggle. Click Start Upload.
  4. Do the same with test.csv (a scripted alternative is sketched right after this list).
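If you prefer scripting the upload instead of clicking through the console, the same can be done with boto3, assuming it is installed and your AWS credentials are configured; the bucket name below is a placeholder:

import boto3

# Upload the Kaggle files to the bucket created above
s3 = boto3.client('s3')
s3.upload_file('train.csv', 'my-kaggle-bucket', 'train.csv')
s3.upload_file('test.csv', 'my-kaggle-bucket', 'test.csv')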

Creating a datasource:

  1. Go to Machine Learning in the AWS console. If it says that Machine Learning is not available in your region, select the listed region and click Get Started, then click Launch under Standard Setup.
  2. Enter the path of train.csv in the “S3 location” field; it should be <bucket_name>/<filename>. Give the datasource a name and click Verify. If it asks for read permission, say yes and proceed. Click Continue.
  3. Glance through the schema to see whether all field types are correctly identified; you can change any that look wrong (I changed a few data types here). Then click Yes on “Does the first line in your CSV contain the column names?“.
  4. In the Target step, select the field named “Survived” and click Continue -> Review -> Continue.

Machine Learning Model:

  1. After the above step, you are taken to ML Model Settings. Click Review, letting it come up with a default recipe, and click Finish.
  2. It takes several minutes to complete, so be patient. Click “ML models” at the top of the AWS console window, click the model name in the list, then click Generate Batch Predictions.
  3. Select “My data is in S3, and I need to create a datasource”. Enter the S3 location of the test file just as before and check “Does the first line in your CSV contain the column names?”. Click Verify and grant read permission.
  4. Move ahead, give an output filename in the S3 bucket, click Verify, and grant write permission. Click Finish.

Download the generated prediction file (.csv.gz) from the S3 bucket. Adjust it into the proper submission format by adding the ID column from test.csv and proper headers (a sketch of this step follows), then submit it on Kaggle. I got a score of 0.77033 on the leaderboard, which is certainly not very good. You can improve it by tuning a few parameters in the ML model’s evaluation in AWS and choosing a different evaluation strategy.
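Here is a rough sketch of that adjustment using pandas. The prediction column name in Amazon ML’s output can vary (I’m assuming “bestAnswer” here), and “batch_prediction.csv” is a placeholder for the unzipped file you downloaded, so check the actual header first:

import pandas as pd

# PassengerId comes from Kaggle's test.csv
test = pd.read_csv('test.csv')

# Unzipped batch prediction output; 'bestAnswer' is an assumed column name
preds = pd.read_csv('batch_prediction.csv')

submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': preds['bestAnswer'].astype(int)
})
submission.to_csv('submission.csv', index=False)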


Remove Black Box Behind Cairo Dock In Lubuntu

If you are using Cairo Dock in Lubuntu, you will see a black box behind the dock right after installing it. Most of the solutions found online, like the one given here, work fine for Ubuntu but not for Lubuntu. The reason is that Lubuntu does not ship with a composite manager by default, so you won’t even see the “Composition” option in Cairo Dock’s system settings.

To keep things simple and preserve Lubuntu’s lightweight nature, let’s install xcompmgr. As its home page puts it: “Xcompmgr is a simple composite manager capable of rendering drop shadows and, with the use of the transset utility, primitive window transparency. Designed solely as a proof-of-concept, Xcompmgr is a lightweight alternative to Compiz and similar composite managers.”

Installation:

sudo apt-get install xcompmgr

To enable it on startup:

Add the line below to ~/.config/lxsession/Lubuntu/autostart (you should have lxsession-edit installed):

@xcompmgr -c

Restart your system and Cairo Dock should no longer have the black box behind it.


Merge And Sync Different Google Calendars

If you use more than one Google account and actively maintain a calendar on each, this post is for you. This method not only lets you merge the calendars but also keeps them in sync, so new events added in one calendar are reflected in the other.


Issue: Suppose you have two Google accounts, 1 and 2. You want to merge account 1’s calendar into account 2, so that account 2’s calendar has events from both accounts.

Solution: Open the Google Calendar of account 1; you will see “My calendars” in the left sidebar. Click the small down arrow to its right, then click “Settings”. You will see a table of calendars. In that table, click “Shared: Edit settings” for the calendar you want to merge. Now enter the email address of account 2, click “Add Person”, and save. Account 2 will receive an email saying that account 1 shared its calendar, and if you open account 2’s calendar you will see all the events from account 1. Make any change in account 1’s calendar and it shows up in account 2’s calendar as well.


Switch Between Shell and Applications on Terminal

We often need to switch from an application running in the terminal back to the shell without quitting it, and then resume the application once we are done. For example, we run a Python script that fails with an error, then open the Python file in the same terminal using Vim. Now we want to review the error without closing Vim (we haven’t saved the changes yet and aren’t sure we’re ready to). Another similar situation: we are in the Python shell, have done a few things there, and want to drop to Bash to run a command without closing the Python shell, then resume where we left off. These are just a few of the many cases where we need to alternate between the current terminal application and the Linux shell.

The solution to this problem is straightforward. While you are working in the application, press “Ctrl-Z”, which suspends the process without killing it; you will see a message such as “suspended” (or “Stopped”) at the shell prompt. Do whatever you need to do in the shell, and once you are done, type “fg” to get back to the previously suspended application. “fg” stands for foreground: it literally brings the suspended process back to the foreground.


Get List of Most Informative Features In Scikit Learn

Whenever you are doing a classification task, you should look at some of the most informative features after training the classifier. This gives you an idea of whether you are on the right track. For example, if you have a sentiment classification task and see features like “shocked”, “unhappy”, and “depressed” among the most informative for the positive sentiment class, that is a sign you have made a big blunder somewhere. Beyond this sanity check, the most informative features also provide insight into the data you are classifying.

Let’s see how to get the most informative features in scikit-learn. First, you need a mapping from positions in the feature vectors to feature names. If you are using any of the standard vectorizers in scikit-learn (like sklearn.feature_extraction.text.CountVectorizer), getting the feature names is straightforward –

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()  # fit it on your corpus before calling get_feature_names()
feature_names = count_vect.get_feature_names()

Once you have the feature names, look for the “coef_” attribute of the classifier you are using (remember that not all classifiers have this attribute). If you look at LinearSVC‘s documentation, you can see it in the attributes list. It is a single row of length m if there are two classes, and an n x m matrix if there are more than two classes, where m is the total number of features and n is the number of classes. Now, if we want the 20 most discriminative features of the third class, we just sort the corresponding row and extract the feature names of the top 20 entries –

import numpy as np
from sklearn.svm import LinearSVC

# Initialize the classifier
clf_SVM = LinearSVC(C=1, tol=0.001)

# Fit the dataset, where X_vect comes from passing the dataset
# through a vectorizer/transformer like CountVectorizer()
clf = clf_SVM.fit(X_vect, y)

# Sort coef_ by feature weight and pick the indices of the largest 20;
# row index 2 corresponds to the third class
inds = np.argsort(clf.coef_[2, :])[-20:]

# Iterate over these indices and print the corresponding feature names
for i in inds:
    print feature_names[i]
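For completeness, in the two-class case coef_ is a single row (shape 1 x m), so the same idea reduces to sorting that one row. A small sketch reusing the names from the block above –

# Binary classifier: row 0 holds the weights; the largest ones favour the positive class
inds = np.argsort(clf.coef_[0, :])[-20:]
for i in inds:
    print feature_names[i]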


Model Hyperparameter Tuning In Scikit Learn Using GridSearch

Getting the right hyperparameters for a machine learning model is essential for good generalization performance. Hyperparameters control things like regularization and hence help prevent the model from overfitting. Grid search does an exhaustive search over the set of hyperparameter values provided, finding the best combination using cross-validation (or some other evaluation method) and a scoring function.

Let us see how to do hyperparameter optimization in scikit-learn using GridSearchCV –

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV

clf_LR = Pipeline([('chi2', SelectKBest(chi2)), ('lr', LogisticRegression())])
params = {
          'chi2__k': [800, 1000, 1200, 1400, 1600, 1800, 2000],
          'lr__C': [0.0001, 0.001, 0.01, 0.5, 1, 10, 100, 1000],
          'lr__class_weight': [None, 'auto'],
          'lr__tol': [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
          }
gs = GridSearchCV(clf_LR, params, cv=5, scoring='f1')
gs.fit(X, y)

Here, I’m using Logistic Regression as the classifier and chi2 as the feature selection method. clf_LR is a standard pipeline where my data (a set of feature vectors) first goes through feature selection (chi2) and then through the classifier (LogisticRegression).

params is where I define all the hyperparameter values I want to try. Any parameter not listed here keeps its default value. It is a dictionary whose keys are hyperparameter names (in the form <model_name>__<hyperparameter_name>) and whose values are the lists of values for grid search to try. In the keys, “model_name” comes from the Pipeline, followed by two underscores and then the hyperparameter name. Hyperparameter names should be taken from the “Parameters:” section of the model’s page in the scikit-learn documentation, like here. Remember that some hyperparameters depend on each other, so if you fix two and vary a third that depends on them, the third can only take a limited range of values. Trying values outside that range raises an error, which is perfectly fine; in that case, just adjust the values appropriately.

Lastly, you can see how I call GridSearchCV. cv defines the number of cross-validation folds; here I’m doing 5-fold cross-validation. scoring defines the scoring function to optimize; you can see the possible scoring values here. Now just fit the dataset with the labels: X is my feature matrix and y is the corresponding labels.

Once the search finishes, we can get the best combination of parameters from the grid above, based on the scoring and evaluation criteria –

print gs.best_estimator_.get_params()

We can also get the best score (here, the F1 measure) for this combination of parameters –

print gs.best_score_
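Since GridSearchCV refits the best estimator on the whole dataset by default, it can also be used directly on held-out data. A minimal sketch, where X_test is a hypothetical feature matrix prepared the same way as X –

# X_test is a placeholder for your own held-out samples
predictions = gs.best_estimator_.predict(X_test)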


Visualize Real Time Twitter Trends With Kibana

Kibana offers a simple way to visualize logs and data coming from Logstash. With Twitter as one of Logstash’s standard inputs, it is very easy to visualize Twitter trends in Kibana. In this post, I demonstrate how to fetch Twitter data for a few search terms, index it through Elasticsearch, and customize a Kibana dashboard to watch the tweets change in real time. I’m using Ubuntu 14.04 on AWS. If you haven’t used AWS before, you might want to look at my previous post on getting started with AWS. Let’s have a look at what we will get at the end of this tutorial –

I assume you have launched an EC2 instance (Ubuntu 14.04) and logged into it. The combination of Elasticsearch, Logstash and Kibana is so popular for tracking and managing logs that it is known as the ELK stack. As this tutorial is for learning purposes, we will set up everything on a single machine. Let’s get started –

    1. Install Java –
      sudo add-apt-repository -y ppa:webupd8team/java
      sudo apt-get update
      sudo apt-get -y install oracle-java7-installer
      
    2. Install Elasticsearch –
      wget -O - http://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
      echo 'deb http://packages.elasticsearch.org/elasticsearch/1.1/debian stable main' | sudo tee /etc/apt/sources.list.d/elasticsearch.list
      sudo apt-get update
      sudo apt-get -y install elasticsearch=1.1.1
      
    3. Edit the Elasticsearch config file – open “/etc/elasticsearch/elasticsearch.yml” and add the line “script.disable_dynamic: true” at the end of the file. Search for the line with “network.host:” and change it to “network.host: localhost”. Now start Elasticsearch using –
      sudo service elasticsearch restart
    4. Install Kibana –
      cd ~; wget https://download.elasticsearch.org/kibana/kibana/kibana-3.0.1.tar.gz
      tar xvf kibana-3.0.1.tar.gz
      
    5. Edit the Kibana config file “kibana-3.0.1/config.js” and change the line containing the keyword “elasticsearch” to elasticsearch: "http://"+window.location.hostname+":80", (note the trailing comma). Now move the files to the proper location –
      sudo mkdir -p /var/www/kibana3
      sudo cp -R kibana-3.0.1/* /var/www/kibana3/
      
    6. Install nginx –
      sudo apt-get install nginx
      cd ~; wget https://gist.githubusercontent.com/thisismitch/2205786838a6a5d61f55/raw/f91e06198a7c455925f6e3099e3ea7c186d0b263/nginx.conf
      
    7. Now edit this config file (nginx.conf): change the “server_name” value to the Elastic IP of the node and the “root” directive to “root /var/www/kibana3;”. Then copy the file to the right location. Provide a username in place of <username> below (and a proper password when asked) –
      sudo cp nginx.conf /etc/nginx/sites-available/default
      sudo apt-get install apache2-utils
      sudo htpasswd -c /etc/nginx/conf.d/kibana.myhost.org.htpasswd <username>
      sudo service nginx restart
      
    8. Install Logstash –
      echo 'deb http://packages.elasticsearch.org/logstash/1.4/debian stable main' | sudo tee /etc/apt/sources.list.d/logstash.list
      sudo apt-get update
      sudo apt-get install logstash=1.4.2-1-2c0f5a1
      
    9. Let’s configure Logstash now. Create a file “logstash.conf” in the home directory and put the following content in it. “term1” (e.g. “modi”) is any term you want to search for in tweets, and “term2” (e.g. “obama”) is the term you want to compare it against. “tweets1” (e.g. “moditweets”) and “tweets2” (e.g. “obamatweets”) can be any meaningful names; we will refer to the two kinds of tweets by these names in Kibana. The values of “consumer_key”, “consumer_secret”, “oauth_token” and “oauth_token_secret” should be taken from a Twitter app, which you need to create with your Twitter developer account –
      input {
        twitter {
          consumer_key => "<proper_value>"
          consumer_secret => "<proper_value>"
          keywords => ["<term1>"]
          oauth_token => "<proper_value>"
          oauth_token_secret => "<proper_value>"
          type => "tweets1"
        }

        twitter {
          consumer_key => "<proper_value>"
          consumer_secret => "<proper_value>"
          keywords => ["<term2>"]
          oauth_token => "<proper_value>"
          oauth_token_secret => "<proper_value>"
          type => "tweets2"
        }
      }

      output {
        elasticsearch { host => localhost }
        stdout { codec => rubydebug }
      }
      
    10. Once this is done, you should see the Kibana dashboard when you point your browser to the Elastic IP address of the EC2 node.
    11. Configuring Kibana – to visualize the tweets in real time as shown in the video, we need to make a few changes –
      1. Add two queries in the “QUERY” section: click + to add a second query, then enter “tweets1” in one and “tweets2” in the other.
      2. In the top right corner, click “Configure dashboard”. Click Timepicker and change “Relative time options” to “1m” and “Auto-refresh options” to “1s”.
      3. Now go to “FILTERING” and remove all filters. In the top right section there is a dropdown for time filtering: click it and select “Last 1m”, then click it again and select “Auto-Refresh -> Every 1s”.
      4. You can configure the main graph area however you want; for example, I converted the bars to lines. There are many options to play with, so pick whatever suits your needs.
    12. Now it’s time to see the magic. Start Logstash and wait for some time; we will start seeing the trends for the two kinds of tweets in the Kibana dashboard. To start Logstash, run –
      sudo /opt/logstash/bin/logstash -f ~/logstash.conf
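To sanity-check that tweets are actually reaching Elasticsearch, you can hit its count API from the node itself. A small sketch, assuming Logstash’s default index naming of logstash-*:

import json
import urllib2

# Count all documents indexed by Logstash so far
response = urllib2.urlopen('http://localhost:9200/logstash-*/_count')
print json.load(response)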
      


Quick Guide to Using Python’s Virtualenv

This is a quick guide to understanding and using Python’s virtualenv, an awesome utility for managing multiple isolated Python environments.


Why is virtualenv needed?

Case 1: We have a Python application A that requires version 1.0 of package X and is incompatible with higher versions of it. On the other hand, we have an application B that requires version 1.1 of package X and is incompatible with lower versions. We want to use both applications on the same machine.

Case 2: A package Y has recently been released in beta and we want to try it out, but we don’t want to install it in the global site-packages directory as it might break something.

In both cases, virtualenv is very helpful. In the first case we can create two virtualenvs and install version 1.0 of package X in one and version 1.1 in the other; the two environments are isolated from each other, so the two versions never overwrite each other. In the second case, we can create a virtualenv and install package Y there, which puts Y in the virtualenv’s site-packages directory instead of the global one.

How to install virtualenv?

virtualenv can be installed directly using pip.

pip install virtualenv

How to use virtualenv?

First create a virtualenv using –

virtualenv env1

“env1” is your environment name; change it accordingly. This creates a virtualenv named env1: wherever you run this command, a directory env1 is created there. To activate and start using the newly created virtualenv, go to env1/bin and run –

source activate
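
To confirm which environment is active, a quick check from the Python interpreter helps: inside the activated virtualenv, sys.prefix should point at the env1 directory rather than the system Python location.

# Run this inside the activated virtualenv
import sys
print sys.prefix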

Now you are inside the virtualenv. Any package you install now goes into env1/lib/pythonx.x/site-packages instead of the global site-packages, and does not affect your global packages or other virtualenvs. To exit the virtualenv, run –

deactivate

To remove a virtualenv, simply delete the corresponding directory.


Basic Git Commands for Open Source Contribution

In my last post I discussed how to get started with contributing to open source. You need to know some basic Git commands to work smoothly with open source projects.


Let’s discuss them one by one, using an example project, Titanic –

  1. Set up Git as given here. This is a pretty quick tutorial on setting up Git.
  2. You need to fork the repository you want to work on, in this case Titanic. Forking brings the project’s code into your own workspace so you can make changes to it. Log in to GitHub, go to the repository you want to fork, and click “Fork” in the upper right corner of the page. Once forked, the repository appears in your GitHub profile with “forked from <original_repository>” below it.
  3. Now we need to copy this code to our machine so that we can make changes to it. For that we “clone” the forked repository: run the following in any directory on your machine –
    git clone https://github.com/theharshest/titanic
  4. We now have the whole source code of the project. This is the time to make changes, i.e. fix the bug in the relevant file(s). Once you are done making changes, the files need to be “staged”. For that, run the following from the repository directory –
    git add --all
  5. At any point you can run git status to see the state of your work. Now we “commit” the changes we just made using –
    git commit -m "Adding a bugfix"
  6. After committing, we push the changes to our forked GitHub repository. To do that, run –
    git push origin master
  7. Now our forked repository has the changes, but the main repository we forked from doesn’t know anything about them. We can’t, of course, push to it directly, as we are not its owner. Instead, we ask the owner to look at our changes, and if the bugfix looks correct, they can approve and merge it into the main repository. To achieve this, we open a “pull request” –
    1. Go to your forked repository and in the right sidebar, click on “Pull Requests”.
    2. Now click on “New Pull Request”, provide a description and create the pull request.
  8. After this, the author of the original repository sees your pull request, and if they think the changes are correct, they merge and close it. The main repository now contains the changes you made.

This is a very high-level walkthrough of Git that doesn’t go into what each command means. I would suggest watching the following screencast to get a hold of the Git basics –
