import pandas as pd
from functools import reduce

Map in bare Python:

a = [1, 2, 3, 4, 5]
list(map(lambda x: x**2, a))
[1, 4, 9, 16, 25]

list(filter(lambda x: x >= 4, a))
[4, 5]

reduce(lambda x, y: x*y, a)
120

b = range(1000)
%timeit list(map(lambda x: x**2, b))
1000 loops, best of 3: 579 µs per loop
%timeit…
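The excerpt cuts off mid-timing, but presumably the comparison continues against an equivalent list comprehension. A minimal sketch of that comparison using the timeit module (the %timeit magic is IPython-only); the numbers it prints are machine-dependent:

import timeit

# Time map() against the equivalent list comprehension on the same range.
setup = "b = range(1000)"
t_map = timeit.timeit("list(map(lambda x: x**2, b))", setup=setup, number=1000)
t_comp = timeit.timeit("[x**2 for x in b]", setup=setup, number=1000)
print(t_map, t_comp)  # illustrative; actual timings vary by machine and Python version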

c1 = ['Russia', 'US', 'Germany']
c2 = ['007', '001', '049']

a = dict(Russia='007', US='001', Germany='049')
a
{'Germany': '049', 'Russia': '007', 'US': '001'}

b = {'Russia': '007', 'US': '001', 'Germany': '049'}
b
{'Germany': '049', 'Russia': '007', 'US': '001'}

c = dict(zip(c1, c2))
c
{'Germany': '049',…
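A dict comprehension builds the same country-to-code mapping; a minimal sketch, with c1 and c2 as above:

c1 = ['Russia', 'US', 'Germany']
c2 = ['007', '001', '049']
d = {country: code for country, code in zip(c1, c2)}
print(d)  # {'Russia': '007', 'US': '001', 'Germany': '049'} on Python 3.7+; key order may differ on older versions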

List comprehensions are an easy way to make a list out of another list. For example:

list1 = [1, 2, 3, 4, 5, 6]
print(list1)
list2 = [x**2 for x in list1]
print(list2)
[1, 2, 3, 4, 5, 6]
[1, 4, 9, 16, 25, 36]

A similar construct exists for dictionaries and is…
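The construct the excerpt is about to name is presumably the dict comprehension; a minimal sketch:

list1 = [1, 2, 3, 4, 5, 6]
# Dict comprehension: map each number to its square.
squares = {x: x**2 for x in list1}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36}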

This is the fourth post in a series of exercises to predict the popularity of a blog post on the NYTimes website. Previous posts are located here: Naïve Random Forest Classifier. This post is about fitting a plain vanilla Random Forest on readily available features, such as the section where the post was published, the date and time, and a bag of words extracted…
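A minimal sketch of what such a fit could look like with scikit-learn, combining a one-hot-encoded section, a time-derived column, and a bag of words; the frame and its column names here are hypothetical stand-ins, not the post's actual data:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame standing in for the real training data.
train = pd.DataFrame({
    'SectionName': ['Business', 'Culture', 'Business', 'Foreign'],
    'Hour':        [9, 18, 12, 7],
    'Headline':    ['markets rally', 'new exhibit opens',
                    'rates rise again', 'talks stall abroad'],
    'Popular':     [1, 0, 1, 0],
})

# Bag of words from the headline text.
bow = pd.DataFrame(CountVectorizer().fit_transform(train['Headline']).toarray())

# One-hot encode the section, keep the time-derived column, stack it all.
X = pd.concat([pd.get_dummies(train['SectionName']), train[['Hour']], bow], axis=1)
X.columns = X.columns.astype(str)  # uniform string column names for scikit-learn

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, train['Popular'])
print(rf.score(X, train['Popular']))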

This is the fourth post in the series on predicting the popularity of a blog post on the NYTimes website. The first three are: Naïve Random Forest Classifier; Fitting models on low signal-to-noise data; Feature selection 1: Univariate. In this post I am going to compare in-model feature selection in plain vanilla LogisticRegression vs. that plus fitting Random…
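The excerpt is cut off, but in-model (embedded) feature selection with LogisticRegression usually means an L1 penalty shrinking uninformative coefficients to exactly zero. A minimal sketch under that assumption, on synthetic data:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic data: many features, few informative, mimicking low signal-to-noise.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=5, random_state=0)

# The L1 penalty drives uninformative coefficients to exactly zero.
lr = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
selector = SelectFromModel(lr).fit(X, y)
print(selector.get_support().sum(), 'features kept out of', X.shape[1])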

This is the third post in an exercise to predict post popularity on the NYTimes website. The two previous posts described fitting a simple Random Forest model and the theory behind the need for feature selection (or feature reduction): Naïve Random Forest Classifier; Fitting models on low signal-to-noise data. Though theoretically the need for feature selection is…
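Judging from the series list in the later excerpt, this post covers univariate feature selection, where each feature is scored against the target independently. A minimal sketch with scikit-learn's SelectKBest (synthetic data, illustrative k):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)

# Score each feature independently against the target (ANOVA F-test),
# then keep the k highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=10).fit(X, y)
X_reduced = selector.transform(X)
print(X_reduced.shape)  # (500, 10)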

In the previous post, titled Naïve Random Forest Classifier, I considered the performance of a Random Forest trained on ~5,000 cases with ~3,500 features. Presumably, only a fraction of the features is true signal and the rest is noise. Fitting to noise is a major sin in machine learning; thus, taking preventive measures against doing so is…
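To illustrate the point (this is not the author's code): a model fit on pure noise features can look perfect on its training data while being worthless out of sample, which is exactly why the noisy majority of features is dangerous.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_noise = rng.normal(size=(200, 500))   # 500 features, all pure noise
y = rng.integers(0, 2, size=200)        # random labels, nothing to learn

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_noise, y)
print(rf.score(X_noise, y))                          # ~1.0 on training data
print(cross_val_score(rf, X_noise, y, cv=5).mean())  # ~0.5, i.e. chance level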

This exercise is about predicting the ‘popularity’ of a post on the NYTimes website. A ‘popular’ post is defined as one with 20 or more comments. The data consists of 8,402 entries altogether for the training and testing sets. The fields are: NewsDesk = the New York Times desk that produced the story (Business, Culture, Foreign, etc.); SectionName = the…
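A minimal sketch of deriving the binary target from that definition; the frame and the CommentCount column name are hypothetical, not the post's actual code:

import pandas as pd

# Hypothetical comment counts; the real values would come from the NYTimes data.
posts = pd.DataFrame({'CommentCount': [3, 25, 0, 41, 19, 20]})

# A post is 'popular' when it drew 20 or more comments.
posts['Popular'] = (posts['CommentCount'] >= 20).astype(int)
print(posts['Popular'].tolist())  # [0, 1, 0, 1, 0, 1]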

© 2014 In R we trust.