import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from math import log

def logloss(p, y):
    # Clip p away from 0 and 1 so that log() never blows up
    epsilon = 10e-12
    if p == 0:
        p += epsilon
    if p == 1:
        p -= epsilon
    if y == 1:
        return -log(p)
    if y == 0:
        return -log(1 - p)

def evaluate_logloss(p, labels):
    # Mean log loss of a constant prediction p over a list of 0/1 labels
    return sum(list(map(lambda x: logloss(p, x), labels))) / len(labels)
…
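A quick sanity check of these two functions (the constant probability and the labels below are made up purely for illustration):

labels = [1, 0, 1, 1, 0]
print(logloss(0.5, 1))                 # -log(0.5) ≈ 0.693
print(evaluate_logloss(0.5, labels))   # also ≈ 0.693: with p = 0.5 every label costs the same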

The simplest way to print several variables:

a = 12
b = 123.45
print(a, b, a * b)
12 123.45 1481.4

It is worth noting that IPython offers an even simpler printing construct via a tuple:

a, b, a * b
(12, 123.45, 1481.4)

If we use the first approach, we can specify the desired separator,…
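The excerpt breaks off at the mention of a separator; presumably this refers to the sep keyword of print(), for example:

a = 12
b = 123.45
print(a, b, sep=', ')    # 12, 123.45
print(a, b, sep=' | ')   # 12 | 123.45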

import pandas as pd
from functools import reduce

Map in bare Python

a = [1, 2, 3, 4, 5]
list(map(lambda x: x**2, a))
[1, 4, 9, 16, 25]

list(filter(lambda x: x >= 4, a))
[4, 5]

reduce(lambda x, y: x*y, a)
120

b = range(1000)
%timeit list(map(lambda x: x**2, b))
1000 loops, best of 3: 579 µs per loop

%timeit…
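The timings are cut off above; since pandas is imported at the top, the counterpart operations on a pandas Series (an assumption about where the comparison is heading, not taken from the truncated text) would look roughly like:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
s.map(lambda x: x**2)   # element-wise map through a Python lambda
s ** 2                  # vectorized square, usually much faster than the lambda
s[s >= 4]               # boolean mask, the analogue of filter()
s.prod()                # 120, the analogue of reduce with *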

c1 = ['Russia', 'US', 'Germany']
c2 = ['007', '001', '049']

a = dict(Russia='007', US='001', Germany='049')
a
{'Germany': '049', 'Russia': '007', 'US': '001'}

b = {'Russia': '007', 'US': '001', 'Germany': '049'}
b
{'Germany': '049', 'Russia': '007', 'US': '001'}

c = dict(zip(c1, c2))
c
{'Germany': '049',…
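For completeness (this variant is not shown in the excerpt above), the same dictionary can also be built with a dict comprehension over the two lists:

d = {country: code for country, code in zip(c1, c2)}
d == c    # True: same mapping as dict(zip(c1, c2))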

Difference between bytes and strings

When working with data inputs in Python — processing text, doing statistical analysis — we are working with strings.

In [7]: type('café')
Out[7]: str

When reading files from disc into Python we decode binary data into strings, and when saving text to disc we encode strings to binary. The str.encode() method is…
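A minimal round trip between str and bytes, continuing the idea above (UTF-8 is assumed here as the encoding; the full post may use a different one):

s = 'café'
b = s.encode('utf-8')       # str -> bytes
type(b), b                  # (bytes, b'caf\xc3\xa9')
b.decode('utf-8')           # bytes -> str, back to 'café'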

List comprehensions are an easy way to make a list out of another list. For example:

list1 = [1, 2, 3, 4, 5, 6]
print(list1)
list2 = [x**2 for x in list1]
print(list2)
[1, 2, 3, 4, 5, 6]
[1, 4, 9, 16, 25, 36]

A similar construct exists for dictionaries and is…
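The excerpt stops just as it turns to dictionaries; a sketch of such a dict comprehension (the number-to-square mapping is only for illustration):

list1 = [1, 2, 3, 4, 5, 6]
squares = {x: x**2 for x in list1}
print(squares)
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36}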

This is the fourth post in a series of exercises to predict the popularity of a blog post on the NYTimes website. Previous posts are located here: Naïve Random Forest Classifier. This post is about fitting a plain vanilla Random Forest on readily available features, like the sections where the post was published, date, time, and bag of words extracted…
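As a sketch of what a plain vanilla Random Forest on bag-of-words features might look like in scikit-learn (the toy texts and labels below are stand-ins, not the actual NYTimes data):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

texts  = ['markets rally on fed news', 'new recipe for summer salad',
          'elections heat up in the senate', 'weekend crossword puzzle hints']
labels = [1, 0, 1, 0]                       # made-up 0/1 popularity flags

X = CountVectorizer().fit_transform(texts)  # bag-of-words feature matrix
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))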

This is the fourth post in the series on predicting the popularity of a blog post on the NYTimes website. The first three are: Naïve Random Forest Classifier, Fitting models on low signal-to-noise data, and Feature selection 1: Univariate. In this post I am going to compare in-model feature selection in plain vanilla LogisticRegression vs. that plus fitting Random…
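A rough sketch of what comparing in-model feature selection across the two models could look like (the synthetic data and ranking-by-weights approach here are assumptions, not the post's actual procedure):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 20 features, only 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by |coefficient| (logistic regression) and by impurity importance (RF)
print('LR top 5:', np.argsort(-np.abs(lr.coef_[0]))[:5])
print('RF top 5:', np.argsort(-rf.feature_importances_)[:5])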

This is the third post in an exercise to predict post popularity on the NYTimes website. The two previous posts described fitting a simple Random Forest model and the theory behind the need for feature selection (or feature reduction): Naïve Random Forest Classifier and Fitting models on low signal-to-noise data. Though theoretically the need for feature selection is…
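Judging by the series overview in a later post, this entry covers univariate feature selection; a minimal scikit-learn sketch of that idea on synthetic data (not the actual NYTimes features):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 50 features, only a handful informative
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

# Score each feature on its own (univariate) and keep the 10 best
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                       # (500, 10)
print(selector.get_support(indices=True))    # indices of the retained features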

In the previous post, titled Naïve Random Forest Classifier, I considered the performance of a Random Forest trained on ~5,000 cases with ~3,500 features. Presumably, only a fraction of the features is true signal and the rest is noise. Fitting to noise is a major sin in Machine Learning; thus, taking preventive measures against doing so is…
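A tiny illustration of the fitting-to-noise point (purely synthetic data, unrelated to the NYTimes set): with features that are pure noise, a Random Forest can look nearly perfect on the training data while being no better than chance on held-out data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(300, 200)             # 200 pure-noise features
y = rng.randint(0, 2, size=300)     # labels unrelated to X

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print('train accuracy:', rf.score(X_tr, y_tr))   # close to 1.0
print('test accuracy :', rf.score(X_te, y_te))   # hovers around 0.5 (chance)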
