!pwd
/home/sergey/grep_
!cat file.txt
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action

Count the number of occurrences of «|» on each line

1. With AWK

!awk -F "|" '{print(NR, NF-1)}' file.txt
1 0
2 4
3 2
4 1
5 2
6 0
7 2
8 1
9 1
10 0
!awk -F "|" '{printf("%4d %4d\n", NR, NF-1)}' file.txt
   1    0
   2    4
   3    2
   4    1
   5    2
   6    0
   7    2
   8    1
   9    1
  10    0
!awk '{gsub("[^|]",""); print NR,length($0)}' file.txt
1 0
2 4
3 2
4 1
5 2
6 0
7 2
8 1
9 1
10 0
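
The first two commands rely on the fact that splitting a line on a delimiter yields one more field than there are delimiters, so `NF-1` is the count. The third deletes every character that is not a pipe and measures what is left. A minimal Python sketch of both ideas follows; the function names are illustrative and not part of the original notebook.

```python
def count_by_split(line, sep="|"):
    # Splitting on sep yields one more piece than there are separators.
    return len(line.split(sep)) - 1


def count_by_strip(line, sep="|"):
    # Keep only the separator characters and measure what remains.
    return len("".join(ch for ch in line if ch == sep))


line = "2,Jumanji (1995),Adventure|Children|Fantasy"
print(count_by_split(line), count_by_strip(line))  # 2 2
```

One difference to keep in mind: for an empty line awk sets `NF` to 0, so `NF-1` prints -1, whereas the split-based Python version returns 0.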

2. With grep

!grep -n -o "|" file.txt | uniq -c | cut -d: -f 1
      4 2
      2 3
      1 4
      2 5
      2 7
      1 8
      1 9
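
Here `grep -o` prints each match on its own line, `-n` prefixes it with the line number, and `uniq -c` counts the consecutive duplicates. The first column is therefore the count and the second the line number, and lines without any pipe never appear. A rough Python equivalent of that pipeline, as a sketch with a hypothetical helper name:

```python
from itertools import groupby


def grep_style_counts(lines, sep="|"):
    # Emit one line number per match, like `grep -n -o`.
    hits = [n for n, line in enumerate(lines, start=1)
            for _ in range(line.count(sep))]
    # Count runs of equal line numbers, like `uniq -c`.
    return [(len(list(group)), n) for n, group in groupby(hits)]


sample = [
    "movieId,title,genres",
    "2,Jumanji (1995),Adventure|Children|Fantasy",
    "9,Sudden Death (1995),Action",
]
print(grep_style_counts(sample))  # [(2, 2)]
```

As with the grep version, the result is (count, line number) pairs and zero-count lines are simply missing.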

3. With Python

import re
regexp = re.compile(r"\|")  # raw string, so "\|" is not read as an invalid escape
len(re.findall(regexp, "ab|ca"))
1
lst = []
with open('file.txt') as f:
    for i,line in enumerate(f):
        lst.append((i,len(re.findall(regexp, line))))

lst
[(0, 0),
 (1, 4),
 (2, 2),
 (3, 1),
 (4, 2),
 (5, 0),
 (6, 2),
 (7, 1),
 (8, 1),
 (9, 0)]
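
Note that `enumerate` starts at 0, so these indices are one less than the line numbers printed by awk and grep. For a single literal character a regular expression is not strictly needed. The sketch below is an alternative, not the notebook's original code: it uses `str.count` and 1-based numbering, and `pipe_counts` is a hypothetical name.

```python
def pipe_counts(lines):
    # enumerate(..., start=1) gives 1-based line numbers, matching awk's NR.
    return [(n, line.count("|")) for n, line in enumerate(lines, start=1)]


sample = [
    "movieId,title,genres\n",
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n",
    "2,Jumanji (1995),Adventure|Children|Fantasy\n",
]
print(pipe_counts(sample))  # [(1, 0), (2, 4), (3, 2)]
```

Since an open file object iterates over its lines, it can be passed directly in place of the sample list, for example `pipe_counts(f)` inside a `with open('file.txt') as f:` block.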

4. With PySpark

sc.version
'2.0.0'
path = "file:///home/sergey/grep_/file.txt"
rdd = sc.textFile(path)
rdd.zipWithIndex().map(lambda l: (l[1], len(re.findall(regexp,l[0])))).collect()
[(0, 0),
 (1, 4),
 (2, 2),
 (3, 1),
 (4, 2),
 (5, 0),
 (6, 2),
 (7, 1),
 (8, 1),
 (9, 0)]

© 2014 In R we trust.