!pwd
/home/sergey/grep_
!cat file.txt
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
Count the number of occurrences of «|» on each line
1. With AWK
!awk -F "|" '{print(NR, NF-1)}' file.txt
1 0
2 4
3 2
4 1
5 2
6 0
7 2
8 1
9 1
10 0
!awk -F "|" '{printf("%4d %4d\n", NR, NF-1)}' file.txt
   1    0
   2    4
   3    2
   4    1
   5    2
   6    0
   7    2
   8    1
   9    1
  10    0
!awk '{gsub("[^|]",""); print NR,length($0)}' file.txt
1 0
2 4
3 2
4 1
5 2
6 0
7 2
8 1
9 1
10 0
2. With grep
!grep -n -o "|" file.txt | uniq -c | cut -d: -f 1
      4 2
      2 3
      1 4
      2 5
      2 7
      1 8
      1 9
Here the first column is the count and the second is the line number. Lines without «|» (1, 6 and 10) produce no grep output, so they are missing from the result.
3. With Python
import re
regexp = re.compile(r"\|")  # raw string avoids the invalid-escape warning for "\|"
len(re.findall(regexp, "ab|ca"))
1
lst = []
with open('file.txt') as f:
    for i, line in enumerate(f):
        lst.append((i, len(re.findall(regexp, line))))
lst
[(0, 0), (1, 4), (2, 2), (3, 1), (4, 2), (5, 0), (6, 2), (7, 1), (8, 1), (9, 0)]
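Since «|» is a literal character rather than a pattern, the regular expression is optional here. Below is a minimal sketch using the built-in `str.count`. The helper name `count_pipes` and its `start` parameter are hypothetical (not part of the original notebook); `start=1` numbers lines from 1, as the AWK output does.

```python
def count_pipes(lines, start=1):
    """Return (line_number, pipe_count) pairs, numbering lines from `start`."""
    return [(n, line.count("|")) for n, line in enumerate(lines, start)]


# A few lines taken from file.txt, used here instead of opening the file.
sample = [
    "movieId,title,genres",
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy",
    "2,Jumanji (1995),Adventure|Children|Fantasy",
]

result = count_pipes(sample)
print(result)  # [(1, 0), (2, 4), (3, 2)]
```

An open file object can be passed in the same way as the list, for example `count_pipes(open('file.txt'))`, because iterating over a file also yields one line at a time.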
4. With PySpark
sc.version
'2.0.0'
path = "file:///home/sergey/grep_/file.txt"
rdd = sc.textFile(path)
rdd.zipWithIndex().map(lambda l: (l[1], len(re.findall(regexp,l[0])))).collect()
[(0, 0), (1, 4), (2, 2), (3, 1), (4, 2), (5, 0), (6, 2), (7, 1), (8, 1), (9, 0)]