As an example, let's count the characters that appear in one paragraph of today's news.
The O+ Festival in San Francisco this weekend would seem a typical indie arts event, with performances by local musicians and displays of funky art. But in a twist that highlights a longstanding problem in the creative economy, the artists involved will be paid not in cash but rather in something they may need just as badly: health care.
Source: Will Play for Health Care (at Least at One Music Event) - NYTimes.com
>>> article = """The O+ Festival in San Francisco this weekend would seem a typical indie arts event, with performances by local musicians and displays of funky art. But in a twist that highlights a longstanding problem in the creative economy, the artists involved will be paid not in cash but rather in something they may need just as badly: health care."""
>>> from itertools import groupby
>>> article_group = ((len(list(group)), character)
...                  for character, group in groupby(sorted(article.lower())))
>>> for count, character in article_group:
...     print("%02d,%s" % (count, character))
...
58, 
01,+
02,,
02,.
01,:
26,a
06,b
10,c
10,d
27,e
05,f
05,g
15,h
25,i
01,j
02,k
14,l
07,m
22,n
13,o
05,p
11,r
20,s
27,t
06,u
05,v
05,w
08,y
So the most frequent character is the half-width (ASCII) space with 58 occurrences, followed by e and t with 27 each.
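To pick out the most frequent characters programmatically instead of reading them off the list, the (count, character) pairs can be sorted by count in descending order. This is a minimal sketch; the helper names `char_counts` and `most_common_chars` and the short `sample` string are hypothetical, chosen only for illustration.

```python
from itertools import groupby

def char_counts(text):
    """Return (count, character) pairs, ignoring case, as in the session above."""
    return [(len(list(group)), character)
            for character, group in groupby(sorted(text.lower()))]

def most_common_chars(text, n=3):
    # Highest count first; ties are broken by character order (ascending)
    pairs = char_counts(text)
    return sorted(pairs, key=lambda pair: (-pair[0], pair[1]))[:n]

sample = "The O+ Festival in San Francisco"
for count, character in most_common_chars(sample):
    print("%02d,%r" % (count, character))
```

Applied to the full `article` string, the top entries should match the ones read off by hand above (space, then e and t).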
Because groupby only groups runs of identical elements that are next to each other,
the string has to be sorted beforehand with the sorted function.
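The effect of skipping the sort is easy to see on a tiny string. This sketch uses a made-up input, `"abab"`, purely for illustration.

```python
from itertools import groupby

text = "abab"

# Without sorting, groupby only merges *adjacent* equal characters,
# so 'a' and 'b' each show up twice as separate runs
unsorted_runs = [(ch, len(list(g))) for ch, g in groupby(text)]
print(unsorted_runs)  # [('a', 1), ('b', 1), ('a', 1), ('b', 1)]

# After sorting, equal characters are adjacent, so each one forms a single group
sorted_runs = [(ch, len(list(g))) for ch, g in groupby(sorted(text))]
print(sorted_runs)    # [('a', 2), ('b', 2)]
```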
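Also, since uppercase and lowercase are treated as the same character here, the text is converted with the lower method (str.lower) first.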
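To see what the lowercase conversion changes, here is a small sketch that counts with and without it. The function `count_chars` and its `ignore_case` flag are hypothetical names introduced only for this example.

```python
from itertools import groupby

def count_chars(text, ignore_case=True):
    # Optionally fold to lowercase so that 'T' and 't' are counted together
    if ignore_case:
        text = text.lower()
    # Sort first so groupby sees each character as one contiguous run
    return {ch: len(list(g)) for ch, g in groupby(sorted(text))}

word = "The test"
print(count_chars(word)["t"])                     # 3
print(count_chars(word, ignore_case=False)["t"])  # 2 ('T' is counted separately)
```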