PERSONAL PAGE OF JAUME MASSANÉS I PAPELL

4 May 2008

Catalan, the most widely used language in blogs after English

We use statistical techniques to identify blog language. That means that our algorithm decides what language a blog is in by looking at the text content, and not at any language attributes in the markup. Weblogs with fewer than 500 bytes of text content are not included in this list.

How reliable is our algorithm? We're currently doing statistical sampling to find out, and will post results in this table when they are ready.
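
The quoted text does not say which statistical technique is used. As a rough illustration only, the sketch below uses one common approach: comparing character trigram frequencies against per-language reference profiles with cosine similarity. The function names, the tiny training samples and the choice of trigrams are all assumptions made for this example; only the 500-byte cutoff comes from the quoted text.

```python
from collections import Counter
from math import sqrt

MIN_BYTES = 500  # cutoff mentioned in the quoted text


def trigram_profile(text):
    """Count overlapping character trigrams in whitespace-normalised, lowercased text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_language(text, profiles, min_bytes=MIN_BYTES):
    """Return the best-matching language name, or None if the text is too short."""
    if len(text.encode("utf-8")) < min_bytes:
        return None
    sample = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Hypothetical, far-too-small reference samples; a real system would train
# each profile on a large corpus per language.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "Catalan": "el català és la llengua més emprada en els blocs després de l'anglès",
    "Spanish": "el español es una lengua muy usada en los blogs y en las páginas de la red",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    post = "els blocs en català " * 30
    print(identify_language(post, PROFILES))
```

Note that the decision depends only on the text itself, never on markup attributes, which matches the description above. Texts under the byte cutoff are skipped rather than guessed at.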

English 1958443
Catalan 123320
French 83950
Spanish 80509
Portuguese 71561
German 35870
Italian 26659
Chinese-big5 25123
Farsi 19730
Chinese-gb2312 19324
Japanese 18576
Dutch 13133
Danish 9870
Indonesian 8831
Malay 6658
Japanese-euc_jp 5413
Swedish 5267
Czech 5089
Icelandic 3776
Tagalog 3608
Finnish 3326
Turkish 2817
Esperanto 2803
Slovak-ascii 2592
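
To check the headline against the figures, a short sketch like the one below ranks the counts listed above and prints each language's share. The shares are relative to the 24 entries in this list only, which is an assumption of the example: the list may not cover every language the service tracks.

```python
# Counts copied from the list above (blogs per detected language).
counts = {
    "English": 1958443, "Catalan": 123320, "French": 83950,
    "Spanish": 80509, "Portuguese": 71561, "German": 35870,
    "Italian": 26659, "Chinese-big5": 25123, "Farsi": 19730,
    "Chinese-gb2312": 19324, "Japanese": 18576, "Dutch": 13133,
    "Danish": 9870, "Indonesian": 8831, "Malay": 6658,
    "Japanese-euc_jp": 5413, "Swedish": 5267, "Czech": 5089,
    "Icelandic": 3776, "Tagalog": 3608, "Finnish": 3326,
    "Turkish": 2817, "Esperanto": 2803, "Slovak-ascii": 2592,
}

total = sum(counts.values())
# Sort from most to least blogs.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for language, n in ranked[:5]:
        # Name, raw count, and share of the listed total as a percentage.
        print(f"{language:<16}{n:>9}{n / total:>8.1%}")
```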

http://www.rafamartin.info/