Airing

Philosophy student / primary school teacher / programmer. Personal website: ursb.me

Analysis of 20 million users on Bilibili

Preface

A few days ago I had some free time, so I spent four or five days collecting data on all of Bilibili's roughly 20 million users (http://bilibili.com).

The code is hosted on GitHub: https://github.com/airingursb/bilibili-user. Anyone can download it and run the crawl themselves.

Introduction to Bilibili

Bilibili, also known as the "bilibili bullet screen video website" (viewer comments scroll across the video as it plays), is currently the largest youth-oriented cultural and entertainment community in China. The website was created on June 26, 2009.

I myself registered as a user on February 14, 2013. I vaguely remember that before the summer of 2013, Bilibili had restrictions on registration and only opened registration during special holidays. Later, captcha registration and answering questions became the official membership process.

Next, let's take a look at the user data on Bilibili (only preliminary statistics have been done).

User Situation

Bilibili is a place with a strong ACG (anime, comic, and game) culture and, together with AcFun, has supported the anime industry in China.

So, about the users...

I won't say much. Let's just look at a few screenshots of user signatures that I picked at random.

[Figures: seven screenshots of user signatures]

Preliminary Analysis of User Data

Basic Overview

  • Total data: 20,119,918
  • Collection order: by registration time, from June 24, 2009, 14:06:54 to February 18, 2016, 21:04:52
  • Estimated missing data: less than 2%
  • Collected fields: user ID, nickname, gender, avatar, level, experience points, number of followers, birthday, address, registration time, signature, etc.

Gender

  • Valid data: 14,643,019
  • Confidential: 11,621,898
  • Male: 1,674,196
  • Female: 1,346,925

[Figure: gender statistics]

The gender ratio is a bit unexpected: among users who disclose a gender it is about 1.24 males per female, which is close to 1:1. In fact, in my initial collection of data from before the summer of 2013, the ratio was around 3:1.

[Figures: gender statistics]

Users who disclose their gender are a relatively small group: about 3.02 million, or only about 15% of the total data.
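The ratio and the 15% share can be reproduced from the counts listed in this section. The snippet below is a minimal sketch of that arithmetic; the variable names are my own, and the only inputs are the figures quoted in this post.

```python
# Counts copied from the Gender section and the Basic Overview of this post.
TOTAL_USERS = 20_119_918
CONFIDENTIAL = 11_621_898
MALE = 1_674_196
FEMALE = 1_346_925

valid = CONFIDENTIAL + MALE + FEMALE   # records that carry a gender field
declared = MALE + FEMALE               # users who disclosed a gender

male_to_female = MALE / FEMALE
declared_share_of_total = declared / TOTAL_USERS
declared_share_of_valid = declared / valid

print(f"valid records: {valid:,}")                                 # 14,643,019
print(f"male : female = {male_to_female:.2f} : 1")                 # 1.24 : 1
print(f"declared / total users = {declared_share_of_total:.1%}")   # 15.0%
print(f"declared / valid records = {declared_share_of_valid:.1%}") # 20.6%
```

Note that the 15% figure is relative to all 20 million records. Relative to the 14.6 million records with valid gender data, the share is closer to 21%.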

More analysis will be done in the future.

Age

  • Range: 1970-2010 (excluding 1980)
  • Total data: 3,800,767

I won't list the specific numbers here. Let's just look at the statistics.

[Figure: age statistics]

Most users were born between 1993 and 2000 (roughly 16 to 23 years old in 2016), and those born in 1997 (19 years old) form the largest single group.

So Bilibili is not full of elementary school students after all; most users are of high school or college age.
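As a rough illustration of how birth years translate into the ages quoted above, here is a minimal sketch. It assumes ages are computed relative to 2016 (the year the crawl ended) and applies the same range filter as the list above, 1970 to 2010 with 1980 excluded. The function names and the sample list are made up for illustration and are not the crawled data.

```python
from collections import Counter

SNAPSHOT_YEAR = 2016  # the crawl ended on February 18, 2016


def is_plausible(birth_year: int) -> bool:
    """Keep birth years in 1970-2010, excluding 1980, as in the post."""
    return 1970 <= birth_year <= 2010 and birth_year != 1980


def peak_birth_year(birth_years):
    """Return (most common birth year, approximate age in the snapshot year)."""
    counts = Counter(y for y in birth_years if is_plausible(y))
    year, _ = counts.most_common(1)[0]
    return year, SNAPSHOT_YEAR - year


# Made-up sample, only to show the shape of the computation.
sample = [1997, 1997, 1995, 1980, 2000, 1997, 1965, 1993]
print(peak_birth_year(sample))  # (1997, 19)
```

The age here ignores month and day, so it is the age a user reaches during 2016 rather than an exact age on the crawl date.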

[Figures: age statistics]

Users born in the 1990s make up the majority, but the user base keeps shifting toward younger ages. After all, it is a website for young people.

Region

  • Analysis range: the 34 provincial-level regions of China
  • Valid data: 863,541

[Figure: region statistics]

Users are concentrated in Guangdong, Jiangsu, Beijing, Shanghai, Zhejiang, and other economically developed, mostly coastal regions.

[Figures: region statistics]

Registration Time

  • Time range: June 24, 2009, 14:06:54 to February 18, 2016, 21:04:52
  • Total data: 20,119,823

[Figure: registration time statistics]

Only two months of 2016 had passed at the time of collection, so the 2016 count is still small, but I expect growth for the full year to far exceed that of 2015. Since the website launched in 2009, the number of registrations has grown almost exponentially from year to year.
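One way to check a near-exponential trend is to look at year-over-year growth factors: if growth is roughly exponential, the ratio between consecutive years stays roughly constant. The sketch below assumes registration times have already been parsed into `datetime` objects. The function names and the toy timestamps are hypothetical and are not the crawled data.

```python
from collections import Counter
from datetime import datetime


def registrations_per_year(timestamps):
    """Count registrations per calendar year, sorted by year."""
    counts = Counter(ts.year for ts in timestamps)
    return dict(sorted(counts.items()))


def growth_factors(per_year):
    """Ratio of each year's count to the previous listed year's count."""
    years = sorted(per_year)
    return {cur: per_year[cur] / per_year[prev]
            for prev, cur in zip(years, years[1:])}


# Toy timestamps (made up): 1 registration in 2009, 2 in 2010, 4 in 2011.
toy = [
    datetime(2009, 6, 24, 14, 6, 54),
    datetime(2010, 1, 5), datetime(2010, 8, 9),
    datetime(2011, 2, 1), datetime(2011, 3, 1),
    datetime(2011, 7, 1), datetime(2011, 12, 1),
]
per_year = registrations_per_year(toy)
print(per_year)                  # {2009: 1, 2010: 2, 2011: 4}
print(growth_factors(per_year))  # {2010: 2.0, 2011: 2.0}
```

A partial year such as 2016 should be left out of this comparison, or scaled to a full year, before its ratio is read as growth.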

[Figures: registration time statistics]

Activity Statistics

  • Level range: 0 - 6
  • Total data: 20,119,918
  • Cut-off time: February 18, 2016

Since Bilibili has an experience level system, user activity can be judged based on their level.

Level 0 users have registered but never logged in. Levels 1 and 2 are inactive users, and levels 3 and above are active users. Among the active users, those at levels 5 and 6 (approximately 5,000 people) have many submissions and popular videos, which makes them the backbone of Bilibili.
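The level-to-activity mapping described above can be written as a small lookup. This is a minimal sketch of that rule, not code from the crawler: the tier labels, the function name, and the sample levels are my own.

```python
from collections import Counter


def activity_tier(level: int) -> str:
    """Map a Bilibili level (0-6) to the activity tier described in the post."""
    if not 0 <= level <= 6:
        raise ValueError(f"level out of range: {level}")
    if level == 0:
        return "never logged in"
    if level <= 2:
        return "inactive"
    if level <= 4:
        return "active"
    return "core"  # levels 5 and 6: still active, singled out as the backbone


# Made-up levels, only to show how a tier breakdown could be tallied.
sample_levels = [0, 1, 2, 2, 3, 4, 5, 6]
tier_counts = Counter(activity_tier(lv) for lv in sample_levels)
print(tier_counts)
```

Keeping "core" as a separate label from "active" makes it easy to count the backbone users on their own, while summing the two gives the count of all users at level 3 and above.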

[Figures: level statistics]

Retention rate and other data will be analyzed in the future.

Follower Statistics

  • Valid data: 2,011,918
  • Range: 0 - 988,323
  • Cut-off time: February 18, 2016, 21:04:52

[Figure: follower statistics]

Ah, I'm also someone with two followers!

[Figure: follower statistics]

Below are the top 20 users on Bilibili by follower count. Many of them are familiar names.

[Figure: follower statistics, top 20 users]


These are the preliminary statistics for the 20 million users on Bilibili. A more in-depth analysis will follow.
