Preface#
A few days ago, I had some free time and spent four to five days collecting data on all 20 million users from Bilibili (http://bilibili.com).
The code is hosted on GitHub at https://github.com/airingursb/bilibili-user, so anyone can download it and run the crawl themselves.
Introduction to Bilibili#
Bilibili, also known as "bilibili bullet screen video website," is currently the largest youth-oriented cultural and entertainment community in China. The website was created on June 26, 2009.
I registered as a user on February 14, 2013. I vaguely remember that before the summer of 2013, Bilibili restricted registration and opened it only on special holidays. Later, registering with a captcha and answering a set of questions became the official way to become a member.
Next, let's take a look at Bilibili's user data (only preliminary statistics so far).
User Situation#
Bilibili is a place with a strong ACG (anime, comic, and game) culture and, together with AcFun, has supported the anime industry in China.
So, about the users...
I won't say much; let's just look at a few random screenshots I took of user signatures.
Preliminary Analysis of User Data#
Basic Overview#
- Total data: 20,119,918
- Collection order: by registration time, from June 24, 2009, 14:06:54 to February 18, 2016, 21:04:52
- Estimated missing data: less than 2%
- Collected fields: user ID, nickname, gender, avatar, level, experience points, number of followers, birthday, address, registration time, signature, etc.
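To make the collected fields concrete, here is a minimal sketch of what one record could look like in Python. The class name, field names, and types are my own assumptions for illustration; they are not taken from the crawler's actual schema, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BilibiliUser:
    """One user record. Field names are hypothetical, not the crawler's real schema."""
    user_id: int
    nickname: str
    gender: str                 # assumed values: "male", "female", "confidential"
    avatar_url: str
    level: int                  # 0 to 6
    experience: int
    followers: int
    birthday: Optional[str]     # many users leave this blank
    address: Optional[str]      # many users leave this blank
    registered_at: str          # registration timestamp as text
    signature: str = ""


# Placeholder values only, not a real user.
example = BilibiliUser(
    user_id=0,
    nickname="example",
    gender="confidential",
    avatar_url="",
    level=0,
    experience=0,
    followers=0,
    birthday=None,
    address=None,
    registered_at="2016-01-01 00:00:00",
)
print(example.nickname, example.level)
```

Keeping birthday and address optional reflects how much smaller the valid counts are for those fields than the total.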
Gender#
- Valid data: 14,643,019
- Confidential: 11,621,898
- Male: 1,674,196
- Female: 1,346,925
The gender ratio is a bit unexpected: among users who declared a gender, it is fairly close to 1:1 (about 1.24 males per female). In fact, when I initially collected data before the summer of 2013, the ratio was around 3:1.
The group with a clearly declared gender is relatively small, accounting for only about 15% of the total data.
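The ratio and the 15% share can be rechecked from the counts listed above. This is just a quick arithmetic sketch; the variable names are my own.

```python
total_users = 20_119_918
gender_counts = {
    "confidential": 11_621_898,
    "male": 1_674_196,
    "female": 1_346_925,
}

# All records with a gender field, including "confidential".
valid = sum(gender_counts.values())

# Records where the user actually declared male or female.
declared = gender_counts["male"] + gender_counts["female"]

male_to_female = gender_counts["male"] / gender_counts["female"]
declared_share_of_total = declared / total_users

print(valid)                             # 14643019
print(f"{male_to_female:.2f}")           # 1.24
print(f"{declared_share_of_total:.1%}")  # 15.0%
```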
More analysis will be done in the future.
Age#
- Range: 1970-2010 (excluding 1980)
- Total data: 3,800,767
I won't include specific data; let's just take a look at the statistics.
Most users were born between 1993 and 2000 (approximately 16-23 years old), and those born in 1997 (19 years old) form the largest single group.
It turns out that there aren't many elementary school students on Bilibili; most users are high school and college students.
Users born in the 1990s make up the majority, but the user base keeps shifting toward later birth years. After all, it is a website for young people.
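The ages quoted in this section follow from subtracting the birth year from the collection year. A minimal sketch, assuming 2016 as the reference year (the crawl ended in February 2016) and ignoring whether the birthday has already passed that year:

```python
REFERENCE_YEAR = 2016  # assumption: the year the crawl finished


def age_in_reference_year(birth_year: int, reference_year: int = REFERENCE_YEAR) -> int:
    """Approximate age: reference year minus birth year, ignoring month and day."""
    return reference_year - birth_year


for birth_year in (1993, 1997, 2000):
    print(birth_year, age_in_reference_year(birth_year))
# 1993 23
# 1997 19
# 2000 16
```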
Region#
- Analysis range: China's 34 provincial-level regions
- Valid data: 863,541
Users are concentrated in Guangdong, Jiangsu, Beijing, Shanghai, Zhejiang, and other economically developed coastal regions.
Registration Time#
- Time range: June 24, 2009, 14:06:54 to February 18, 2016, 21:04:52
- Total data: 20,119,823
Since only two months of 2016 have passed, that year's count is still small, but its growth can be expected to far exceed that of 2015. Since the website launched in 2009, the number of users has grown almost exponentially every year.
Activity Statistics#
- Level range: 0 - 6
- Total data: 20,119,918
- Cut-off time: February 18, 2016
Since Bilibili has an experience level system, user activity can be judged based on their level.
Level 0 covers users who registered but never logged in. Levels 1 and 2 represent inactive users. Levels 3 and above represent active users. Among them, levels 5 and 6 are users with many submissions and popular videos, who form the backbone of Bilibili (approximately 5,000 people).
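The level-based grouping described above can be sketched as a small function. The bucket labels are my own naming, and the sample levels are made up rather than taken from the crawled data.

```python
from collections import Counter


def activity_bucket(level: int) -> str:
    """Map a Bilibili level (0 to 6) to the activity groups described in the text."""
    if not 0 <= level <= 6:
        raise ValueError(f"level out of range: {level}")
    if level == 0:
        return "never logged in"
    if level <= 2:
        return "inactive"
    if level <= 4:
        return "active"
    return "active (core)"  # levels 5 and 6


# Made-up sample, only to show how a tally would be built.
sample_levels = [0, 1, 2, 3, 4, 5, 6, 0, 1]
tally = Counter(activity_bucket(level) for level in sample_levels)
print(tally)
```

On the real data, the same tally would be run over the level field of every record.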
Retention rate and other data will be analyzed in the future.
Follower Statistics#
- Valid data: 2,011,918
- Range: 0 - 988,323
- Cut-off time: February 18, 2016, 21:04:52
Ah, I'm also someone with two followers!
Below are the top 20 users on Bilibili by follower count. Many of the names will be familiar.
These are the preliminary statistics for Bilibili's 20 million users. More in-depth analysis will follow.