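Baozi Grapevine @04/05/2020
As the old Qingming poem says, the rain falls thick and fast. Across China, flags flew at half-mast to mourn the compatriots, martyrs and heroes lost to COVID-19. Respect!
- The US now has more than 330,000 confirmed cases, firmly world No. 1. Sigh… it moves like a slow, goofy "sand eagle" (沙雕, Chinese slang for a goofball). Whatever happened to the promised bald eagle (白头海雕)?
- Whoever makes it through this wave will come out on top. YouTube is moving into short video to take on TikTok, with something called Shorts? As in underpants? I'm too unschooled to dare ask.
- Airbnb CEO Brian recorded a video for hosts (https://www.airbnb.com/d/host-message) to soothe their bruised hearts. Reportedly the company's 409A valuation has been cut to $26B (down from $31B). Baozi reckons that if Airbnb survives, a strong rebound will follow: everyone has been cooped up for so long that once it is safe, they will travel hard to make up for lost time. What do you think?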
[Humorous Dalian-dialect tutorial edition] Baozi Training System Design: designing comments, likes and views for YouTube | Douyin | Instagram
Blogger: https://blog.baozitraining.org/2020/03/system-design-interview-how-to-design-comments.html
Youtube: https://youtu.be/HSDfhFUNWxs
Cnblogs: https://www.cnblogs.com/baozitraining/p/12178850.html
Bilibili: https://www.bilibili.com/video/av82920867/
Methodology: READ MF!
[Originally from the Post: System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)]
Let's remind ourselves of the "READ MF!" methodology.
This is a follow-up to the previous post: System design interview: how to design a video platform (e.g., YouTube, Netflix)
Requirements
First, let's quickly talk about requirements. Likes and views are relatively straightforward, because users can tolerate a bit of delay and inaccuracy. In most cases it is enough that the count goes up by one when the user clicks the button. With heavily liked videos even that is not visible: if a YouTube video already shows 10K likes and you add one, it still shows 10K, and only the toggled button tells you your like registered.
For comment replies, there are a few different styles. The major platforms, such as YouTube, Instagram and TikTok, use the following style. Top-level comments (those replying directly to the video) are ordered by likes, then by timestamp, both descending. Replies to a comment are flat, one-on-one exchanges, e.g. "A@B How are you" followed by "B@A I am fine, thank you, and you?", so no further indentation is needed.
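Why a +1 can be invisible comes down to how large counts are abbreviated for display. The function below is a hypothetical formatting rule (truncate to one decimal, drop the decimal from 10 upward), not YouTube's actual code, but it shows how 10,000 and 10,001 render identically:

```python
def format_count(n: int) -> str:
    """Abbreviate a like/view count for display, truncating (not rounding),
    e.g. 1_234 -> "1.2K", 10_001 -> "10K". Hypothetical rule for illustration."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            # Integer math avoids floating-point rounding surprises.
            tenths = n * 10 // threshold
            whole, frac = divmod(tenths, 10)
            # Keep one decimal only for small leading parts: "1.2K" but "10K".
            if whole < 10 and frac:
                return f"{whole}.{frac}{suffix}"
            return f"{whole}{suffix}"
    return str(n)


print(format_count(10_000), format_count(10_001))  # 10K 10K
```

Because the displayed string does not change, the client has to toggle the like button itself so the user gets feedback.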


Reddit uses the "block building" reply style (known in Chinese as 盖楼, "building floors"), which shows which reply responds to which and uses indentation to display the nesting.

Estimation
A viral video typically has around 10M views.
For likes, assuming 20% of viewers like the video: 10M * 20% = 2M likes.
For comments, assuming 1% of viewers leave a comment (we are lazy): 10M * 1% = 100K comments.
The majority of ordinary videos probably get about 1,000 views, at most 100 likes and maybe 10 comments, which a relational DB handles easily.
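The estimates above are simple multiplications, sketched below. The per-comment size of about 1 KB is an assumption added only for illustration; it suggests that even a viral video's comments take roughly 100 MB, small enough to hold in memory:

```python
def estimate(views: int, like_rate: float, comment_rate: float) -> tuple[int, int]:
    """Back-of-the-envelope like and comment counts for one video."""
    return round(views * like_rate), round(views * comment_rate)


# Viral video from the text: 10M views, 20% like it, 1% comment.
viral_likes, viral_comments = estimate(10_000_000, 0.20, 0.01)

# Assumption for illustration only: ~1 KB per stored comment (text + metadata).
BYTES_PER_COMMENT = 1_000
viral_comment_bytes = viral_comments * BYTES_PER_COMMENT

print(f"{viral_likes:,} likes, {viral_comments:,} comments, "
      f"~{viral_comment_bytes / 1e6:.0f} MB of comments")
# 2,000,000 likes, 100,000 comments, ~100 MB of comments
```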
Key designs and terms
Comments design
When you start building your product, just bootstrap it with a relational DB.
Introduce a comments table sharded by video UUID. Add a reply_to_uuid column that records which comment a reply targets, and leave it NULL for root comments. Build an index on reply_to_uuid.
To list a video's root comments:
SELECT * FROM comments WHERE video_uuid = 'the_video_uuid' AND reply_to_uuid IS NULL ORDER BY likes DESC, timestamp DESC;
To see the replies to one of those comments:
SELECT * FROM comments WHERE reply_to_uuid = 'the_target_comment_uuid' ORDER BY likes DESC, timestamp DESC;
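The table and the two queries can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: a single in-memory SQLite database stands in for one shard, and the column names other than reply_to_uuid (video_uuid, author, body, likes, timestamp) are assumptions chosen for the example:

```python
import sqlite3
import uuid

# One in-memory DB stands in for a single shard (the text shards by video UUID).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    uuid          TEXT PRIMARY KEY,
    video_uuid    TEXT NOT NULL,
    reply_to_uuid TEXT,              -- NULL for root comments
    author        TEXT NOT NULL,
    body          TEXT NOT NULL,
    likes         INTEGER NOT NULL DEFAULT 0,
    timestamp     INTEGER NOT NULL   -- Unix epoch seconds
);
CREATE INDEX idx_comments_reply_to ON comments (reply_to_uuid);
""")


def add_comment(video_uuid, author, body, likes, ts, reply_to=None):
    """Insert a comment (or a reply, if reply_to is set) and return its UUID."""
    comment_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?, ?)",
        (comment_uuid, video_uuid, reply_to, author, body, likes, ts),
    )
    return comment_uuid


def root_comments(video_uuid):
    """Top-level comments of a video: most liked first, newest first on ties."""
    return conn.execute(
        "SELECT author, body FROM comments "
        "WHERE video_uuid = ? AND reply_to_uuid IS NULL "
        "ORDER BY likes DESC, timestamp DESC",
        (video_uuid,),
    ).fetchall()


def replies(comment_uuid):
    """Direct replies to one comment, in the same order."""
    return conn.execute(
        "SELECT author, body FROM comments WHERE reply_to_uuid = ? "
        "ORDER BY likes DESC, timestamp DESC",
        (comment_uuid,),
    ).fetchall()


video = "video-1"
a = add_comment(video, "A", "First!", likes=5, ts=100)
add_comment(video, "B", "Nice video", likes=5, ts=200)
add_comment(video, "B", "@A How are you", likes=1, ts=300, reply_to=a)
print(root_comments(video))  # [('B', 'Nice video'), ('A', 'First!')]
print(replies(a))            # [('B', '@A How are you')]
```

The two root comments tie on likes, so the newer one (timestamp 200) is listed first.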
Even at YouTube scale, a viral video has only around 100K comments, so the solution above still works fine. Simply add capacity, spread the shards across machines with consistent hashing, and cache the comments.
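Consistent hashing is what keeps "just add capacity" cheap: when a shard is added, only the videos on the arcs of the ring it takes over move, rather than nearly all of them as with `hash % N`. Below is a minimal ring sketch; the shard names, the 100 virtual nodes per shard and the use of SHA-256 are assumptions for the example:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring mapping a video UUID to a shard.
    Each shard is placed at `vnodes` points on the ring to even out load."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def shard_for(self, video_uuid: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around to the start of the ring.
        i = bisect.bisect(self._ring, (self._hash(video_uuid), ""))
        return self._ring[i % len(self._ring)][1]


ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
videos = [f"video-{n}" for n in range(10_000)]
before = {v: ring.shard_for(v) for v in videos}
ring.add_shard("shard-3")
moved = [v for v in videos if ring.shard_for(v) != before[v]]
print(f"{len(moved) / len(videos):.0%} of videos moved")
```

With four shards, roughly a quarter of the videos should move, and every one of them moves to the new shard.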
If you need the Reddit-style tree, load the video's comments and build and sort the tree in memory. Once the problem fits in memory, it becomes much easier.
The extreme case is when your comments section turns into a chat. Then you can use an append-only in-memory store, such as a Redis cache, that keeps appending new comments to a queue, with an asynchronous backup to the DB.
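Building the tree in memory means grouping the flat rows by reply_to_uuid, sorting each group of siblings, and walking the tree depth first, indenting by depth. This is a sketch that assumes the rows arrive as dicts with the same fields as the comments table:

```python
from collections import defaultdict


def build_thread(comments):
    """Render flat comment rows as an indented Reddit-style tree.
    Each row is a dict with uuid, reply_to_uuid, body, likes and timestamp."""
    children = defaultdict(list)
    for c in comments:
        children[c["reply_to_uuid"]].append(c)  # None groups the root comments
    for siblings in children.values():
        # Most liked first; newest first on ties.
        siblings.sort(key=lambda c: (c["likes"], c["timestamp"]), reverse=True)

    lines = []
    # Iterative depth-first walk, so deep reply chains cannot hit the recursion limit.
    stack = [(c, 0) for c in reversed(children[None])]
    while stack:
        comment, depth = stack.pop()
        lines.append("  " * depth + comment["body"])
        stack.extend((child, depth + 1) for child in reversed(children[comment["uuid"]]))
    return lines


rows = [
    {"uuid": "1", "reply_to_uuid": None, "body": "Great post", "likes": 10, "timestamp": 1},
    {"uuid": "2", "reply_to_uuid": "1", "body": "Agreed", "likes": 3, "timestamp": 2},
    {"uuid": "3", "reply_to_uuid": "2", "body": "Same here", "likes": 1, "timestamp": 3},
    {"uuid": "4", "reply_to_uuid": None, "body": "First!", "likes": 2, "timestamp": 0},
]
print("\n".join(build_thread(rows)))
```

This prints "Great post", then "Agreed" indented once and "Same here" indented twice, and finally "First!" at the top level.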
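Below is a sketch of the append-only store with asynchronous backup. A Python list stands in for the Redis list (where RPUSH would append), and `persist_batch` is a hypothetical callback that writes a batch to the DB. A background thread periodically copies whatever was appended since the last flush. The sketch does not retry failed writes:

```python
import threading


class ChatCommentLog:
    """Append-only in-memory comment log; a background thread periodically
    copies newly appended entries to a durable store via persist_batch."""

    def __init__(self, persist_batch, flush_interval=1.0):
        self._entries = []      # append-only; readers take slices
        self._flushed = 0       # index of the first entry not yet persisted
        self._lock = threading.Lock()
        self._persist_batch = persist_batch
        self._interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def append(self, comment):
        with self._lock:
            self._entries.append(comment)

    def latest(self, n):
        """The n most recent comments, oldest first."""
        with self._lock:
            return self._entries[-n:] if n > 0 else []

    def flush(self):
        with self._lock:
            batch = self._entries[self._flushed:]
            self._flushed = len(self._entries)
        if batch:
            self._persist_batch(batch)  # slow I/O happens outside the lock

    def _run(self):
        # Event.wait returns False on timeout, True once close() sets it.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self):
        self._stop.set()
        self._worker.join()
        self.flush()  # persist anything appended since the last tick


db_rows = []  # stand-in for the backing database table
log = ChatCommentLog(db_rows.extend, flush_interval=0.1)
for i in range(5):
    log.append(f"comment {i}")
print(log.latest(2))  # ['comment 3', 'comment 4']
log.close()
print(len(db_rows))   # 5
```

Readers are served from memory, while the DB catches up every `flush_interval` seconds, which matches the "append in memory, back up asynchronously" idea above.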
Views and Likes count design
Similarly, when you bootstrap the project and traffic is low, keeping a counter in the DB or in an in-memory cache solves the problem. Within one machine you don't even need locks: use compare-and-swap (CAS) atomic operations for counting, which are thread-safe.
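The CAS counting idea is a retry loop: read the current value, compute value + 1, and write it only if nobody changed it in between; otherwise retry. CPython exposes no hardware CAS instruction, so the `AtomicInt` below emulates `compare_and_set` with a lock purely as a stand-in. In Java or C++ you would use `AtomicLong.compareAndSet` or `std::atomic`, and the loop would be genuinely lock-free:

```python
import threading


class AtomicInt:
    """Stand-in for a hardware atomic integer. compare_and_set is emulated
    with a lock because CPython has no CAS instruction."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Set to `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(counter: AtomicInt) -> int:
    """CAS retry loop: read, compute, swap; retry if another thread won the race."""
    while True:
        current = counter.get()
        if counter.compare_and_set(current, current + 1):
            return current + 1


def like_many(counter, times):
    for _ in range(times):
        increment(counter)


views = AtomicInt()
threads = [threading.Thread(target=like_many, args=(views, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(views.get())  # 80000
```

No increment is lost: when two threads read the same value, only one CAS succeeds and the other retries with the fresh value.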
If your product starts to become popular, add capacity using consistent hashing, and count in an in-memory cache such as Redis (a memory access takes about 100 ns versus about 10 ms for a disk seek, roughly a 100,000x improvement). This can be optimized further with a distributed counter: split the count across several shards and aggregate them when reading.
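The distributed counter can be sketched as a sharded counter: each write increments one randomly chosen shard, so concurrent writers rarely contend for the same key, and a read sums all shards. In Redis each shard would be its own key, updated with INCRBY and read back with MGET; the list below is only an in-process, single-threaded stand-in, and the shard count is an assumption:

```python
import random


class ShardedCounter:
    """One logical counter split into N shards: writes touch a single random
    shard (spreading contention); reads sum every shard."""

    def __init__(self, num_shards=16):
        # Stand-in for N separate cache keys, e.g. one per shard of a video's likes.
        self._shards = [0] * num_shards

    def increment(self, amount=1):
        self._shards[random.randrange(len(self._shards))] += amount

    def value(self):
        return sum(self._shards)


likes = ShardedCounter(num_shards=8)
for _ in range(1_000):
    likes.increment()
print(likes.value())  # 1000
```

The trade-off is that writes get cheaper while each read costs N lookups, which suits likes and views: they are written constantly, and readers tolerate slightly stale totals.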
If your product reaches YouTube scale, switch to offline counting. Build a pipeline that promotes videos from cold to hot/viral once their view counts hit a certain threshold (say 1M). Use asynchronous messaging such as Kafka to ingest the view logs and pump them into a data warehouse, then query it and update the values on a cron schedule. On the UI side, of course, you need to toggle the like button and add 1 locally if needed (sometimes on a video with 100K likes, the count does not increase even after you click like).
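A single tick of the offline job can be sketched as a batch aggregation. The events list below stands in for view messages consumed from a Kafka topic, the totals dict stands in for the warehouse table, and the 1M threshold is the example value from the text:

```python
from collections import Counter

HOT_THRESHOLD = 1_000_000  # views at which a video is promoted from cold to hot


def aggregate_batch(events, totals, hot_videos, threshold=HOT_THRESHOLD):
    """One cron tick: fold a batch of raw view events (one video ID per view)
    into the stored totals and promote videos that crossed the threshold."""
    for video_id, n in Counter(events).items():
        totals[video_id] = totals.get(video_id, 0) + n
        if totals[video_id] >= threshold:
            hot_videos.add(video_id)


totals, hot = {}, set()
# A tiny threshold keeps the example readable.
aggregate_batch(["v1", "v2", "v1"], totals, hot, threshold=2)
print(totals, hot)  # {'v1': 2, 'v2': 1} {'v1'}
aggregate_batch(["v2"], totals, hot, threshold=2)
print(sorted(hot))  # ['v1', 'v2']
```

Between ticks the stored count lags behind real traffic, which is why the client toggles the button and adds the user's own +1 locally.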