Thursday, July 9, 2020

Design Facebook Chat Function

One of the most interesting parts of preparing for system design interviews is that you get to know a lot of details about how existing systems are built. To make the weekly post more helpful, I'd like to cover a wide range of topics. We've been talking a lot about things like recommendation and ranking in the past few weeks, so this time I want to cover something different.

Question

It starts with a very simple question: how would you design the Facebook chat function?

With big news like Facebook buying WhatsApp for $19B and Facebook Messenger becoming really popular recently, chat is definitely a hot topic. So in this post, I'm quite happy to talk about messages.

A few things to mention here. First and foremost, as I mentioned in previous posts, system design interviews can be extremely diverse. It's mostly up to the interviewer to decide which direction to take. As a result, different interviewers can have completely different discussions even with the same question, and you should never treat this article as a standard answer. Also, I've never worked on Facebook Messenger or WhatsApp. All the discussion here is based on the Gainlo team's analysis.

Basic infrastructure

As said earlier, it's better to start with a high-level solution and talk about the overall infrastructure. If you have no prior experience with messaging apps, you might find it hard to come up with a basic solution. But that's totally fine. Let's start with a very naive solution and optimize it later.

Basically, one of the most common ways to build a messaging app is to have a chat server that acts as the core of the whole system. When a message comes in, it isn't sent to the receiver directly. Instead, it goes to the chat server and is stored there first. Then, based on the receiver's status, the server either sends the message to the receiver immediately or sends a push notification.

A more detailed flow works like this:

1. User A wants to send the message “Hello Gainlo” to user B.
2. A first sends the message to the chat server.
3. The chat server receives the message and sends an acknowledgement back to A, meaning the message has been received. Depending on the product, the front end may display a single check mark in A's UI.
4. Case 1: if B is online and connected to the chat server, that's great. The chat server just sends the message to B. Case 2: if B is not online, the chat server sends a push notification to B.
5. B receives the message and sends an acknowledgement back to the chat server.
6. The chat server notifies A that B received the message, and A's UI updates to a double check mark.

Real-time

The whole system can become costly and inefficient once it's scaled to a certain level. So is there any way to optimize the system to support a huge number of concurrent requests?

There are many approaches. One obvious cost is that when delivering a message to the receiver, the chat server might need to spawn an OS process/thread, initialize an HTTP (or other protocol) request, and close the connection at the end. In fact, this happens for every message. Even if we do it the other way around, with the receiver repeatedly asking the server whether there are any new messages, it's still costly.

One solution is to use an HTTP persistent connection. In a nutshell, the receiver makes an HTTP GET request over a persistent connection, and the request doesn't return until the chat server has data to send back (a technique commonly called long polling). The request is re-established whenever it times out or is interrupted. This approach offers a lot of advantages in terms of response time, throughput and cost.

If you want to know more about HTTP persistent connections, you can check out things like BOSH.

Online notification

Another cool feature of Facebook chat is showing which friends are online. Although the feature seems simple at first glance, it improves the user experience tremendously and is definitely worth discussing. If you were asked to design this feature, how would you do it?
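Before answering that question, it may help to pin down the delivery flow from the basic infrastructure discussion in code, since the online/offline state it tracks is what an online indicator would build on. This is only a minimal in-memory sketch, not how Facebook actually does it: the `ChatServer`, `Message` and `Status` names are made up for illustration, the network is replaced by plain lists, and returning from `send` stands in for the acknowledgement to the sender.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SENT = "single check"       # the server has stored the message
    DELIVERED = "double check"  # the receiver has acknowledged it


@dataclass
class Message:
    msg_id: int
    sender: str
    receiver: str
    text: str
    status: Status = Status.SENT


@dataclass
class ChatServer:
    # All fields are hypothetical stand-ins for real connections and queues.
    online: set = field(default_factory=set)       # users with a live connection
    store: dict = field(default_factory=dict)      # msg_id -> Message
    delivered: list = field(default_factory=list)  # (receiver, msg_id) sent directly
    pushes: list = field(default_factory=list)     # (receiver, msg_id) push notifications

    def send(self, sender, receiver, text):
        """Store the message first, then deliver it or fall back to a push."""
        msg = Message(len(self.store) + 1, sender, receiver, text)
        self.store[msg.msg_id] = msg  # stored before any delivery attempt
        if receiver in self.online:
            self.delivered.append((receiver, msg.msg_id))  # case 1: B is online
        else:
            self.pushes.append((receiver, msg.msg_id))     # case 2: B is offline
        return msg  # returning here plays the role of the ack to the sender

    def ack(self, msg_id):
        """The receiver confirms receipt, so the sender can show a double check."""
        self.store[msg_id].status = Status.DELIVERED
        return self.store[msg_id]


server = ChatServer(online={"B"})
first = server.send("A", "B", "Hello Gainlo")  # B is online: delivered directly
second = server.send("A", "C", "Hi")           # C is offline: push notification
server.ack(first.msg_id)                       # B acknowledges the first message
```

After the last line, the first message carries the double check mark status while the second one is still at a single check mark, waiting for C to come back and acknowledge it.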
Obviously, the most straightforward approach is that once a user comes online, the user sends a notification to all of their friends. But how would you evaluate the cost of this?

At peak time, we need roughly O(average number of friends * peak users) requests, which can be a lot when there are millions of users. This cost can be even higher than the cost of the messages themselves. One idea to improve this is to cut unnecessary requests. For instance, we can issue a notification only when the user reloads a page or sends a message. In other words, we can limit the scope to only “very active users”. Or we can hold off sending a notification until a user has been online for 5 minutes. This handles the case where a user shows up online and immediately goes offline.

Summary

There are many other topics I haven't covered in this post. For example, if you dig deeper into the networking side, we can talk about which network protocol to use for the connection. Also, how to deal with system errors and replicate data can be interesting as well, since a chat app is quite different. Feel free to leave a comment if you want to have further discussion with me.
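As a small appendix to the online notification discussion, here is a back-of-the-envelope sketch of the fan-out cost and of the "wait 5 minutes" idea. The function names and the numbers in the usage lines (one million peak users, 200 friends on average) are made up for illustration and are not real Facebook figures.

```python
MIN_ONLINE_SECONDS = 5 * 60  # the 5-minute threshold mentioned in the post


def naive_fanout(peak_users, avg_friends):
    """Requests needed if every user who comes online notifies every friend."""
    return peak_users * avg_friends


def should_notify(came_online_at, now, min_online=MIN_ONLINE_SECONDS):
    """Announce a user only after they have stayed online for min_online seconds."""
    return now - came_online_at >= min_online


def users_to_announce(sessions, now):
    """sessions maps user -> timestamp (in seconds) at which they came online."""
    return sorted(user for user, since in sessions.items() if should_notify(since, now))


# Hypothetical numbers: 1,000,000 peak users with 200 friends each.
print(naive_fanout(1_000_000, 200))  # 200000000

# Timestamps in seconds; at now=400 only alice and carol have been online 5 minutes.
sessions = {"alice": 0, "bob": 290, "carol": 100}
print(users_to_announce(sessions, now=400))  # ['alice', 'carol']
```

The delay trades freshness for cost: friends see someone as online a few minutes late, but users who connect and leave right away never trigger any notifications at all.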
