Tom Larkworthy is the developer of Firesafe, a language wrapper for Security and Firebase Rules. Firesafe eases development of complex protocols.
He cares about logical perfection, and finds holes in Firebases using formal methods.
Data integrity is hard to scale in databases. Issues surface quickly in massively multiplayer games, where the players are incentivized to cheat. Firesafe is a language for expressing and checking protocols to solve data integrity problems. In this post, I show how sending an item between players can be implemented safely and deadlock free with a protocol developed using Firesafe.
The cost of non-ACID compliance
Data integrity is very hard to manage in scalable, realtime applications. Most massively multiplayer games have been hit with an item duplication bug at some point. This type of bug allows users to clone rare items, consequently trashing the virtual economy with deflation, ruining the reward structure and costing the hosting company millions in lost revenue. If not managed, the virtual world can be damaged for years. This has affected popular games in the past, including World of Warcraft, Runescape, Farmville, and many others.
The dual of the duplication bug is the bug where items disappear. The loss of an item flushes a player’s efforts down the drain, makes them very angry, and usually causes them to call support. The major cost in massively multiplayer games is not the operational cost of running servers, but the cost of answering thousands of support requests. Data integrity problems are a major cause of support requests.
These problems affect big companies with big budgets, and these bugs are costing them millions. So why don’t companies just develop working code?
These bugs are widespread because they share a common underlying technical issue: it is difficult to ensure that transactions always work correctly in a distributed system (a property known as ACID compliance). ACID-compliant transactions are even harder to implement when servicing a realtime system.
Firesafe
Firesafe is a language for expressing state machine dynamics that is compiled into Security and Firebase Rules. State machines are a natural programming abstraction used extensively in the telecommunications industry to solve similar hard problems. State machines are attractive for formal verification because their finite sized state space makes enumerating their total behavior tractable.
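To make the point about finite state spaces concrete, here is a minimal Python sketch of exhaustive state enumeration. It is not Firesafe's verifier: the two-flag state and the `successors` transitions are a hypothetical toy machine, chosen only to show how every reachable state can be visited and checked against an invariant.

```python
from collections import deque

# Hypothetical toy machine: a state is (a_has_item, b_has_item).
def successors(state):
    """Return the states reachable in one step (illustrative transitions only)."""
    a_has, b_has = state
    result = []
    if a_has and not b_has:
        result.append((False, True))   # A gives the item to B
    if b_has and not a_has:
        result.append((True, False))   # B gives the item back to A
    return result

def reachable(initial):
    """Breadth-first enumeration of every state reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def exactly_one_owner(state):
    """Invariant: the item is neither duplicated nor lost."""
    return sum(state) == 1

states = reachable((True, False))
violations = [s for s in states if not exactly_one_owner(s)]
```

Because the set of reachable states is finite, the loop terminates, and an empty `violations` list means the invariant holds for every behavior of this toy machine.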
I have developed a system that verifies the correctness of protocols implemented with Firesafe. This post is a demonstration of one possible use of this technology: the development of a protocol for ACID-like transfer of items between users. Item transfers are a canonical hard dynamic to get right in multiplayer games.
Item Transfer
A lot of bugs in multiplayer games are centered around the trading dynamic, when one player sends an item to another player. What should happen is one player loses an item, and another player gains the item. Often this is not the case.
Common errors are that both players end up with the item, or that both lose it. It sounds like an easy feature to develop, but the player accounts might be on different computers, and a player could (deliberately) disconnect from the game at any point.
Distributed exchange of state is similar to the problem banks face when performing an electronic transfer between financial institutions. In developing a solution for Firebase, I have drawn on the banks’ solution: the two-phase commit protocol (2PC).
In a 2PC, a central authority coordinates a transaction between parties A and B. In this approach, it is critical that the coordinator waits until both A and B have indicated they are willing to commit before sending a message that the transfer is going ahead (or not). Before the commit message, either party can back out of the transaction or go offline, in which case the coordinator must announce through a message that the transfer is aborted.
A and B have to wait for the coordinator’s decision, which is the main issue with 2PC.
It is safe, but it blocks if the coordinator fails. There are also problems if a coordinator’s commit message goes missing. As we will see, due to some of the guarantees provided by Firebase, many of the main issues of 2PC disappear.
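The coordinator's decision rule in classic 2PC can be sketched in a few lines of Python. This is a simplified illustration of the textbook rule described above, not code from Firesafe; the function name and the use of `None` to represent a party that went offline are assumptions made for the example.

```python
from typing import Mapping, Optional

def coordinator_decision(votes: Mapping[str, Optional[bool]]) -> str:
    """Decide the outcome of one transaction.

    `votes` maps each party to True (willing to commit), False (backs out),
    or None (no answer, for example because the party went offline).
    """
    # Commit only when every party has explicitly voted yes.
    if votes and all(vote is True for vote in votes.values()):
        return "COMMIT"
    # Any refusal or missing vote aborts the whole transaction.
    return "ABORT"
```

Note that the parties cannot act until this function has returned and its result has been delivered, which is exactly the blocking behavior described above.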
Firebase
Firebase is like no other database. It is completely scalable and realtime, but has no stored procedures and all data must be in a tree structure. It only supports transactions on sub-trees, and it enforces data integrity through a custom security language, which is less expressive than traditional SQL conditions.
It is also a work of genius.
While Firebase may seem like a long list of restrictions, it is clear that every restriction is very well thought out and provides an opportunity for blazing realtime performance and scalability. So Firebase is a hot rod, and you don’t even have to host anything! The Firebase team deals with expensive operational issues, which democratizes access to top DB performance.
That said, the drawback is that ensuring data goes in and out the database properly is more difficult, as there is no opportunity to run secured arbitrary application level code before remote clients access the database layer. You can only manage data integrity with the security functions Firebase provides. In the case of transferring data from one user account to another, one problem is that Firebase does not support transactions across sibling branches.
An often touted route around Firebase’s limitations is to add an additional privileged server to provide additional functionality. However, that is not ideal, as now you have to manage those servers, and they also add additional latency.
The true way for item transfer, in keeping with the Firebase philosophy, is to have clients carry out the transactions. By leveraging atomic in-branch updates, plus out-of-branch referencing in Security and Firebase Rules, it is possible to sequence concurrency safe protocols. However, this requires carefully thought out Security and Firebase Rules to ensure the system isn’t open to exploitation.
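The combination of atomic in-branch writes and out-of-branch reads can be modeled in plain Python. The sketch below does not use the Firebase API or the real rules language: the `db` dictionary, `rule_allows`, and `try_write` are hypothetical stand-ins, and the state and pointer names are borrowed from the protocol described later in this post.

```python
import copy

def rule_allows(db, writer, target_user, new_record):
    """Hypothetical boolean rule, evaluated by the database, not the client.

    - A client may only write its own branch.
    - Entering state "RX" is only valid if the sender's branch (a sibling
      branch that is read but never written here) is in state "TX" and
      already names this user as the recipient.
    """
    if writer != target_user:
        return False
    if new_record.get("state") == "RX":
        sender = db.get(new_record.get("rx_ptr"))
        return (sender is not None
                and sender.get("state") == "TX"
                and sender.get("tx_ptr") == target_user)
    return True

def try_write(db, writer, target_user, new_record):
    """Apply the whole-record write atomically, or leave the database untouched."""
    if not rule_allows(db, writer, target_user, new_record):
        return False
    db[target_user] = copy.deepcopy(new_record)
    return True
```

The client proposes the complete new record and the database only answers yes or no, so all of the protocol logic has to live in the boolean rule.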
Firesafe is a language that lets us encode state machines with managed variables inside a Firebase. It provides hooks to limit how variables are modified based on the machine’s current state. This is a natural way of expressing concurrent protocols, and allows developers to concentrate on the “clever bits” of the protocol without the error-prone drudgery of managing state transitions.
A two-phase commit protocol for Firebase
Inspired by the banking sector, I’ve designed a protocol that lets one player send an item to another player. This post gives a basic overview of a two-phase commit example; for a full explanation with code samples, check out the github wiki.
Note that each player account resides on a different branch of user, so we can’t use Firebase’s subtree transactions to make the transfer atomic.
This kind of transaction is business critical. Either it works, or the transaction is rolled back to the original state. No dupes. No losses. Furthermore, we want this system to be deadlock free. If one player goes offline, it should not hold the other player in limbo.
I have taken the classic 2PC protocol and adapted it for Firebase. Firebase is only able to verify data with boolean clauses; it can’t actually move data around itself, and it can’t emit messages like the classic central authority.
Thus, the connected clients have to initiate the changes of state, with Firebase simply double checking the validity. To get safe transfers, we have to logically force the clients down a predefined set of state transitions that can only lead to transfer of the item or an aborted and rolled back transfer.
A sketch of a successful interaction between the two players is shown below.
The ordered stages of the protocol are:
- IDLE: both players start in the idle state
- TX: A states they want to send their item to B
- RX: B states they want to receive the item from A
- ACK_RX: A acknowledges B
- ACK_TX: B acknowledges A
- BOOM! do transaction, with rollback opportunity anywhere before B acknowledges A.
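The ordered stages above can be replayed as a short Python trace. This only restates the happy path from the list; the labels "A" and "B", the trace format, and the helper names are assumptions for illustration, and the real guards live in the compiled rules.

```python
# The happy path from the list above as (player, new_state) steps, in order.
HAPPY_PATH = [("A", "TX"), ("B", "RX"), ("A", "ACK_RX"), ("B", "ACK_TX")]

def run_happy_path():
    """Replay the ordered stages and record the joint state after each one."""
    states = {"A": "IDLE", "B": "IDLE"}
    trace = [dict(states)]
    for player, new_state in HAPPY_PATH:
        states[player] = new_state
        trace.append(dict(states))
    return trace

def can_roll_back(states):
    """Rollback is possible anywhere before B acknowledges A (ACK_TX)."""
    return states["B"] != "ACK_TX"

trace = run_happy_path()
rollback_window = [can_roll_back(s) for s in trace]
```

In this trace the rollback window stays open for every joint state except the last one, where B has acknowledged A.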
We need a number of variables for each user:
- item: the item a player owns, or null if they have nothing
- state: the state the player is in (IDLE, RX, TX, ACK_TX, ACK_RX)
- tx_itm: the item being sent (we need to track this for rollbacks)
- rx_itm: the item being received
- tx_ptr: a pointer to the player that is being sent to
- rx_ptr: a pointer to the player that is being received from
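As a rough picture of the per-user data, the variables above can be written as a Python dataclass. The field names and state names come from the list; representing items and pointers as optional strings, and the class name `PlayerRecord`, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

STATES = ("IDLE", "RX", "TX", "ACK_TX", "ACK_RX")

@dataclass
class PlayerRecord:
    item: Optional[str] = None    # the item the player owns, or None
    state: str = "IDLE"           # one of STATES
    tx_itm: Optional[str] = None  # item being sent (kept for rollbacks)
    rx_itm: Optional[str] = None  # item being received
    tx_ptr: Optional[str] = None  # player being sent to
    rx_ptr: Optional[str] = None  # player being received from

    def __post_init__(self):
        # Reject records whose state is not part of the protocol.
        if self.state not in STATES:
            raise ValueError(f"unknown state: {self.state!r}")
```

A fresh player would then be `PlayerRecord(item="sword")`, which starts in the IDLE state with no transfer in progress.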
We then need to add Security and Firebase Rules to ensure a client can only alter those variables in a way that leads to our item transfer dynamics. To avoid deadlocks, a player should be able to push an authorised transfer through, even if the other party goes offline. The permissions have to be carefully set to avoid one player authorising the transfer on the other player’s behalf. Either player should be able to abort before confirmation.
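The permission and rollback requirements can be illustrated with a small Python model. It is a sketch under stated assumptions, not the actual Firesafe rules: it assumes that the sender parks the item in `tx_itm` while a transfer is pending, that only the owner of a record may change it, and that "confirmation" means the receiver has reached ACK_TX. The helper names are hypothetical.

```python
IDLE_FIELDS = {"state": "IDLE", "tx_itm": None, "rx_itm": None,
               "tx_ptr": None, "rx_ptr": None}

def is_confirmed(db, player):
    """Assumed meaning of confirmation: the receiving side has reached ACK_TX."""
    record = db[player]
    if record["state"] == "ACK_TX":
        return True
    peer = db.get(record["tx_ptr"])
    return (peer is not None
            and peer["state"] == "ACK_TX"
            and peer["rx_ptr"] == player)

def abort(db, caller, player):
    """Roll `player` back to IDLE. Returns True if the rollback was applied."""
    if caller != player:          # nobody may act on another player's behalf
        return False
    if is_confirmed(db, player):  # too late to back out
        return False
    record = db[player]
    # Restore the parked item if the player was the sender (assumption).
    restored = record["item"] if record["item"] is not None else record["tx_itm"]
    db[player] = {"item": restored, **IDLE_FIELDS}
    return True
```

Each player can abort their own side without the other being online, so neither is held in limbo, while the `caller != player` check blocks one player from changing the other's record.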
Summary
The whole protocol, including deadlock free rollbacks even if one player goes offline, is shown below:
The Firesafe source HSM representation is here, the compiled rules file is here.
Firesafe’s HSM representation has 50% fewer lines of code, and a large portion of that is variable and state declarations. The Firesafe language lets you concentrate on the guards and effects whilst alleviating the burden of enforcing state switching. There are features of the HSM language we have not mentioned in this post. HSM supports hierarchical state machines, a feature that reduces state space explosion through inheritance.
When I first considered developing sibling transactions on Firebase, I did not know whether deadlock-free, ACID-compliant transfers would be possible. Developing this protocol was my own personal test of Firebase’s mettle. With Firebase passing this test, I am fairly sure you can do anything on Firebase without involving third-party servers.
Getting data integrity right is tough. I was only sure I had it right after putting it through formal verification (which did indeed find lots of holes in my original logic). Applying formal methods to Firebase semantics is not straightforward yet. You need to ask yourself this question: if a malicious user exploits your database, can it financially ruin you or nuke your product? If the answer is yes, then test-driven development is not a robust enough methodology; you need formal verification.
If you are interested in having Tom verify your app’s security design, you can email him at tom.larkworthy@gmail.com to inquire about this kind of service. Stay up to date on his verification technology here.