
The Fast Approach to Security: Securing User Data over gRPC/Protobuf

By Tyler Julian, Senior Staff Software Engineer, Infrastructure

Introduction

A common class of security bugs, called Insecure Direct Object References (IDOR), allows improper access to sensitive data. IDOR is so widespread that it often earns bug bounty payouts of thousands of dollars.

The simplest example involves an API endpoint such as:

https://example.com/orders/7387172

IDOR occurs when the server fails to verify that order #7387172 actually belongs to the currently authenticated user, allowing any user to view or modify the order.

This is a form of broken access control, the number one security risk for web applications in the 2021 OWASP Top 10. Broken access control can lead to mass data disclosure by even unsophisticated attackers who simply “change numbers in the URL.”

A seemingly simple solution

To address this, services might start to develop their own “checks” that verify ownership of an order before handling a request. Here’s an example:

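authedUserID := auth.GetCurrentUser(ctx)

order, err := db.FetchOrder(ctx, req.GetOrderId())
if err != nil {
    // Treat lookup failures (including missing orders) as denials.
    return nil, status.Error(codes.PermissionDenied, "denied")
}

// Only the order's buyer may access it.
if order.BuyerUserID != authedUserID {
    return nil, status.Error(codes.PermissionDenied, "denied")
}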

But there are a few problems with this approach:

  1. Authorization checks tend to get more complicated over time, and these checks become intertwined with business logic, which makes those checks more difficult to test and debug.
  2. Making systemic changes to authorization checks is difficult because those checks are scattered across many files and components.
  3. Because authorization checks are decentralized into various applications, it becomes difficult to answer the question, “Who has access to this data?”, which matters greatly for privacy and compliance.
  4. New developers on engineering teams may lack context about the checks, may not know how to use them correctly, or may forget to include them altogether.

The Fast approach: scaling up with an in-house framework

Our engineers on the identity and access management team developed ACE, our Access Control Ecosystem, as a central framework that lets service owners control access to their data through configuration, without the time, cost, and frustration of rewriting commonplace authorization checks.

There are a lot of access control solutions out there: Keto (based on Google’s Zanzibar), Oso, Open Policy Agent, and Casbin, to name a few. But we decided to build a Fast framework in-house to solve our issue.

These factors influenced our decision to build in-house:

  1. Integration with our datastores. DynamoDB stores our online production data, which we need in order to evaluate policies (for example, to check ownership of data, such as the buyer of an order). An off-the-shelf solution would require non-trivial “glue” to pull our data into authorization decisions, adding labor and complexity.
  2. Integration with our infrastructure. On the backend, our data flows as protobuf messages over gRPC between Go services. By evaluating policy within application middleware itself, service owners are better equipped to debug issues by looking at distributed traces and custom logging.
  3. Integration with our developers. Policy languages are powerful, often complex, and come with a learning curve. They are also (intentionally) constrained in how they can evaluate policy. By first writing policy as Go functions, we empower developers to write well-tested authorization logic in a language that is familiar. It also lets us work out potential use cases in a more general-purpose language before we move to a formal policy language.
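To make point 3 concrete, here is a minimal sketch of what “policy as a Go function” can look like. The names (`Caller`, `Order`, `Decision`, `OrderBuyerPolicy`) and the denial reason string are hypothetical illustrations, not ACE’s actual API; the point is that such a policy is ordinary Go that can be unit-tested like any other function.

```go
package main

import "fmt"

// Caller is a hypothetical stand-in for the authenticated principal.
type Caller struct {
	UserID string
}

// Order is a hypothetical stand-in for an order record loaded from storage.
type Order struct {
	ID          string
	BuyerUserID string
}

// Decision records whether access is allowed and why.
type Decision struct {
	Allow  bool
	Reason string
}

// OrderBuyerPolicy allows access only when the caller is the order's buyer.
// It is a pure function of its inputs, so tests need no mocks or servers.
func OrderBuyerPolicy(caller Caller, order Order) Decision {
	// Guard against two empty IDs "matching" each other.
	if caller.UserID != "" && caller.UserID == order.BuyerUserID {
		return Decision{Allow: true, Reason: "caller_is_requesting_their_own_order"}
	}
	// Hypothetical denial reason, for illustration only.
	return Decision{Allow: false, Reason: "caller_is_not_order_buyer"}
}

func main() {
	order := Order{ID: "o-1", BuyerUserID: "u-1"}
	fmt.Println(OrderBuyerPolicy(Caller{UserID: "u-1"}, order))
	fmt.Println(OrderBuyerPolicy(Caller{UserID: "u-2"}, order))
}
```

Because the policy has no dependencies beyond its arguments, a table-driven test can cover each case (buyer, non-buyer, unauthenticated) in a few lines.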

Building begins

Since we use gRPC for all of our APIs, together with Envoy’s gRPC-JSON transcoder, we decided to implement this framework using Protobuf custom options.

For example:

rpc GetOrder(GetOrderRequest) returns (GetOrderResponse) {
    option (google.api.http) = {
        get: "/orders/{order_id}",
    };

    option (co.fast.ace.method_option) = {
        authorizer: AUTHORIZER_ORDER
    };
};

This configuration tells ACE: “This API accesses an order, so verify that the currently authenticated user actually owns this order.”

And, since there aren’t many examples of reading custom protobuf options from Go, here’s a bit more about how we use them at Fast:

// Convert the slash-delimited full method name into a dot-delimited one.
//
// Inside a gRPC interceptor, we receive a slash-delimited fullMethod, but the
// proto registry lookup needs a dot-delimited name.
// Example: "/co.fast.api.v1.OrderService/GetOrder"
//          -> "co.fast.api.v1.OrderService.GetOrder".

fullName, err := fullMethodToFullName(fullMethod)
if err != nil {
    return nil, status.Error(codes.Internal, err.Error())
}

// Look up the method in the proto registry.
//
// Subtly, this requires that your service imports the Go package generated from
// the service's proto files; otherwise the descriptor is never registered.

desc, err := protoregistry.GlobalFiles.FindDescriptorByName(protoreflect.FullName(fullName))
if err != nil {
    return nil, status.Error(codes.Internal, err.Error())
}
method, ok := desc.(protoreflect.MethodDescriptor)
if !ok {
    return nil, status.Errorf(codes.Internal, "%s is not an rpc method", fullName)
}

// Extract the custom ACE options on the rpc method. Here we deny a method
// that has none, in keeping with deny by default.

opts := method.Options()
if !proto.HasExtension(opts, acepb.E_MethodOption) {
    return nil, status.Error(codes.PermissionDenied, "no ACE configuration for method")
}
methodAce, _ := proto.GetExtension(opts, acepb.E_MethodOption).(*acepb.MethodOption)

// ...
// Use methodAce.GetAuthorizer() to route to the correct authorization checks,
// such as acepb.Authorizer_AUTHORIZER_ORDER in the case of order authorization.
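The snippet above calls `fullMethodToFullName` without showing it. One minimal way to write it, using only the standard library and assuming gRPC’s usual `/package.Service/Method` format (this is our sketch, not necessarily Fast’s implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// fullMethodToFullName converts a gRPC full method name such as
// "/co.fast.api.v1.OrderService/GetOrder" into the protobuf full name
// "co.fast.api.v1.OrderService.GetOrder" used by the proto registry.
func fullMethodToFullName(fullMethod string) (string, error) {
	// Expect exactly "/<service>/<method>" with non-empty parts.
	if !strings.HasPrefix(fullMethod, "/") {
		return "", fmt.Errorf("invalid full method %q: missing leading slash", fullMethod)
	}
	service, method, found := strings.Cut(fullMethod[1:], "/")
	if !found || service == "" || method == "" || strings.Contains(method, "/") {
		return "", fmt.Errorf("invalid full method %q", fullMethod)
	}
	return service + "." + method, nil
}

func main() {
	name, err := fullMethodToFullName("/co.fast.api.v1.OrderService/GetOrder")
	fmt.Println(name, err) // co.fast.api.v1.OrderService.GetOrder <nil>
}
```

Rejecting malformed names instead of guessing keeps the lookup that follows from silently matching the wrong descriptor.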

Making the “who gets access” decision

From here, we have everything we need to implement our order-checking logic from earlier as a gRPC UnaryServerInterceptor. This involves:

  1. Inspecting the request to find an order_id. Since we use protobufs for all of our RPCs, the message structures are standardized and easy to inspect from middleware.
  2. Fetching the order from storage. This can be durable storage or a volatile cache, since order ownership isn’t mutable.
  3. Verifying that the order’s buyer ID matches the ID of the currently authenticated user.

Postel’s Law (sometimes called the Robustness Principle) tells us to “be liberal in what you accept.” Security practice often asks the opposite: that we be discerning. For access control, we side with security and take a deny-by-default approach.

But we don’t simply deny without context. We consider it important to give our internal users and developers clear feedback about why their access is denied – and how to fix it.
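A minimal sketch of deny-by-default evaluation that still explains itself: access is allowed only when some rule explicitly allows it, and a denial carries the reason from every rule that was checked. `Decision`, `Rule`, `isOwner`, `allowListed`, and all reason strings other than `caller_is_requesting_their_own_order` (taken from the logs in this post) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision is a hypothetical access decision with a machine-readable reason.
type Decision struct {
	Allow  bool
	Reason string
}

// Rule evaluates one reason a caller might be allowed to access an object.
type Rule func(callerID, ownerID string) Decision

// isOwner allows callers who own the object.
func isOwner(callerID, ownerID string) Decision {
	if callerID != "" && callerID == ownerID {
		return Decision{Allow: true, Reason: "caller_is_requesting_their_own_order"}
	}
	return Decision{Reason: "caller_is_not_order_buyer"}
}

// allowListed allows a fixed set of callers (a hypothetical example rule).
func allowListed(ids ...string) Rule {
	set := make(map[string]bool, len(ids))
	for _, id := range ids {
		set[id] = true
	}
	return func(callerID, _ string) Decision {
		if set[callerID] {
			return Decision{Allow: true, Reason: "caller_is_allowlisted"}
		}
		return Decision{Reason: "caller_not_allowlisted"}
	}
}

// evaluate denies by default: access is allowed only when some rule
// explicitly allows it. A denial lists every rule's reason so the caller
// can see what was checked and what they are missing.
func evaluate(callerID, ownerID string, rules ...Rule) Decision {
	if len(rules) == 0 {
		return Decision{Reason: "no_rules_configured"}
	}
	reasons := make([]string, 0, len(rules))
	for _, rule := range rules {
		d := rule(callerID, ownerID)
		if d.Allow {
			return d
		}
		reasons = append(reasons, d.Reason)
	}
	return Decision{Reason: strings.Join(reasons, ",")}
}

func main() {
	fmt.Println(evaluate("u-2", "u-1", isOwner, allowListed("admin-1")))
}
```

Note the empty-rules case: a method with no configured policy is denied with its own reason, rather than allowed because nothing objected.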

Observability

Early on, we knew our Fast developers wanted insight into access decisions and their results, so we made sure to provide that in our structured logs:

ace {
    allow:          true,
    caller_user_id: "faa7584b-86f5-48f0-8647-ec465cacf87f",
    rpc_method:     "/co.fast.api.v1.OrderService/GetOrder",
    order_id:       "a8cdac5d-da3f-45c7-bc2b-265e188eab6a",
}

This worked well at first, but over time we saw that logging a bare true/false wasn’t detailed enough. We added a “result” field that records why access was allowed or denied, which greatly helped our developers debug their own access control issues:

ace {
    allow:          true,
    caller_user_id: "faa7584b-86f5-48f0-8647-ec465cacf87f",
    result:         "caller_is_requesting_their_own_order",
    rpc_method:     "/co.fast.api.order.v1.OrderService/GetOrder",
    order_id:       "a8cdac5d-da3f-45c7-bc2b-265e188eab6a",
}

Over time, we turned these ephemeral application monitoring logs into structured database entries that are stored for longer periods of time, enabling audit functionality in the case of an investigation of unauthorized data access.

Ace in the Hole: Role-Based Access Control (RBAC)

A major benefit of creating this framework is that our entire access control stack is now centrally managed, and we can make improvements over time without requiring time-consuming changes by developers.

For example, when the business needed role-based access control (allowing store employees and Fast customer support representatives, at various levels of privilege, to access specific API types such as payments and users), we implemented those access levels once, centrally in ACE, rather than in each service.

Another benefit: we reduced support calls from Fast sellers, since they could now make changes themselves through our APIs. The positive domino effect continued: Fast Support contacted our engineering department less often, because Support also gained access to more APIs for making common changes.

We accomplished this by first implementing a role system, in which roles associate users with objects like stores, users, or orders. Next, in our middleware, we added a check that verifies whether the current user has a role on the object being accessed. If not, access is denied.
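A minimal sketch of that role check, assuming roles are stored per (user, object) pair and that each API type accepts one of a set of roles. `Role`, `ObjectRef`, `RoleStore`, and the role names are hypothetical; the post doesn’t describe ACE’s actual role model or storage.

```go
package main

import "fmt"

// Role is a hypothetical role name such as "store_manager" or "support_agent".
type Role string

// ObjectRef identifies the object being accessed, e.g. {"store", "s-1"}.
type ObjectRef struct {
	Type string
	ID   string
}

type grantKey struct {
	UserID string
	Object ObjectRef
}

// RoleStore holds role grants; an in-memory map stands in for real storage.
type RoleStore struct {
	grants map[grantKey][]Role
}

func NewRoleStore() *RoleStore {
	return &RoleStore{grants: make(map[grantKey][]Role)}
}

// Grant gives userID the role on obj.
func (s *RoleStore) Grant(userID string, obj ObjectRef, role Role) {
	k := grantKey{UserID: userID, Object: obj}
	s.grants[k] = append(s.grants[k], role)
}

// HasAnyRole reports whether userID holds one of the allowed roles on obj.
// No matching grant means no access (deny by default).
func (s *RoleStore) HasAnyRole(userID string, obj ObjectRef, allowed ...Role) bool {
	for _, have := range s.grants[grantKey{UserID: userID, Object: obj}] {
		for _, want := range allowed {
			if have == want {
				return true
			}
		}
	}
	return false
}

func main() {
	roles := NewRoleStore()
	store := ObjectRef{Type: "store", ID: "s-1"}
	roles.Grant("u-1", store, "store_manager")

	// A payments API might, hypothetically, require a manager role on the store.
	fmt.Println(roles.HasAnyRole("u-1", store, "store_manager")) // true
	fmt.Println(roles.HasAnyRole("u-2", store, "store_manager")) // false
}
```

Scoping grants to a specific object, rather than granting a role globally, keeps a manager of one store from reaching another store’s data.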

Results so far

Earlier, we mentioned how IDOR vulnerabilities turn into large payouts on bug bounty programs. Before adopting ACE, we received at least four IDOR reports through our HackerOne bug bounty program over a three-month period. After a full year of using ACE, we have received only two, and in those cases:

  1. The first was due to a service that had misconfigured its ACE integration. Once the misconfiguration was patched, we added safeguards to prevent that class of misconfiguration from recurring.
  2. The second was for an API that had complex authorization requirements, so it had to bypass ACE completely and implement authentication and authorization itself. These kinds of APIs are rare, but do exist, so we’ve done our best to limit and monitor those APIs more closely.

Future plans

This post is merely an introduction to how we approach access control at Fast. Here are some related topics we plan to explore in future posts:

  • Performance requirements and improvements
  • Service-to-service authorization
  • Policy extensibility (dashboards and query language)
  • Real-time changes to policies
  • Annotating data for more fine-grained access
  • Envoy/Istio integration

In the meantime, if you’re interested in building infrastructure software, we’re hiring! Check out fast.co/careers and apply!


Tyler Julian is a software engineer on Fast’s infrastructure team, focusing on reliability and security. Prior to joining Fast, Tyler spent time at Uber building security infrastructure. When he’s not writing software, Tyler spends his time gaming on Battle.net, writing music, and plunging into cold water.
