Focus on the Business with a Serverless API Using AWS and GraphQL

In This Article You Will Learn

  1. A Bit About GraphQL
  2. A Bit About Serverless
  3. Why Serverless Goes Well With GraphQL?
  4. The Simple Architecture We Created
  5. The Entry Point GraphQL Lambda
  6. Available Queries File
  7. The Account Query File
  8. Querying With Result
  9. Executing the Lambda Using PostgreSQL on 6000+ Records
  10. What We Achieved

Introduction

Before I joined Infinite Lambda, I worked as a web application developer, and most of the applications I built were hosted on a VPS or a hosting server. When I moved to a data engineering position, I had to switch to AWS and its services. For the past ten months, I have been working with a great team on a product for one of our clients, built on a serverless architecture.

The main problem we had to solve was building an API that provides scalable access to part of the existing data and lets admin users operate the already existing Lambdas through a React user interface.

Initially, we were not sure how to structure the application because the product lacked a clear vision. We started by creating a few endpoints with API Gateway, each linked to its own Lambda. We were expected to need only a few endpoints, but the project grew quickly, and all of a sudden we had to accommodate a few dozen of them.
Given the large amount of data, the serverless setup, and the growing functionality, we had to answer a few critical questions:

  1. Can we handle all the endpoint requests without implementing a separate Lambda, and the accompanying CloudFormation infrastructure configuration, for each one of them?
  2. Can we build an endpoint that can easily filter, order, and paginate query results?
  3. Can we create all this functionality and pay only for execution time and not idle server time?

To satisfy all those needs, we decided to use GraphQL with a single Lambda endpoint. This let us streamline development, reduce the complexity of the infrastructure, and focus our efforts on the business logic instead of spending a lot of time on infrastructure.

A Bit About GraphQL

GraphQL is an open-source query language developed by Facebook so that frontend developers can ask the backend for exactly the data they need. It makes it easier to introduce changes without spending a lot of effort rewriting the endpoint's interface. In addition, a GraphQL API exposes only a single endpoint, so when it is deployed to AWS, routing happens inside the Lambda instead of in API Gateway.
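To make the single-endpoint idea concrete, here is a toy sketch in plain Python of one Lambda handler that serves every operation and dispatches to resolvers in code. It is deliberately not a GraphQL parser: the `field`/`args` body keys, the resolver names, and the Lambda proxy event shape (`event["body"]` as a JSON string) are assumptions for illustration. In the real project, Graphene parses the query string and picks the resolver.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical resolvers standing in for Graphene resolve_* methods
def resolve_account(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"id": args["id"], "name": "Example account"}


def resolve_accounts(args: Dict[str, Any]) -> list:
    return [{"id": i} for i in range(1, args.get("limit", 2) + 1)]


# One lookup table replaces one API Gateway route (and one Lambda) per endpoint
RESOLVERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "account": resolve_account,
    "accounts": resolve_accounts,
}


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    # Assumes a Lambda proxy integration, where the request body arrives as a JSON string
    body = json.loads(event["body"])
    field = body.get("field")
    resolver = RESOLVERS.get(field)
    if resolver is None:
        return {"statusCode": 400, "body": json.dumps({"errors": [f"Unknown field: {field}"]})}
    data = {field: resolver(body.get("args", {}))}
    return {"statusCode": 200, "body": json.dumps({"data": data})}
```

With this shape, adding an operation means adding a resolver rather than a new route, Lambda, and CloudFormation resource.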

In short, it helps reduce network costs and makes queries more efficient: the frontend tells the backend what the response should look like, and the UI can introspect the GraphQL schema to learn how the data is organized on the backend.
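The idea that the frontend decides the shape of the response can be sketched without any library: the backend resolves a full record, and only the fields the client selected are returned. The `select` helper and the nested-dict format for a selection are hypothetical simplifications of what a GraphQL executor does with a selection set.

```python
from typing import Any, Dict

# A selection maps field names to None (leaf field) or to a nested selection
Selection = Dict[str, Any]


def select(value: Any, selection: Selection) -> Any:
    """Return only the fields named in `selection`, recursing into nested objects and lists."""
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    result = {}
    for field, sub_selection in selection.items():
        field_value = value[field]
        if sub_selection is None or field_value is None:
            result[field] = field_value
        else:
            result[field] = select(field_value, sub_selection)
    return result


# Hypothetical full record as the backend would resolve it
account = {
    "id": 1337,
    "name": "Infinite Lambda",
    "username": "dev@infinitelambda.com",
    "group": {"id": 5, "name": "Infinite Lambda Corp.", "mainAccountId": 1337},
}

# The client asks only for the account name and its group's name
print(select(account, {"name": None, "group": {"name": None}}))
# {'name': 'Infinite Lambda', 'group': {'name': 'Infinite Lambda Corp.'}}
```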

Bear in mind that the technology should be taken with a grain of salt: it is great for running queries, but whenever you need to aggregate results or perform more complicated joins, it becomes slower, and researching and writing that code takes more time unless you already have experience with it.

There are numerous medium and large companies using GraphQL.

A Bit About Serverless

Serverless technology keeps growing in popularity because it gives developers both flexibility and scalability.

Serverless means that:

  1. Customers pay per execution
  2. There is no server management since all is outsourced to the service provider
  3. It auto-scales based on demand
  4. The focus is on the functionality, not the infrastructure
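Pay-per-execution billing can be estimated from the two quantities Lambda charges for: the number of requests and the compute time in GB-seconds (duration multiplied by allocated memory). This sketch leaves both prices as parameters because they vary by region and change over time; the figures in the usage line are placeholders, not actual AWS prices, and the free tier is ignored.

```python
def estimate_lambda_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Rough Lambda cost: request charge plus compute charge (ignores the free tier)."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed in GB-seconds: duration (s) x memory (GB) for every invocation
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


# Placeholder prices: 1M requests of 100 ms at 512 MB
print(round(estimate_lambda_cost(1_000_000, 100, 512, 0.2, 0.00001), 4))  # 0.7
```

With zero invocations the estimate is zero, which is the point of the model: idle time costs nothing.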

Why Serverless Goes Well With GraphQL?

As mentioned above, GraphQL needs only a single entry endpoint for HTTP clients to connect to, so that endpoint has to be both reliable and fast.
AWS Lambda is a good fit for this role, and putting API Gateway in front of it is a fast, clean way to get the API up and running in no time.

The Simple Architecture We Created

Looking at the graphic, we can see that the client needs only a few steps to reach the data. API Gateway provides a /graphql endpoint to which the client connects. That endpoint triggers the GraphQL Lambda, which uses Graphene, a Python GraphQL framework, and SQLAlchemy, a Python ORM, to obtain the information from PostgreSQL.

The Entry Point GraphQL Lambda

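import graphene
from app.core.repository.dbhandler import DBHandler
from app.messages.api_response import GraphQLQueryBadRequest, GraphQLQueryResponseSuccess
from app.graphql.mutation import Mutation
from app.graphql.query import Query

def graphql(event, context):
    # Initialize the schema from the aggregated Query and Mutation classes;
    # `types` lists the Graphene object types they expose
    schema = graphene.Schema(query=Query, mutation=Mutation, types=Query.types + Mutation.types)

    # Execute the GraphQL string with a database session that resolvers reach through the context.
    # Assumes the event carries the GraphQL string under "action"; with a Lambda proxy
    # integration, read it from json.loads(event['body'])['action'] instead.
    with DBHandler().session_scope() as session:
        try:
            result = schema.execute(event['action'], context={'session': session})
        except Exception as e:
            # `result` is never assigned if execute() itself raises
            return GraphQLQueryBadRequest({'errors': [str(e)]})
        if result.errors:
            # Graphene reports resolver errors on the result instead of raising them
            return GraphQLQueryBadRequest(result.to_dict())
        return GraphQLQueryResponseSuccess(result.to_dict())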

Initializing the schema tells Graphene which queries and mutations are available; executing it runs the one the request asks for.

Queries are classes, each in its own file, that fetch specific data from the database using SQLAlchemy models. Mutations, on the other hand, are actions that store or remove data, again through SQLAlchemy models.

When the incoming event's action, a plain GraphQL string, is passed to the schema's execute function together with a database session, Graphene searches the existing queries and mutations for the matching resolver function, which then runs and obtains the information.
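The article does not show `DBHandler().session_scope()`. A common way to write such a helper is the transactional-scope pattern described in the SQLAlchemy documentation: commit when the block succeeds, roll back on error, and always close. The sketch below applies the same pattern to the standard library's `sqlite3` so that it runs on its own; this `session_scope` function is a hypothetical stand-in, not the project's `DBHandler`.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def session_scope(path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Provide a transactional scope: commit on success, roll back on error, always close."""
    connection = sqlite3.connect(path)
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        connection.close()
```

Because the whole schema execution runs inside one such scope, every resolver in a request shares the same session, and a failing mutation does not leave half-written data behind.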

Available Queries File


import graphene

from app.graphql.account.queries.accounts_query import AccountQuery
from app.graphql.account.queries.account_properties_query import AccountPropertiesQuery

from app.graphql.account.schemas.account_properties_schema import AccountPropertiesSchema
from app.graphql.account.schemas.account_schema import AccountSchema

# Aggregates every query class into the single root Query type
class Query(
    AccountQuery,
    AccountPropertiesQuery,
    graphene.ObjectType
):
    # Object types the schema needs to know about
    types = [
        AccountSchema,
        AccountPropertiesSchema
    ]

The query file simply aggregates all of our project's queries, and the schema searches through them to find the requested one. The alternative would have been to write every query in a single file, but we decided to split them into separate files to keep the codebase cleaner.

The Graphene schema also needs the object types built on top of the database models, which the framework calls types. We added them to the Query file as well, so there is one clean list of all the queries and the types used to return information.
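The `Query` class relies on plain Python multiple inheritance: each query class contributes its `resolve_*` methods, and `Query` ends up with all of them. The sketch below shows only that mechanism with hypothetical, Graphene-free classes; as we understand it, Graphene additionally gathers the declared fields from every base class when `Query` subclasses `graphene.ObjectType`.

```python
# Hypothetical query mixins: each one owns a single root field's resolver
class AccountQuery:
    def resolve_account(self, info, **kwargs):
        return {"id": kwargs.get("id")}


class AccountPropertiesQuery:
    def resolve_account_properties(self, info, **kwargs):
        return [{"account_id": kwargs.get("account_id"), "key": "plan"}]


# Aggregating class: inherits every resolver and keeps one list of exposed types
class Query(AccountQuery, AccountPropertiesQuery):
    types = ["AccountSchema", "AccountPropertiesSchema"]  # placeholders for Graphene types


resolvers = sorted(name for name in dir(Query) if name.startswith("resolve_"))
print(resolvers)  # ['resolve_account', 'resolve_account_properties']
```

Adding a new query then means writing one more class in its own file and listing it as a base of `Query`.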

The Account Query File

import graphene
from app.graphql.account.schemas.account_schema import AccountSchema, ExtendedAccountSchema
from app.models import Account, Group
from app.core.domains.account import AccountDomain

class AccountQuery(object):
    node = graphene.relay.Node.Field()
    account = graphene.Field(lambda: ExtendedAccountSchema, id=graphene.Int())

    def resolve_account(self, info, **kwargs):
        id = kwargs.get('id')
        found_account = AccountSchema.get_query(info) \
            .add_entity(Group) \
            .outerjoin(
                Group,
                Group.id == Account.group_id
            ) \
            .filter(Account.id == id).first()

        account, group = found_account if found_account else (None, None)

        if not account:
            return None

        return ExtendedAccountSchema(
            id=account.id,
            name=account.name,
            domain=account.domain,
            status=account.status,
            legacy_id=account.legacy_id,
            group_id=account.group_id,
            group_main_account_id=group.main_account_id if group else None,
            group_main_account=group.main_account_id == account.id if group else None,
            group_name=group.name if group else None,
            username=account.username,
            created=account.created,
            updated=account.updated,
        )

In this file, we retrieve an account by its id and join the related Group entity. A query such as account(id: 5) calls the resolve_account method, which expects an id argument of type Int.

The account field specifies that Graphene returns an object of type ExtendedAccountSchema, which combines AccountSchema and GroupSchema so that the joined record can be returned as one result.

The id argument id=graphene.Int() instructs Graphene to expect an id as a parameter passed together with the account query.

We decided to join accounts and groups in one query because it makes the database access a bit faster, but nothing prevents you from having two separate queries: one that returns the account and one that returns the group.

Keep in mind that separate queries are what GraphQL encourages frontend developers to use, since each piece of information is then retrieved separately. However, if you need a list with combined results, our implementation is the way to go.
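The effect of the outer join, and why `resolve_account` falls back to `(None, None)` and checks `if group`, can be reproduced with the standard library's `sqlite3` in place of PostgreSQL and SQLAlchemy. The table and column names here are simplified assumptions based on the fields the resolver uses.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE account_groups (id INTEGER PRIMARY KEY, name TEXT, main_account_id INTEGER);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    INSERT INTO account_groups VALUES (5, 'Infinite Lambda Corp.', 1337);
    INSERT INTO accounts VALUES (1337, 'Infinite Lambda', 5);
    INSERT INTO accounts VALUES (42, 'No group account', NULL);
    """
)


def find_account_with_group(account_id):
    # LEFT OUTER JOIN keeps the account even when it has no group (group columns become NULL)
    return connection.execute(
        """
        SELECT a.id, a.name, g.name, g.main_account_id = a.id
        FROM accounts AS a
        LEFT OUTER JOIN account_groups AS g ON g.id = a.group_id
        WHERE a.id = ?
        """,
        (account_id,),
    ).fetchone()  # None when no account matches
```

An account with a group returns its group columns, an account without one returns `None` in those columns, and a missing account returns no row at all: the same three cases the resolver handles.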

Querying With Result

To make a request, we send a POST request to the https://example-endpoint.com/v1 endpoint with the following JSON body. It instructs the Graphene schema to call the resolve_account method and retrieve the account combined with its group information:

{
  "action": "{ account(id: 1337) { id, name, domain, legacyId, groupId, groupName, groupMainAccount, username, status, created, updated } }"
}
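For reference, a Python client could build the same request with the standard library's `urllib.request`. The sketch only constructs the request; sending it is left commented out because https://example-endpoint.com/v1 is a placeholder URL. Serializing the body with `json.dumps` takes care of escaping the multi-line GraphQL string.

```python
import json
import urllib.request

ENDPOINT = "https://example-endpoint.com/v1"  # placeholder URL from the article

query = """
{
  account(id: 1337) {
    id name domain legacyId groupId groupName groupMainAccount
    username status created updated
  }
}
"""


def build_graphql_request(graphql_query: str) -> urllib.request.Request:
    # Our API expects the GraphQL string under the "action" key
    payload = json.dumps({"action": graphql_query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(query)
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```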

The resulting JSON for the requested account is:

{
  "data": {
    "account": {
      "id": 1337,
      "name": "Infinite Lambda",
      "domain": "infinitelambda.com",
      "legacyId": 9293,
      "groupId": 5,
      "groupName": "Infinite Lambda Corp.",
      "groupMainAccount": 1337,
      "username": "dev@infinitelambda.com",
      "status": "COMPLETED",
      "created": "2019-12-23T20:13:36.792447",
      "updated": "2019-12-23T20:28:35.897277"
    }
  }
}
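The response uses camelCase keys such as `legacyId` and `groupName` even though the resolver passes `legacy_id` and `group_name`, because Graphene converts field names to camelCase by default (the `auto_camelcase` option of `graphene.Schema`). The function below is a simplified, hypothetical re-implementation of that conversion, handy for predicting the names the frontend receives; it is not Graphene's own code and ignores edge cases such as leading underscores.

```python
def to_camel_case(snake_name: str) -> str:
    """Convert a snake_case field name to camelCase, e.g. legacy_id -> legacyId."""
    first, *rest = snake_name.split("_")
    return first + "".join(part.capitalize() for part in rest)


fields = ["id", "legacy_id", "group_id", "group_name", "group_main_account", "created"]
print([to_camel_case(f) for f in fields])
# ['id', 'legacyId', 'groupId', 'groupName', 'groupMainAccount', 'created']
```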

Executing the Lambda Using PostgreSQL on 6000+ Records

Running the Lambda against our current production database gives response times similar to running it against the dev database, which holds fewer records. Even though the query includes a join, it takes about a third of the 500 ms we consider acceptable for an AJAX request.

Note that the columns used for filtering should be indexed for the best performance. In our case, the ID is a primary key, so it is indexed by default. Some of our filter functions perform text-based searches; for those, we either add a PostgreSQL-specific tsvector column or an index on the searched column.
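The effect of indexing a filtered column is easy to see in a query plan. This sketch uses SQLite through the standard library as a stand-in for PostgreSQL, where `EXPLAIN` plays the same role; the table, column, and index names are hypothetical, and a tsvector column is not reproduced here.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, domain TEXT)")
connection.executemany(
    "INSERT INTO accounts (name, domain) VALUES (?, ?)",
    [(f"account-{i}", f"example-{i}.com") for i in range(6000)],
)


def plan(sql: str, params: tuple = ()) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


by_id = plan("SELECT * FROM accounts WHERE id = ?", (1,))  # primary key: already indexed
query = "SELECT * FROM accounts WHERE name = ?"
before = plan(query, ("account-42",))  # no index on name yet: full table scan
connection.execute("CREATE INDEX idx_accounts_name ON accounts (name)")
after = plan(query, ("account-42",))  # now a search through idx_accounts_name
print(by_id, before, after, sep="\n")
```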

What We Achieved

We found that this architecture suits our use case best. Our client's application was growing rapidly, so we needed a way to quickly build a robust system in which we spend more time on application logic than on infrastructure concerns such as creating a separate API Gateway endpoint for each Lambda.

If you have any further questions about the architecture and the way we built the product, or you are looking for a way to quickly develop your own scalable big data app with a quickly accessible API, feel free to contact us.
