Schema Design

The subgraph schema is defined using the GraphQL language, a query language for APIs and a runtime that fulfils queries with data.

The entities are defined in a .graphql file (default: schema.graphql).

A simple entity definition will look something like this:

type Foo @entity {
  id: ID!
  bar: String!
  baz: BigInt
  ...
}

By default, every entity is mutable, which means it can be loaded and modified in the handlers after it has been created. This is fine if the entity needs to be updated over time, but it may also impact query time. Entities that are known never to be updated should be marked as immutable:

type Foo @entity(immutable: true) {
  id: ID!
  bar: String!
  baz: BigInt
  ...
}

Immutable entities are much faster to write and to query, and should therefore be used whenever possible.

Entity fields can be defined as required or optional. Required fields are indicated by adding ! after the field type in the schema declaration, as with the bar field in the previous example.

If a required field is not set in the mapping, you will receive an error when trying to access it.

Some important things to note:

Every entity is required to have an id field.
id fields can be of type ID!, which is a synonym of String!, or of type Bytes!.
Generally, it is recommended to use Bytes!, unless the id needs to contain human-readable text. Entities with Bytes! ids are faster to read and write.
id fields serve as primary keys and need to be unique among entities of the same type.

The schema supports the following built-in types:

ID - translates to String
String
Bytes
Boolean
Int - everything below uint32
BigInt - uint32 and above
BigDecimal
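
As an illustration, a minimal sketch of an entity that uses each of these types might look like the following. The entity and field names are hypothetical and chosen only for this example.

```graphql
# Hypothetical entity, for illustration only: one field per built-in type.
type Example @entity {
  id: ID!               # ID translates to String
  label: String!
  owner: Bytes!         # e.g. an address or a hash
  active: Boolean!
  decimals: Int!        # a value below uint32
  totalSupply: BigInt!  # a value of uint32 and above
  price: BigDecimal     # optional field: no trailing !
}
```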

You can also create enums:

enum MyEnum {
	Foo
	Bar
	Baz
}

You can use the string representation to assign the enum variant in the mapping code:

type Entity @entity {
	id: Bytes!
	enumField: MyEnum
}

...
entity.enumField = "Foo";
...

The schema supports several types of entity relationships:

One-To-One
One-To-Many
Reverse Lookups with @derivedFrom fields
Many-To-Many
Using Mapping Tables

A detailed explanation of each relationship type can be found in The Graph documentation.
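
The sketch below illustrates how these relationships are typically expressed in the schema. All entity and field names (Transaction, TransactionReceipt, Token, TokenBalance, User, Organization, UserOrganization) are assumptions made for illustration and do not belong to any particular subgraph.

```graphql
# One-to-one: each side holds an optional reference to the other.
type Transaction @entity {
  id: Bytes!
  receipt: TransactionReceipt
}

type TransactionReceipt @entity {
  id: Bytes!
  transaction: Transaction
}

# One-to-many: the "many" side (TokenBalance) stores the reference.
type Token @entity {
  id: Bytes!
  symbol: String!
  # Reverse lookup: a virtual field derived from TokenBalance.token.
  balances: [TokenBalance!]! @derivedFrom(field: "token")
}

type TokenBalance @entity {
  id: Bytes!
  amount: BigInt!
  token: Token!
}

# Many-to-many through a mapping table: each UserOrganization links one
# User to one Organization, and both sides derive their lists from it.
type User @entity {
  id: Bytes!
  name: String!
  organizations: [UserOrganization!]! @derivedFrom(field: "user")
}

type Organization @entity {
  id: Bytes!
  name: String!
  members: [UserOrganization!]! @derivedFrom(field: "organization")
}

type UserOrganization @entity {
  id: Bytes!
  user: User!
  organization: Organization!
}
```

In this sketch only the entities on the "many" side and the mapping table store references. The derived fields on Token, User and Organization are not stored on those entities.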

Note: The Graph team recently added the ability to load derived entities in the mappings. This was not possible before, because derived fields are virtual fields that were accessible only at query time. These entities can now be loaded in the event handlers using the loadRelated method.

Best Practices

Carefully plan your entities, and avoid mapping your events or function calls to entities 1:1.
Mark your entities as immutable whenever possible.
Use Bytes for the entity id field.
Carefully choose your entity ids. They should be unique among entities of the same type and easy to construct or obtain from the available data when loading the entity in the handlers or querying the subgraph.
Avoid keeping large arrays of data as entity fields. Large arrays will greatly reduce your subgraph performance, because every time the entity is updated, the array has to be read from the database and written back to it.
Use reverse lookups (@derivedFrom) whenever possible, instead of direct relationships between entities. This reduces the amount of data fetched from the database when loading entities in the handlers. In addition, the new loadRelated method allows you to fetch derived entities on demand, only when needed.
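
To make the last two points concrete, here is a sketch of replacing a stored array with a reverse lookup. The Account and Transfer entities and their fields are hypothetical.

```graphql
# Discouraged (shown commented out): the array is stored on Account, so it
# is read and written back on every update and keeps growing.
#
# type Account @entity {
#   id: Bytes!
#   transfers: [Transfer!]!
# }

# Preferred: Transfer stores the reference, and Account exposes a derived
# field that is not stored on the Account itself.
type Account @entity {
  id: Bytes!
  transfers: [Transfer!]! @derivedFrom(field: "from")
}

# A transfer never changes after it is created, so it is marked immutable.
type Transfer @entity(immutable: true) {
  id: Bytes!
  from: Account!
  amount: BigInt!
}
```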

Some great materials about improving your schema performance:

Best Practices in Subgraph Development: Avoiding Large Arrays

Two Simple Subgraph Performance Improvements
