Development Guides

Large Data API Pulls – Best Practices

Purpose: While the API is most commonly used to fetch simple lists of data or the full details of an individual record, there are use cases for extracting large amounts of data across all records for a given object. A bare call like GET /portfolio/assets could return gigabytes of data containing millions of assets and take several minutes to complete, assuming it doesn't time out! That's a tough ask for any REST API. Luckily, all of our core endpoints offer several options for filtering the data.

Simple Use Case: A Customer wants to call the API daily for all assets in their entire database so that they can create a report within their own business intelligence platform of all positions held over time. They do not need all data returned from the specific endpoints and will filter down to the records and fields required.

Scope and Outputs: This article will discuss extracting assets, but the concepts can be applied to several other key data points as well.  One thing to point out, though, is that these methods are focused on pulling stored data – not data that is calculated on the fly (such as cost basis, performance, asset allocation groupings, etc.).

Process Overview:

  1. Filter down to only the records you need.
  2. Request only the fields you need.
  3. Split the data into chunks.
  4. Optionally, make the chunk requests in parallel.

Process Steps:

  1. With assets, the minimum filter we require is ?hasValue=true.  This narrows the results to only those assets with value.  You may also include filters like asOfDate, editedStartDate, and/or editedEndDate.  You could also pass in a clientId, registrationId, or accountId if you only need the assets for a handful or fewer of these entities.  Many of these endpoints also support OData queries via the $filter parameter, which you can read about here.  So, at a minimum, after this step, your call should look like this:

GET /portfolio/assets?hasValue=true
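To make this concrete, here is a minimal sketch of building the filtered request URL in Python. The base URL is a hypothetical placeholder (substitute your actual API host), and the extra_filters values shown are illustrative:

```python
from urllib.parse import urlencode

# Hypothetical base URL -- substitute your actual API host and path.
BASE_URL = "https://api.example.com/portfolio/assets"

def build_asset_query(has_value=True, extra_filters=None):
    """Assemble the query string for a filtered asset pull."""
    params = {"hasValue": str(has_value).lower()}
    if extra_filters:
        # e.g. {"asOfDate": "2024-01-31"} or {"accountId": "12345"}
        params.update(extra_filters)
    return f"{BASE_URL}?{urlencode(params)}"

print(build_asset_query())
```

The same pattern extends to any of the filters mentioned above; just add them to the extra_filters dictionary.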

  2. Now that we’ve filtered down the length of the data, we’ll focus on narrowing its width.  An asset object, returned from the API in its default form, contains over 50 fields!  It’s unlikely that you’ll need all of these fields to support the functionality you’re implementing, so we recommend exploring our documentation or making some sample API calls to see what is currently returned, then creating a list of the fields you do need.  Once you have this list, it’s as simple as passing them in a comma-delimited string to the $select query string parameter, like so:

GET /portfolio/assets?hasValue=true&$select=Id,AccountId,Ticker,CurrentValue

The $select parameter also comes from the OData spec, which you can read about here.  One important thing to note is that when using this feature, the field names are case-sensitive and start with an uppercase letter, unlike the camelCase names you see in the response objects when making the call without the $select parameter (ex. accountId becomes AccountId, currentValue becomes CurrentValue, etc.).
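Since the casing difference is an easy thing to trip over, a small helper can convert response-style camelCase names into the PascalCase form $select expects. This is a sketch of the casing rule as described above, not an official client utility:

```python
def to_select_name(field):
    """Uppercase the first letter: response-style camelCase -> $select PascalCase."""
    return field[:1].upper() + field[1:]

def build_select(fields):
    """Join response field names into a case-corrected $select value."""
    return ",".join(to_select_name(f) for f in fields)

print(build_select(["id", "accountId", "ticker", "currentValue"]))
# Id,AccountId,Ticker,CurrentValue
```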

When a data output has a related dataset, you can further specify attributes within that dataset by expanding the results and modifying your select parameter as such:

GET /Trading/ModelAggs?$expand=Details&$select=Id,Details/ModelId,Details/ModelName

  3. If you’re regularly pulling down over 500k records, it’s time to start chunking, or paging, through the data. We do not recommend using the OData $skip and $top parameters for this. The most performant approach we’ve found is to start with a request for just the asset IDs, using the $select option discussed in Step 2, like so:

GET /portfolio/assets?hasValue=true&$select=Id

Now that you have a full list of all the asset IDs, you can split it into chunks and make requests for the full asset objects in each chunk. You’ll need to experiment with chunk sizes to find what performs best: if you’re requesting a lot of fields, use smaller chunks; if only a few fields, larger chunks work well. For a handful of fields, a good baseline is chunks of 50,000. The request for each chunk will look something like this:

POST /portfolio/assets/list/id?$select=Id,AccountId,Ticker,CurrentShares,CurrentValue

BODY [1, 2, 3, etc]  <- array containing a chunk of asset IDs
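The chunking itself is straightforward list slicing. Here is a minimal sketch, using the 50,000 baseline from above as the default size:

```python
def chunk_ids(ids, size=50_000):
    """Split a list of asset IDs into fixed-size chunks.

    Each chunk becomes the JSON body of one POST /portfolio/assets/list/id call.
    """
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# Small demo with a size of 2 so the result is easy to see.
print(chunk_ids([1, 2, 3, 4, 5], size=2))
# [[1, 2], [3, 4], [5]]
```

Tune the size parameter based on how many fields your $select requests, as described above.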

  4. You may optionally make the requests for each chunk in parallel, but again it’s worth experimenting to find the best-performing number of parallel requests. It will likely vary depending on how many fields you’re requesting and how many total assets will be retrieved. A good baseline to start with is 4 parallel requests.
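A bounded thread pool is one simple way to cap the number of parallel requests. In this sketch, fetch_chunk is a placeholder for your actual POST to /portfolio/assets/list/id; here it just fabricates results so the structure is runnable:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(id_chunk):
    """Placeholder: in practice, POST the chunk to /portfolio/assets/list/id
    with your $select fields and return the parsed JSON list of assets."""
    return [{"Id": i} for i in id_chunk]

def fetch_all(chunks, max_workers=4):
    """Fetch every chunk with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_chunk, chunks)  # preserves chunk order
    return [asset for batch in results for asset in batch]

print(len(fetch_all([[1, 2], [3, 4, 5]])))
```

The max_workers=4 default reflects the baseline suggested above; raise or lower it based on your own timing experiments.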

Process Tips or Controls:

  1. While Orion currently supports $top and $skip functionality, it is formally not Orion’s best practice. This is native functionality of the REST API structure, but because it is not formally implemented in the endpoints themselves, not every endpoint will support it. As such, it is not a reliable method for filtering and should be avoided when possible. Orion may need to retire or suppress this functionality at some point for the health of the API and database performance.
  2. On some endpoints, Orion enforces a limit and the endpoint will respond with an error when it is exceeded. For example, if the limit is 100,000 and you use $top=150000, the response will be “The query specified in the URI is not valid. The limit of ‘100000’ for Top query has been exceeded.”