Working with data feeds
A data feed is the transfer of data, via one or more files, from one system to another. The data is not real-time but is sent on a regular schedule, such as every weekday morning. In this way a data feed differs from an API, which supplies specific pieces of data between systems ‘on the fly’, on request.
This post outlines the basics of integrating a third-party data feed into an application, for business analysts and project managers.
There are five basic components to taking in a data feed: commercials, connectivity, data mapping and integration, testing, and go-live with support and monitoring.
The people involved will be:
- Project sponsor and contract signer
- Network or connectivity specialist to set up the connection for data to be received
- The data vendor
- Data developer – sometimes with assistance from a technical business analyst
- Product manager/application technical lead
- QA tester
- User Acceptance testers
- Data support team
- Application support team
- Marketing and client-facing teams, to get the good news out to your customers when the data is live in your application
Some things to note about integrating data feeds:
Data feed integration is specialised work that requires a data developer. It should also include oversight from a technical specialist who understands your application’s data and processes very well, if you want to avoid integrating a feed that the application cannot make full use of.
Quite likely, your application will need a new release to support the data integration. That could be just a ‘server-side’ release (database and other background stuff) or a server-side and ‘client-side’ release (the application that your users use).
Testing needs to be done multiple times, in multiple environments – including in production.
So, let’s get started!
1: Commercials
In other words, securing agreement with the data feed vendor. The agreement needs to cover support levels, including what happens if there is a problem with the feed and remediation for late or missing data.
Tip: particularly if your organisation is new to integrating and managing third party data: choose standard feeds and agreements where possible, and avoid bespoke data and agreements unless you really, really need them.
2: Connectivity
The two sides – data vendor and your organisation – agree on a method of connectivity, such as SFTP, then set it up and test it. This needs to be done initially between test environments, and then again any time you implement the feed in another environment – staging or pre-production if you have it, and again for production (go-live). The environments used will depend on what your organisation and your vendor have available, your test requirements, and time and cost trade-offs.
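If your side is pulling files, the collection step is often just a small scheduled script. Below is a minimal Python sketch using the paramiko library; the host, credentials, and directory names are all placeholders, and your vendor’s actual setup (push vs pull, encryption, file naming) will dictate the real details.

```python
import paramiko

# Hypothetical connection details -- replace with values agreed with your vendor.
HOST = "sftp.example-vendor.com"
USERNAME = "acme_feeds"
KEY_PATH = "/etc/feeds/vendor_ed25519"   # key-based auth is typical for feeds
REMOTE_DIR = "/outgoing/daily"
LOCAL_DIR = "/data/landing/vendor"

def fetch_daily_files() -> list[str]:
    """Download today's feed files from the vendor's SFTP server."""
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.set_missing_host_key_policy(paramiko.RejectPolicy())  # pin host keys in production
    ssh.connect(HOST, username=USERNAME, key_filename=KEY_PATH)
    downloaded = []
    try:
        sftp = ssh.open_sftp()
        for name in sftp.listdir(REMOTE_DIR):
            local_path = f"{LOCAL_DIR}/{name}"
            sftp.get(f"{REMOTE_DIR}/{name}", local_path)
            downloaded.append(local_path)
    finally:
        ssh.close()
    return downloaded
```

In practice the same script would also handle archiving, duplicate detection, and alerting, but the core of the connectivity step is this simple.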
3: Data mapping and integration
By far the biggest part of the work, and the most time-consuming.
If your application already has a mature data structure then the work is to map the data feed into that structure. If your organisation is building or extending an application, then the application’s data structure needs to be built or extended as well.
Each data feed needs to be taken in and stored, essentially in a database that the data development team sets up.
This means there will be at least two data stores involved: one for the raw data coming in from the feed, and one that sits behind your organisation’s application and serves data to the application’s interface (screens).
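As an illustration of the two-store idea, here is a minimal Python sketch using sqlite3 as a stand-in for a real landing area and application database. The table, file, and column names are invented for the example.

```python
import csv
import sqlite3

conn = sqlite3.connect("feeds.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS raw_transactions (   -- landing store: feed as received
        feed_date TEXT, account_ref TEXT, txn_type TEXT, amount TEXT
    )""")
conn.execute("""
    CREATE TABLE IF NOT EXISTS app_transactions (   -- application store: cleaned, typed
        feed_date TEXT, account_id TEXT, txn_type TEXT, amount REAL
    )""")

# Land the raw file exactly as received (everything stays as text).
with open("transactions_20240101.csv", newline="") as f:
    rows = [(r["FeedDate"], r["AcctRef"], r["TxnType"], r["Amount"])
            for r in csv.DictReader(f)]
conn.executemany("INSERT INTO raw_transactions VALUES (?,?,?,?)", rows)

# Transform step: only data that passes the mapping rules reaches the app store.
conn.execute("""
    INSERT INTO app_transactions
    SELECT feed_date, account_ref, UPPER(txn_type), CAST(amount AS REAL)
    FROM raw_transactions""")
conn.commit()
```

Keeping the raw store untouched means you can always re-run or audit the transform when a mapping question comes up later.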
The high-level steps to map a data feed are:
- Understand in what format the data feed will be provided. Is it a single file, or – more likely in the case of complex data such as financial accounts and transactions – multiple files? How is the data linked between files? (A sketch of linking and mapping follows this list.)
- Get the data feed documentation and if possible some data feed samples from the vendor. Know that data feed samples are likely mock-ups and will only provide a limited sample of all possible data values.
- Don’t assume that data documentation is 100% complete or 100% accurate. Documentation is written in plain English (even if it looks “technical”), and the descriptions of data value meanings are just someone’s attempt to describe reality. Also, vendors don’t always update the documentation for every minor update. Some documentation is intentionally light and by design does not include every detail. For all these reasons it is common for the data team to have quite a lot of back and forth with the vendor as they confirm and iron out details in the data mapping.
- The data team should document the mappings and how data is going to be used in the application.
- Testing. Lots of testing.
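To make the linking and mapping steps concrete, here is a hypothetical Python sketch: two feed files (accounts and transactions) joined on a shared key, with a mapping spec that renames vendor fields and converts types. The field names and converters are invented; the real spec comes out of the documented mapping agreed with the vendor.

```python
import csv

# Hypothetical mapping spec: vendor field -> (application field, converter).
ACCOUNT_MAP = {
    "AcctRef":  ("account_id", str),
    "AcctName": ("account_name", str.strip),
}
TXN_MAP = {
    "AcctRef":  ("account_id", str),
    "TxnType":  ("txn_type", str.upper),
    "Amount":   ("amount", float),
}

def apply_map(row: dict, mapping: dict) -> dict:
    """Rename and convert one vendor row into application fields."""
    return {target: convert(row[src]) for src, (target, convert) in mapping.items()}

def load(path: str, mapping: dict) -> list[dict]:
    with open(path, newline="") as f:
        return [apply_map(r, mapping) for r in csv.DictReader(f)]

accounts = {a["account_id"]: a for a in load("accounts.csv", ACCOUNT_MAP)}
transactions = load("transactions.csv", TXN_MAP)

# Link the two files on the shared key; orphans go on the vendor query list.
orphans = [t for t in transactions if t["account_id"] not in accounts]
```

Writing the mapping down as data like this, rather than burying it in code, also gives you the mapping document the data team should be producing anyway.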
4: Testing
- The data team tests the data feed in a local environment, and/or a shared test environment, both during and after the mapping and integration phase. A QA tester or test lead should run independent tests after that and be able to evidence the results.
- Gaps or problems with mappings or data behaviour are addressed. Testing is re-run.
- When local and QA testing has passed, the data feed might be connected to a staging or pre-production environment, to allow other users to test the data in the application’s test environment. This could be customer users, but is more likely to be acceptance testing by people within your organisation. This should include the business analyst and a ‘business ambassador’ – such as an operations or customer-facing person who knows the processes that will use the new data and has been involved with the project, so knows what to expect and what to test.
- Testing should cover the full range of data (all data feed files, data elements and expected data values) as far as possible. In theory, all possibilities can be tested in a test environment. In practice, the vendor might not have test data available covering every scenario you want to test. The best approach is to agree test requirements internally and discuss what is possible with the vendor, and what can be done with local or mocked-up data where necessary to fill the gaps, to the extent that this is worth doing. (It won’t give much benefit if you are just working on assumptions and can’t be sure that what you are mocking up is what you will get from live data).
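Much of this testing can be scripted. The sketch below shows the kind of automated checks a QA tester might run over a received file; the column names, allowed values, and file format are assumptions for the example.

```python
import csv

ALLOWED_TXN_TYPES = {"BUY", "SELL", "DIV", "FEE"}   # hypothetical value list

def check_feed(path: str) -> list[str]:
    """Return a list of problems found in one feed file (empty list = pass)."""
    problems = []
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        problems.append("file is empty")
    for i, row in enumerate(rows, start=2):   # row 1 is the header
        if not row.get("AcctRef"):
            problems.append(f"row {i}: missing AcctRef")
        if row.get("TxnType") not in ALLOWED_TXN_TYPES:
            problems.append(f"row {i}: unexpected TxnType {row.get('TxnType')!r}")
        try:
            float(row.get("Amount") or "")
        except ValueError:
            problems.append(f"row {i}: non-numeric Amount {row.get('Amount')!r}")
    return problems
```

Checks like these can be re-run in every environment, which matters given how many times the testing cycle repeats.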
5: Go-live, support and monitoring
Hooray! Testing has passed, and you all thoroughly understand the data feed and all possible data you can expect to receive (or do you?!). So now it’s time to go live.
Here are the steps:
- Complete the internal technical documentation
- The technical team prepares a delivery plan
- Write release notes if applicable – assuming your application is doing a server-side (data/background stuff) release and maybe a client-side (application that users use) release to support the data integration
- Write up knowledge base articles, both for internal support of the data feed, and for customers and internal application users
- Who will be supporting the data feed? That means monitoring it daily, springing into action if the feed is late or doesn’t work one day, and responding to queries or issues raised by others even when everything on the monitoring dashboards looks fine. What hours will you realistically cover, and who are the people responsible? Does the support team use a roster? (A minimal lateness check is sketched after this list.)
- Once the feed is live, the data might still need to be monitored by the data team or technical business analyst for a time, if the test cycle or test data available did not cover all expected data values. This is common for example with financial transactions data, or data for financial instrument life-cycle events that can take weeks or months to eventuate.
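Day-to-day monitoring often starts with a simple lateness check run from a scheduler such as cron. Here is a hypothetical Python sketch; the landing path, cutoff time, file naming, and alert addresses are placeholders for your own environment.

```python
import datetime
import pathlib
import smtplib
from email.message import EmailMessage

LANDING_DIR = pathlib.Path("/data/landing/vendor")
CUTOFF = datetime.time(hour=7, minute=30)   # feed is contractually due by 07:30

def todays_file_missing() -> bool:
    """True if the daily file has not arrived by the agreed cutoff."""
    expected = LANDING_DIR / f"transactions_{datetime.date.today():%Y%m%d}.csv"
    return datetime.datetime.now().time() > CUTOFF and not expected.exists()

def alert(text: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Data feed late or missing"
    msg["From"] = "feeds@example.com"
    msg["To"] = "data-support@example.com"
    msg.set_content(text)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if todays_file_missing():
    alert("Vendor daily transactions file has not arrived by the agreed cutoff.")
```

A check like this only tells you the file arrived; the content checks from the testing phase are what tell you the data inside it is sane.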
TLDR: Tips for working with data feeds
- Get a good contract with a service level agreement (SLA) that addresses late or problem data
- Get standard rather than bespoke data feeds where available
- Understand in what format the data feed will be provided. Is it a single file or multiple files? If there are multiple files, what are the key data that link them?
- Don’t assume that data documentation is 100% complete or 100% accurate.
- Allow plenty of time for the mapping and integration phase. A specialist data integration team I worked with always allowed 6-8 weeks for even a ‘simple’ data feed, and they were running an experienced data integration operation into a mature, well-structured and well-documented data system. For a more complicated data feed, that time would at least double.
- Have a clear test plan, so that everyone understands what must be tested, what the limits of the test data and/or test environments are, and what testing and monitoring will be done in production after deployment.
- Have a deployment and release plan that includes release notes, knowledge base articles, and support. Know who will be monitoring and supporting the feed internally, and what that involves.
- Allow time for monitoring the data once it is live, for at least one full life-cycle of your process – regardless of how long the feed was tested.