Usage Recommendations

This section helps API client implementers understand this API’s intended use and best practices.

Staleness

Hash Sharing is a polling API. Entries returned by this API always represent their most recent state. Previous, or stale, data is never presented. Every time an entry is created, updated, or deleted, its associated timestamp is updated as well. As a result, each entry carries only its most recent update timestamp, which is used to resolve entry queries.

Using Date Ranges to Query

Although the API always returns the latest state of entries, it is not designed for real-time or interactive use. Clients should avoid querying the full dataset to locate specific entries. Instead, they should track the last time they queried and use that timestamp, along with the current time, to request only new and updated entries. This API is meant to support periodic polling to help clients build and maintain their own datasets, not to serve as a live datastore.
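
The checkpoint pattern described above can be sketched in Python as follows. This is a minimal illustration, not the API's client library: `fetch_entries` is a hypothetical callable standing in for whatever request the client actually makes, and the entry shape is left opaque.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical signature: given a (start, end) window, return the entries
# whose timestamps fall inside it. The real request format is not shown here.
FetchFn = Callable[[datetime, datetime], Iterable[dict]]


def poll_once(
    fetch_entries: FetchFn,
    last_queried: datetime,
    now: Optional[datetime] = None,
) -> Tuple[List[dict], datetime]:
    """Fetch entries changed since last_queried and return the new checkpoint."""
    end = now if now is not None else datetime.now(timezone.utc)
    entries = list(fetch_entries(last_queried, end))
    # The end of this window becomes the start of the next one, so
    # consecutive polls neither overlap nor leave a gap.
    return entries, end
```

The returned checkpoint would be persisted by the client only after the entries have been stored successfully, so that a failed poll can be retried over the same window.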

What Types of Data Should Clients Store

Clients should keep track of the last time they queried for entries. This timestamp and the current time should be used to query for all changes since the previous query.

Client implementers should keep track of the entries they are interested in within their own systems. Qualifiers for what you’re interested in could include:

  • Entries with specific fingerprint types, e.g. MD5 or PDNA

  • Who submitted the entry, e.g. NCMEC’s Take It Down user on the NPO CSAM Hash List

  • Certain categorizations found on entries
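
As an illustration of the qualifiers listed above, a client might filter incoming entries before storing them. The sketch below assumes a hypothetical entry shape with `fingerprints`, `submitter`, and `categories` keys. These names are placeholders for illustration, not the API's actual field names.

```python
from typing import Optional, Set


def is_of_interest(
    entry: dict,
    fingerprint_types: Optional[Set[str]] = None,
    submitters: Optional[Set[str]] = None,
    categories: Optional[Set[str]] = None,
) -> bool:
    """Return True if the entry passes every filter that was supplied.

    A filter left as None is not applied. Field names are hypothetical.
    """
    if fingerprint_types is not None:
        # Keep the entry only if it carries at least one wanted fingerprint type.
        if not fingerprint_types & set(entry.get("fingerprints", {})):
            return False
    if submitters is not None and entry.get("submitter") not in submitters:
        return False
    if categories is not None:
        # Keep the entry only if it shares at least one wanted category.
        if not categories & set(entry.get("categories", [])):
            return False
    return True
```

For example, `is_of_interest(entry, fingerprint_types={"MD5", "PDNA"})` keeps any entry that has an MD5 or a PDNA fingerprint, regardless of submitter or categorization.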

Frequency

The frequency at which a client polls the API is up to the client. Some reasonable polling intervals include once per day, every few hours, or every 10 minutes. This API should only be queried at the frequency necessary for the client’s needs. Polling no more often than needed reduces strain on the system and on other Hash Sharing API users.

Ranges

As part of its regular API usage, a Hash Sharing client should never perform a query over a previously searched time range. Situations where a one-off full dataset query may be helpful include:

  • Troubleshooting

  • An initial backfill of your dataset

  • Backfilling your dataset after one or more users grant or revoke access to their entries

  • Backfilling your dataset after the client implements a new feature present in historic entries, e.g. new fingerprint types or categorizations
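
When a one-off backfill is needed, one possible approach is to walk the historical range in consecutive, non-overlapping windows rather than in a single request. The helper below is only a sketch of that idea: `backfill_windows` is a hypothetical name, and whether the API limits the size of a date range is not stated here.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def backfill_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # Clamp the final window so it never runs past the requested end.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end
```

Each window ends exactly where the next begins, so no range is requested twice during the backfill.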

Missing Entries

If an entry no longer appears in a date range where it was previously found, then either:

  • It has been updated or deleted since and thus would be found in a more recent date range

  • The owner of that entry has changed their permission settings, revoking your visibility for all their entries

    • This API offers no explicit detection mechanism for others changing your visibility on their entries

    • Similarly, if a user were to grant you permissions to their entries, you would need to requery all historical data
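
The two cases above suggest how a client might maintain its local dataset. The following sketch assumes each entry has a hypothetical `id` key and a hypothetical `deleted` flag, since the actual field names are not given here. `apply_changes` handles regular polling results, and `reconcile` handles a one-off full requery, which is the only way to notice entries that disappeared because of a permission change.

```python
from typing import Dict, Iterable, Set


def apply_changes(local: Dict[str, dict], entries: Iterable[dict]) -> None:
    """Apply polled entries to the local dataset, keyed by entry id."""
    for entry in entries:
        entry_id = entry["id"]  # hypothetical field name
        if entry.get("deleted"):  # hypothetical deletion marker
            local.pop(entry_id, None)
        else:
            # Created and updated entries both overwrite the local copy.
            local[entry_id] = entry


def reconcile(local: Dict[str, dict], full_entries: Iterable[dict]) -> Set[str]:
    """Replace the local dataset with a full requery; return the ids that vanished."""
    fresh: Dict[str, dict] = {}
    apply_changes(fresh, full_entries)
    # Ids held locally but absent from the full result are no longer visible,
    # for example because their owner revoked access.
    removed = set(local) - set(fresh)
    local.clear()
    local.update(fresh)
    return removed
```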