Importer

This package allows you to import JSON data from URLs or local files, and to synchronize them over time.

Features

  • Import JSON data from URLs or local files
  • Synchronize data regularly using a simple cron-job configuration
  • Automatically send Create, Update and Delete activities from a given ActivityPub actor

Dependencies

Install

$ yarn add @semapps/importer

Usage

Pre-configured importers

We provide a number of pre-configured importers, some of which work "out of the box" with very little configuration.

If you create a custom importer for a well-known software, feel free to open a PR.

Queue service

If you want the synchronize action to run as a cron job, and want to keep an eye on its results, add the moleculer-bull QueueMixin to your service.

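const QueueMixin = require('moleculer-bull');

// DrupalImporterMixin (one of the pre-configured importers) and CONFIG
// (your own configuration object) must be in scope here.
module.exports = {
  name: 'my-importer',
  mixins: [DrupalImporterMixin, QueueMixin(CONFIG.QUEUE_SERVICE_URL)],
  ...
}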

Settings

If you use a pre-configured importer, you probably won't need to set most of the settings in the source property.

const { ImporterMixin } = require('@semapps/importer');

module.exports = {
  mixins: [ImporterMixin],
  settings: {
    source: {
      apiUrl: null, // Base URL of the API. Used to find existing data during synchronization
      getAllFull: null, // API endpoint to get all data in complete form, or path to a local file
      getAllCompact: null, // API endpoint to get all data in compact form (id + updated date)
      getOneFull: null, // Function which takes the data of an item and returns its source URL
      headers: {}, // Headers to pass to all fetch requests
      basicAuth: {
        user: '',
        password: ''
      },
      fetchOptions: {}, // Additional options to pass to all fetch requests
      fieldsMapping: {
        slug: null, // Property used for the slug, or a function which receives data as a parameter and returns the slug
        created: null, // Property used for the creation date, or a function which receives data as a parameter and returns the creation date
        updated: null, // Property used for the modified date, or a function which receives data as a parameter and returns the modified date
      },
    },
    dest: {
      containerUri: null, // Container where the data will be posted (must be created already)
      predicatesToKeep: [], // Don't remove these predicates when updating data
    },
    activitypub: {
      actorUri: null, // ActivityPub actor who will post activities on synchronization (leave null to disable)
      activities: ['Create', 'Update', 'Delete'] // The activities you want to be posted by the actor
    },
    cronJob: {
      time: null, // For example '0 0 4 * * *' for every night at 4am
      timeZone: 'Europe/Paris'
    }
  },
  methods: {
    transform(data) {
      return {
        ...data
      };
    }
  }
};
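
As an illustration, the settings above could be filled in as follows for a JSON API. This is only a sketch: the example.org URLs, the field names (id, created_at, updated_at) and the container URI are all hypothetical, and the object is kept separate from the service schema so it can be read on its own.

```javascript
// Hypothetical settings for an API at https://example.org/api.
// Every URL and field name below is an assumption made for illustration.
const settings = {
  source: {
    apiUrl: 'https://example.org/api/events',
    getAllFull: 'https://example.org/api/events',
    // Build the URL of a single item from its data
    getOneFull: data => `https://example.org/api/events/${data.id}`,
    headers: { Accept: 'application/json' },
    fieldsMapping: {
      slug: data => `event-${data.id}`, // function form
      created: 'created_at', // property-name form
      updated: 'updated_at'
    }
  },
  dest: {
    containerUri: 'https://data.example.org/events' // must be created already
  },
  cronJob: {
    time: '0 0 4 * * *', // every night at 4am
    timeZone: 'Europe/Paris'
  }
};

// In a service file, this object would be used as:
// module.exports = { name: 'events-importer', mixins: [ImporterMixin], settings };
```

Settings that are left out here (basicAuth, fetchOptions, activitypub) keep the defaults listed above.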

Actions

freshImport

Delete all imported data and re-import them from the source.

Parameters
Property | Type | Default | Description
clear | Boolean | true | Clear existing objects before reimporting
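
Like any Moleculer action, freshImport can be called through the broker. A minimal sketch, assuming the service is named my-importer (as in the Queue service example) and that broker is a started Moleculer ServiceBroker; the helper name is hypothetical.

```javascript
// Re-import everything from the source without clearing existing objects first.
// `broker` is assumed to be a started Moleculer ServiceBroker.
async function reimportKeepingExisting(broker) {
  return broker.call('my-importer.freshImport', { clear: false });
}
```

Omitting the clear parameter falls back to its default (true).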

synchronize

Fetch the data from the source, compare them with the existing data and create/update/delete what is necessary.

If the activitypub.actorUri setting is defined, it will post Create, Update and Delete activities.

If the cronJob.time setting is defined, this action will be called automatically.
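
The comparison step can be pictured as a three-way split. The following is a simplified sketch, not the package's actual implementation: it assumes each item has been reduced to a hypothetical { id, updated } pair, and decides what to create, update or delete.

```javascript
// Simplified illustration of a synchronization diff (not the package's real code).
// Both arguments are arrays of { id, updated } objects; `updated` is a date string.
function diffItems(sourceItems, existingItems) {
  const existingById = new Map(existingItems.map(item => [item.id, item]));
  const sourceIds = new Set(sourceItems.map(item => item.id));

  // In the source but not yet imported
  const toCreate = sourceItems.filter(item => !existingById.has(item.id));

  // Already imported, but modified in the source since then
  const toUpdate = sourceItems.filter(item => {
    const existing = existingById.get(item.id);
    return existing !== undefined && new Date(item.updated) > new Date(existing.updated);
  });

  // Imported earlier, but no longer present in the source
  const toDelete = existingItems.filter(item => !sourceIds.has(item.id));

  return { toCreate, toUpdate, toDelete };
}
```

Each of the three groups corresponds to one of the Create, Update and Delete activities mentioned above.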

deleteImported

Delete all imported data. Called at the start of the freshImport action.

list

Return the results of the list method (see below). Useful if you want to query the API without importing data.

Property | Type | Default | Description
url | String | source.getAllFull setting | URL that you want to fetch

getOne

Return a single item through the getOne method (see below). Useful if you want to query the API without importing data.

Property | Type | Default | Description
data | Object | required | Object that will be passed to source.getOneFull to find the URL to fetch
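
The list and getOne actions can be combined to inspect what the source returns before importing anything. A sketch, assuming a service named my-importer and a started Moleculer ServiceBroker passed in as broker; the helper name and the id field in the data object are hypothetical and depend on what your source.getOneFull function expects.

```javascript
// Preview the source without importing anything.
async function previewSource(broker) {
  // Without a `url` parameter, the source.getAllFull setting is used
  const items = await broker.call('my-importer.list');

  // `data` is passed to source.getOneFull to find the URL to fetch
  const first = await broker.call('my-importer.getOne', { data: { id: 1 } });

  return { count: items.length, first };
}
```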

Methods

transform

Called for each item. Receives the raw (non-semantic) data as an object and must return another object with semantic data (including the type).
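
For example, a transform method might look like the sketch below. The raw field names (title, body, start) and the semantic predicates (a pair: vocabulary) are assumptions chosen for illustration; use the ones matching your source and your ontology.

```javascript
// Hypothetical mapping from raw API fields to semantic predicates.
const methods = {
  transform(data) {
    return {
      type: 'pair:Event',
      'pair:label': data.title,
      'pair:description': data.body,
      // Normalize the date to ISO 8601
      'pair:startDate': new Date(data.start).toISOString()
    };
  }
};

// In a service file, this object would be passed as the `methods` property,
// next to `mixins: [ImporterMixin]` and your settings.
```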

list

If you use a pre-configured importer, you don't need to worry about this.

Receives a URL (for remote endpoints) or a path (for local files) and must return the data as an array of items.

By default, it calls the fetch method (see below), but you may want to override it depending on the shape of the API data.
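
For instance, if the API wraps its items in an envelope, list can be redefined to unwrap it. A minimal sketch, assuming a hypothetical response shape of { results: [...] }:

```javascript
const methods = {
  async list(url) {
    // this.fetch is the fetch method described below; it returns the JSON-parsed result
    const response = await this.fetch(url);
    // Return only the array of items, as list is expected to do
    return response.results;
  }
};
```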

getOne

If you use a pre-configured importer, you don't need to worry about this.

Receives the URL of a remote resource and must return the data.

By default, it calls the fetch method (see below), but you may want to override it depending on the shape of the API data.

fetch

If you use a pre-configured importer, you don't need to worry about this.

This method accepts a single argument which can be either:

  • A URL (for remote endpoints)
  • A path (for local files)
  • An object with a url property and other arguments understood by fetch

If the argument is a URL, the remote endpoint is fetched (using the source.headers and source.fetchOptions settings) and the JSON-parsed result is returned.

If the argument is a path, the local file is read and the JSON-parsed result is returned.
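
The dispatch between the three argument types can be sketched as follows. This is an illustration only, not the package's actual code: the function name fetchJson, the http(s) prefix test used to tell URLs from paths, and the reliance on the global fetch of Node.js 18+ are all assumptions.

```javascript
const fs = require('fs/promises');

// Illustrative only: not the package's actual code.
// `headers` and `fetchOptions` stand in for the source.headers and source.fetchOptions settings.
async function fetchJson(param, { headers = {}, fetchOptions = {} } = {}) {
  if (typeof param === 'object' && param !== null) {
    // Object form: `url` plus any other options understood by fetch
    const { url, ...options } = param;
    const response = await fetch(url, { headers, ...fetchOptions, ...options });
    return response.json();
  }
  if (/^https?:\/\//.test(param)) {
    // URL form: fetch the remote endpoint
    const response = await fetch(param, { headers, ...fetchOptions });
    return response.json();
  }
  // Anything else is treated as a path to a local JSON file
  return JSON.parse(await fs.readFile(param, 'utf8'));
}
```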

prepare

Called before a fresh import or a synchronization.