Importer
This package allows you to import JSON data from URLs or local files, and to synchronize them over time.
Features
- Import JSON data from URLs or local files
- Synchronize data regularly using a simple cron-job configuration
- Automatically send Create, Update and Delete activities from a given ActivityPub actor
Dependencies
Install
$ yarn add @semapps/importer
Usage
Pre-configured importers
We provide a number of pre-configured importers, some of which work "out of the box" with very little configuration.
If you create a custom importer for a well-known piece of software, feel free to open a PR.
Queue service
If you want the synchronize action to run through cron jobs, and to keep an eye on its results, add the QueueMixin from moleculer-bull to your service.
const QueueMixin = require('moleculer-bull');
module.exports = {
name: 'my-importer',
mixins: [DrupalImporterMixin, QueueMixin(CONFIG.QUEUE_SERVICE_URL)],
...
}
Settings
If you use a pre-configured importer, you probably won't have to bother about most settings in the source property.
const { ImporterMixin } = require('@semapps/importer');
module.exports = {
mixins: [ImporterMixin],
settings: {
source: {
apiUrl: null, // Base URL of the API. Will be used to find existing data on synchronizations
getAllFull: null, // API endpoint to get all data in a complete form, or path to a local file
getAllCompact: null, // API endpoint to get all data in a compact form (id + updated date)
getOneFull: null, // Function which takes the data of an item and returns its source URL
headers: {}, // Headers to pass to all fetch requests
basicAuth: {
user: '',
password: ''
},
fetchOptions: {}, // Additional options to pass to all fetch requests
fieldsMapping: {
slug: null, // Property used for the slug, or a function which receives data as a parameter and returns the slug
created: null, // Property used for the creation date, or a function which receives data as a parameter and returns the creation date
updated: null, // Property used for the modified date, or a function which receives data as a parameter and returns the modified date
},
},
dest: {
containerUri: null, // Container where the data will be posted (must be created already)
predicatesToKeep: [], // Don't remove these predicates when updating data
},
activitypub: {
actorUri: null, // ActivityPub actor who will post activities on synchronization (leave null to disable)
activities: ['Create', 'Update', 'Delete'] // The activities you want to be posted by the actor
},
cronJob: {
time: null, // For example '0 0 4 * * *' for every night at 4am
timeZone: 'Europe/Paris'
}
},
methods: {
transform(data) {
return({
...data
});
}
}
};
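Each fieldsMapping entry accepts either a property name or a function. The sketch below illustrates both forms on a sample item. The resolveField helper, the sample property names (name, created_at, changed) and the example.org URLs are assumptions made for illustration; they are not part of the package.

```javascript
// Illustrative source settings; every value below is a made-up example.
const source = {
  apiUrl: 'https://example.org/api/items',
  // Build the URL of one item from its data (hypothetical URL scheme)
  getOneFull: data => `https://example.org/api/items/${data.id}`,
  fieldsMapping: {
    slug: 'name', // Property name form
    created: data => new Date(data.created_at * 1000).toISOString(), // Function form
    updated: 'changed'
  }
};

// Hypothetical helper showing how a mapping entry could be resolved.
// It is NOT the package's implementation.
function resolveField(mapping, data) {
  if (typeof mapping === 'function') return mapping(data);
  if (typeof mapping === 'string') return data[mapping];
  return undefined; // Mapping left to null
}

const item = { id: 12, name: 'my-event', created_at: 1600000000, changed: '2021-01-01T00:00:00Z' };

console.log(resolveField(source.fieldsMapping.slug, item));
console.log(resolveField(source.fieldsMapping.created, item));
console.log(source.getOneFull(item));
```

The function form is useful when the source exposes dates or identifiers in a shape that needs converting, such as the Unix timestamp assumed here.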
Actions
freshImport
Delete all imported data and re-import them from the source.
Parameters
| Property | Type | Default | Description |
|---|---|---|---|
| clear | Boolean | true | Clear existing objects before reimporting |
synchronize
Fetch the data from the source, compare them with the existing data and create/update/delete what is necessary.
If the activitypub.actorUri setting is defined, it will post Create, Update and Delete activities.
If the cronJob.time setting is defined, this action will be called automatically.
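To make the create/update/delete comparison concrete, here is a minimal sketch of how source items could be classified against already imported ones. The diffItems function and the id and updated property names are hypothetical; the package's actual comparison logic may differ.

```javascript
// Hypothetical sketch: decide which items to create, update or delete.
// Items are assumed to carry an `id` and an `updated` date string.
function diffItems(sourceItems, localItems) {
  const localById = new Map(localItems.map(item => [item.id, item]));
  const sourceIds = new Set(sourceItems.map(item => item.id));

  const toCreate = [];
  const toUpdate = [];
  for (const item of sourceItems) {
    const local = localById.get(item.id);
    if (!local) {
      toCreate.push(item.id); // Present in the source only
    } else if (new Date(item.updated) > new Date(local.updated)) {
      toUpdate.push(item.id); // Source copy is more recent
    }
  }

  // Present locally but gone from the source
  const toDelete = localItems.filter(item => !sourceIds.has(item.id)).map(item => item.id);

  return { toCreate, toUpdate, toDelete };
}

const sourceItems = [
  { id: 'a', updated: '2022-01-02' },
  { id: 'b', updated: '2022-01-01' },
  { id: 'c', updated: '2022-01-01' }
];
const localItems = [
  { id: 'a', updated: '2022-01-01' },
  { id: 'b', updated: '2022-01-01' },
  { id: 'd', updated: '2022-01-01' }
];

console.log(diffItems(sourceItems, localItems));
```

Only an identifier and a modification date are needed for this comparison, which matches the compact form (id + updated date) described for the source.getAllCompact setting.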
deleteImported
Delete all imported data. Called at the start of the freshImport action.
list
Return results of the list method (see below). Useful if you want to fetch the API without importing data.
Parameters
| Property | Type | Default | Description |
|---|---|---|---|
url | String | source.getAllFull setting | URL that you want to fetch |
getOne
Return a single item through the getOne method (see below). Useful if you want to fetch the API without importing data.
Parameters
| Property | Type | Default | Description |
|---|---|---|---|
data | Object | required | Object that will be passed to source.getOneFull to find the URL to fetch |
Methods
transform
Called for each item. Receives the raw (non-semantic) data as an object and must return another object with semantic data (including the type).
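As an illustration, a transform method could map raw fields to semantic predicates as follows. The raw field names (title, body) and the pair:Event type and predicates are assumptions chosen for the example; use whatever ontology your destination container expects.

```javascript
// Sketch of a transform method. The input shape and the output
// predicates are hypothetical examples, not fixed by the package.
const methods = {
  transform(data) {
    return {
      type: 'pair:Event', // The semantic type must be included
      'pair:label': data.title,
      'pair:description': data.body
    };
  }
};

const raw = { title: 'Open day', body: 'Come and visit us', internal_flag: 1 };
console.log(methods.transform(raw));
```

Fields that are not explicitly mapped (internal_flag here) are dropped, unlike the default implementation, which returns a copy of the raw data.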
list
If you use a pre-configured importer, you don't need to worry about this.
Receives a URL (for remote endpoints) or a path (for local files) and must return the data as an array of items.
By default, it calls the fetch method (see below), but you may want to override it depending on the shape of the API data.
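For instance, if an API wraps its results in an envelope instead of returning a bare array, the list method could be overridden along these lines. The envelope shape ({ items: [...] }) and the unwrapItems helper are hypothetical.

```javascript
// Hypothetical helper: extract the array of items from an API response
// that is either a bare array or an envelope of the form { items: [...] }.
function unwrapItems(result) {
  if (Array.isArray(result)) return result;
  if (result && Array.isArray(result.items)) return result.items;
  return [];
}

// Service schema fragment: override list so it always returns an array.
const MyImporter = {
  name: 'my-importer',
  methods: {
    async list(url) {
      const result = await this.fetch(url); // fetch method described below
      return unwrapItems(result);
    }
  }
};

console.log(unwrapItems({ meta: { total: 2 }, items: [{ id: 1 }, { id: 2 }] }));
```

In a real service, ImporterMixin (or a pre-configured importer) would be added to the mixins so that this.fetch is available.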
getOne
If you use a pre-configured importer, you don't need to worry about this.
Receives the URL of a remote resource and must return the data.
By default, it calls the fetch method (see below), but you may want to override it depending on the shape of the API data.
fetch
If you use a pre-configured importer, you don't need to worry about this.
This method accepts a single argument which can be either:
- A URL (for remote endpoints)
- A path (for local files)
- An object with a url property and other arguments understood by fetch
If the argument is a URL, the remote endpoint is fetched (using the source.headers and source.fetchOptions settings) and the JSON-parsed result is returned.
If the argument is a path, the local file is read and the JSON-parsed result is returned.
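The three accepted argument forms can be told apart roughly as in the sketch below. The describeFetchArg function is hypothetical, and treating any string that starts with http:// or https:// as a URL is an assumption; the package may use a different test.

```javascript
// Hypothetical sketch of how the single argument could be interpreted.
function describeFetchArg(arg) {
  if (typeof arg === 'object' && arg !== null) {
    // Object form: a url property plus extra fetch arguments
    const { url, ...options } = arg;
    return { kind: 'remote', url, options };
  }
  if (/^https?:\/\//.test(arg)) {
    // Assumption: strings with an http(s) scheme are remote endpoints
    return { kind: 'remote', url: arg, options: {} };
  }
  // Anything else is treated as a path to a local JSON file
  return { kind: 'local', path: arg };
}

console.log(describeFetchArg('https://example.org/api/items'));
console.log(describeFetchArg('./data/items.json'));
console.log(describeFetchArg({ url: 'https://example.org/api/items', method: 'POST' }));
```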
prepare
Called before a fresh import or a synchronization.