Written by The Jahia Team

Introduction

The profile import/export feature in Unomi is configuration-based and consumes or produces CSV files that contain the profiles to import or export.

Import

Import UI

  For a functional description of the UI, please see the marketer's guide

  Only ftp, sftp, ftps and file are supported in the source path.

eg. file:///tmp/?fileName=profiles.csv&move=.done&consumer.delay=25s

  Where fileName can be a pattern, eg. 'include=.*.csv' instead of 'fileName=...', to consume all CSV files. By default the processed files are moved to the '.camel' folder; you can change this using the 'move' option.
  consumer.delay is the polling frequency. '20000' (in milliseconds) means 20 seconds; '20s' is also valid. Other possible formats are: '2h30m10s' = 2 hours, 30 minutes and 10 seconds.

  See http://camel.apache.org/ftp.html and http://camel.apache.org/file2.html to build more complex source paths.
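The query part of such a path is plain URL encoding, so a source path can also be assembled programmatically. A small sketch; the directory and option values below are placeholders, not defaults:

```python
from urllib.parse import urlencode

# Assemble a Camel 'file' source path like the example above.
# The directory and option values are placeholders for illustration.
options = {"fileName": "profiles.csv", "move": ".done", "consumer.delay": "25s"}
source = "file:///tmp/?" + urlencode(options)
print(source)
# file:///tmp/?fileName=profiles.csv&move=.done&consumer.delay=25s
```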

Import API

Unomi provides REST Endpoints to manage import configurations:

  •   GET    /cxs/importConfiguration
  •   GET    /cxs/importConfiguration/{configId}
  •   POST   /cxs/importConfiguration
  •   DELETE /cxs/importConfiguration/{configId}
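These endpoints can be exercised from a script; below is a minimal sketch using only the Python standard library. The base URL and the karaf:karaf credentials are assumptions for a default local setup, not part of this documentation — adjust them for your deployment.

```python
import base64
import json
import urllib.request

# Assumed base URL for a local Unomi instance; adjust host/port as needed.
BASE = "http://localhost:8181/cxs/importConfiguration"

def make_request(config_id=None, payload=None, method="GET"):
    """Build an authenticated request for the importConfiguration endpoints."""
    url = BASE if config_id is None else f"{BASE}/{config_id}"
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    # karaf:karaf is the stock Karaf credential pair, assumed here for the sketch
    req.add_header("Authorization", "Basic " + base64.b64encode(b"karaf:karaf").decode())
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# urllib.request.urlopen(make_request()) would list all import configurations;
# make_request("importConfigId", method="DELETE") builds the deletion request.
```

The same pattern applies to the export endpoints further below, with /cxs/exportConfiguration as the base path.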

 This is what a oneshot import configuration looks like:

{
    "itemId": "importConfigId",
    "itemType": "importConfig",
    "name": "Import Config Sample",
    "description": "Sample description",
    "configType": "oneshot",        //Config type can be 'oneshot' or 'recurrent'
    "properties": {
      "mapping": {
        "email": 0,                 //<Unomi Property Id> : <Column Index In the CSV>  
        "firstName": 2,
        ...
      }
    },
    "columnSeparator": ",",         //Character used to separate columns 
    "lineSeparator": "\\n",         //Character used to separate lines (\n or \r) 
    "multiValueSeparator": ";",     //Character used to separate values for multivalued columns 
    "multiValueDelimiter": "[]",    //Character used to wrap values for multivalued columns
    "status": "SUCCESS",            //Status of last execution
    "executions": [                 //(RETURNED) Last executions; by default only the last 5 are returned
      ...
    ],
    "mergingProperty": "email",         //Unomi Property Id used to check duplicates
    "overwriteExistingProfiles": true,  //Overwrite profiles that have duplicates
    "propertiesToOverwrite": "firstName, lastName, ...",      //If the above is true, which properties to overwrite; 'null' means overwrite all
    "hasHeader": true,                  //CSV file to import contains a header line
    "hasDeleteColumn": false            //Whether the CSV file to import contains a TO DELETE column (if it does, it will be the last column)
}
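To make the mapping and separator options concrete, here is an illustrative parser — not Unomi's actual implementation — showing how one CSV line would be interpreted with the settings above:

```python
# Sketch of how the 'mapping' and separator settings are interpreted when
# reading one CSV line. The sample data below is invented for illustration.
def parse_line(line, mapping, column_sep=",", mv_sep=";", mv_delim="[]"):
    columns = line.rstrip("\n").split(column_sep)
    profile = {}
    for prop, index in mapping.items():  # <Unomi property id> -> <column index>
        value = columns[index]
        if value.startswith(mv_delim[0]) and value.endswith(mv_delim[1]):
            value = value[1:-1].split(mv_sep)  # multivalued column
        profile[prop] = value
    return profile

print(parse_line("john@acme.org,Doe,John,[news;offers]",
                 {"email": 0, "firstName": 2, "interests": 3}))
# {'email': 'john@acme.org', 'firstName': 'John', 'interests': ['news', 'offers']}
```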

A recurrent import configuration is similar to the previous one, with some specific information to add to the JSON:

  {
    ...
    "configType": "recurrent",
    "properties": {
      "source": "ftp://USER@SERVER[:PORT]/PATH?password=xxx&fileName=profiles.csv&move=.done&consumer.delay=20000",
                // Only 'ftp', 'sftp', 'ftps' and 'file' are supported in the 'source' path
                // eg. file:///tmp/?fileName=profiles.csv&move=.done&consumer.delay=25s
                // 'fileName' can be a pattern eg 'include=.*.csv' instead of 'fileName=...' to consume all CSV files 
                // By default the processed files are moved to the '.camel' folder; you can change this using the 'move' option
                // 'consumer.delay' is the frequency of polling. '20000' (in milliseconds) means 20 seconds. Can also be '20s'
                // Other possible formats are: '2h30m10s' = 2 hours, 30 minutes and 10 seconds
      "mapping": {
        ...
      }
    },
    ...
    "active": true,  //If true, polling will start according to the 'source' configured above
    ...
  }
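The 'consumer.delay' formats mentioned in the comments above can be reasoned about with a small sketch — a rough approximation of how such values translate to milliseconds, for illustration only:

```python
import re

# Rough sketch of how delay values such as '20000', '20s' or '2h30m10s'
# translate to milliseconds (useful for reasoning about polling frequency).
def delay_to_millis(value):
    if value.isdigit():
        return int(value)  # a plain number is already milliseconds
    parts = re.findall(r"(\d+)([hms])", value)
    factors = {"h": 3_600_000, "m": 60_000, "s": 1_000}
    return sum(int(n) * factors[u] for n, u in parts)

print(delay_to_millis("2h30m10s"))  # 9010000
```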

 

Exports

UI

 For a functional description of the UI, please see the marketer's guide

 Only ftp, sftp, ftps and file are supported in the destination path (eg. file:///tmp/?fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append)

 As you can notice in the example above, we can inject variables into the produced file name: ${date:now:yyyyMMddHHmm} is the current date formatted with the pattern yyyyMMddHHmm. Setting the fileExist option to Append tells the file writer to append to the same file on each execution of the export configuration. You can omit this option to write one profile per file.

See http://camel.apache.org/ftp.html and http://camel.apache.org/file2.html to build more complex destination paths.
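For reference, the ${date:now:yyyyMMddHHmm} placeholder behaves like a date-format call; a Python approximation to preview the generated file names:

```python
from datetime import datetime

# Python equivalent of Camel's ${date:now:yyyyMMddHHmm} placeholder:
# the current date/time formatted as year, month, day, hour, minute.
stamp = datetime.now().strftime("%Y%m%d%H%M")
file_name = f"profiles-export-{stamp}.csv"
print(file_name)  # eg. profiles-export-202301151430.csv
```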

Export API

Unomi provides REST Endpoints to manage export configurations:

  •   GET    /cxs/exportConfiguration
  •   GET    /cxs/exportConfiguration/{configId}
  •   POST   /cxs/exportConfiguration
  •   DELETE /cxs/exportConfiguration/{configId}

This is what a oneshot export configuration looks like:

{
    "itemId": "exportConfigId",
    "itemType": "exportConfig",
    "name": "Export configuration sample",
    "description": "Sample description",
    "configType": "oneshot",
    "properties": {
      "period": "2m30s",
      "segment": "contacts",
      "mapping": {
        "0": "firstName",
        "1": "lastName",
        ...
      }
    },
    "columnSeparator": ",",
    "lineSeparator": "\\n",
    "multiValueSeparator": ";",
    "multiValueDelimiter": "[]",
    "status": "RUNNING",
    "executions": [
      ...
    ]
}
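Note that the export mapping is the inverse of the import one: column index to property id. Here is an illustrative serializer — not Unomi's actual code — showing how one profile would become a CSV line with the settings above:

```python
# Sketch of serializing one profile to a CSV line using the
# index-to-property mapping shown above. Sample data is invented.
def format_line(profile, mapping, column_sep=",", mv_sep=";", mv_delim="[]"):
    cells = []
    for index in sorted(mapping, key=int):  # "0", "1", ... in column order
        value = profile.get(mapping[index], "")
        if isinstance(value, list):  # multivalued property
            value = mv_delim[0] + mv_sep.join(value) + mv_delim[1]
        cells.append(value)
    return column_sep.join(cells)

print(format_line({"firstName": "John", "lastName": "Doe"},
                  {"0": "firstName", "1": "lastName"}))
# John,Doe
```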

A recurrent export configuration is similar to the previous one, with some specific information to add to the JSON:

{
    ...
    "configType": "recurrent",
    "properties": {
      "destination": "sftp://USER@SERVER:PORT/PATH?password=XXX&fileName=profiles-export-${date:now:yyyyMMddHHmm}.csv&fileExist=Append",
      "period": "2m30s",      //Same as 'consumer.delay' option in the import source path
      "segment": "contacts",  //Segment ID to use to collect profiles to export
      "mapping": {
        ...
      }
    },
    ...
    "active": true,           //If true, the configuration will start polling upon save until the user deactivates it
    ...
  }

 

Configuration in details 

The first configuration you need to change is the configuration type of the import/export feature (code name 'router'), in the etc/unomi.custom.system.properties file (creating it if necessary):

#Configuration Type values {'nobroker', 'kafka'}
org.apache.unomi.router.config.type=nobroker

By default the feature is configured (as above) to use no external broker, which means import/export data is handled using in-memory queues (in the same JVM as Unomi). If you are clustering Unomi, the most important thing to know about this configuration type is that each node handles its import/export tasks by itself, without the help of the other nodes (no load distribution).

Changing this property to kafka means you have to provide the Apache Kafka configuration; in contrast to the nobroker option, import/export data will then be handled through an external broker (Apache Kafka), which lightens the burden on the Unomi machines.

You may use several Apache Kafka instances, one per N Unomi nodes, for better application scaling.

To enable using Apache Kafka you need to configure the feature as follows:

#Configuration Type values {'nobroker', 'kafka'}
org.apache.unomi.router.config.type=kafka

#Uncomment and update Kafka settings to use Kafka as a broker

#Kafka
org.apache.unomi.router.kafka.host=localhost
org.apache.unomi.router.kafka.port=9092
org.apache.unomi.router.kafka.import.topic=import-deposit
org.apache.unomi.router.kafka.export.topic=export-deposit
org.apache.unomi.router.kafka.import.groupId=unomi-import-group
org.apache.unomi.router.kafka.export.groupId=unomi-import-group
org.apache.unomi.router.kafka.consumerCount=10
org.apache.unomi.router.kafka.autoCommit=true

There are a couple of properties you may want to change to fit your needs. One of them is import.oneshot.uploadDir, which tells Unomi where to temporarily store the CSV files to import in oneshot mode; it is a technical property that lets you choose a convenient disk location for the files to import. It defaults to the following path under the Unomi Karaf data directory (it is recommended to change it to a more convenient path):

#Import One Shot upload directory
org.apache.unomi.router.import.oneshot.uploadDir=${karaf.data}/tmp/unomi_oneshot_import_configs/

The next two properties are the maximum sizes of the executions history and of the error reports: you may not want Unomi to keep the full executions history and every error report generated by the executions of an import/export configuration. To change this, update the default values of these properties:

#Import/Export executions history size
org.apache.unomi.router.executionsHistory.size=5

#errors report size
org.apache.unomi.router.executions.error.report.size=200

The final one concerns the allowed endpoints you can use when building the source or destination path; as mentioned above, a path can be of type file, ftp, ftps or sftp. You can shorten this list if you want to omit some endpoints (eg. if you don't want to permit the use of non-secure FTP).

#Allowed source endpoints
org.apache.unomi.router.config.allowedEndpoints=file,ftp,sftp,ftps