Integrating external data sources
Introduction
The External Data Provider is a module that provides an API for integrating external systems as content providers, in the same way as the JCR.
Integration is done by implementing an External Data Source. The data source only needs to handle the connection to the external system and the data retrieval, while the External Data Provider does all the mapping work that makes the external content appear in the JCR tree.
All data sources must provide content (read access). They can also provide search capabilities and write access to create or update content. They can also be enhanceable, meaning that the raw content they provide can be enhanced with Jahia content (for example, adding comments to an object provided by an External Data Provider).
How it works
Specify your mapping
Your external content has to be mapped as nodes inside Jahia so that Jahia can use it like regular nodes (edit, copy, paste, reference, etc.). This means that your external provider module must provide a CND definition file declaring each type of content you plan to map into Jahia.
As a simple example, you can map a database table to a node type, defining each column as a JCR property.
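To make the table-to-node-type idea concrete, here is a minimal Java sketch that turns one database row into string properties, one per column. The table, the jtestnt: prefix and the RowMapper class are hypothetical, and the String[] value shape is an assumption chosen so that multi-valued properties fit too; in a real data source the result would be copied into the ExternalData you return.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the namespace prefix and column names are illustrative only.
final class RowMapper {
    private RowMapper() {}

    /**
     * Maps each non-null column of a database row to a property of the same name
     * (prefixed with a namespace) whose value is encoded as a String.
     * String[] is used so that multi-valued properties fit the same shape.
     */
    static Map<String, String[]> toProperties(String namespacePrefix, Map<String, Object> row) {
        Map<String, String[]> properties = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value == null) {
                continue; // a NULL column simply means the node has no such property
            }
            properties.put(namespacePrefix + column.getKey(), new String[] {String.valueOf(value)});
        }
        return properties;
    }
}
```

Each column name becomes a property declared in your CND file, so the two must be kept in sync.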
Next, define a tree structure for your objects: since they will appear in the Jahia repository, you have to decide the parent and children of each entry.
It is very important that each node has a unique path: you must be able to find an object from its path, and to know the path of an object. The node returned for a path must always be the same and must not depend on contextual information. If your nodes depend on the context (for example, the current user), you need different paths. To build a correct node hierarchy, you are free to add “virtual nodes” that act as containers to organize your data.
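The uniqueness rule can be illustrated with a small sketch, assuming a hypothetical movie catalog grouped under virtual year folders (for example /2016/293660). The path is computed only from the movie itself, and a path resolves to a movie only if it is that movie's canonical path, so the mapping works in both directions and never depends on the caller. MovieTree and its methods are illustrative names, not part of the External Data Provider API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical example: movies exposed under virtual year folders, e.g. /2016/293660.
final class MovieTree {
    static final class Movie {
        final String id;
        final int year;
        final String title;

        Movie(String id, int year, String title) {
            this.id = id;
            this.year = year;
            this.title = title;
        }
    }

    private final Map<String, Movie> moviesById = new HashMap<>();

    void add(Movie movie) {
        moviesById.put(movie.id, movie);
    }

    /** Object -> path: derived only from the movie's own data, never from the session. */
    static String pathOf(Movie movie) {
        return "/" + movie.year + "/" + movie.id;
    }

    /** Path -> object: only the canonical path of a movie resolves to it. */
    Optional<Movie> movieAt(String path) {
        String[] segments = path.split("/"); // "/2016/293660" -> ["", "2016", "293660"]
        if (segments.length != 3) {
            return Optional.empty(); // root, a virtual year folder, or too deep
        }
        Movie movie = moviesById.get(segments[2]);
        if (movie == null || !pathOf(movie).equals(path)) {
            return Optional.empty(); // unknown id, or id filed under the wrong year
        }
        return Optional.of(movie);
    }

    /** A virtual year folder exists as soon as one movie belongs to that year. */
    boolean isVirtualFolder(String path) {
        for (Movie movie : moviesById.values()) {
            if (("/" + movie.year).equals(path)) {
                return true;
            }
        }
        return false;
    }
}
```

The year folders hold no data of their own. They only give the hierarchy a shape, which is exactly the role of the virtual nodes described above.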
Optionally, you can define a unique identifier for every node. The External Data Provider will map this identifier to a JCR compatible UUID if needed, so that it can be used in Jahia as any other node.
Declaring your data source
External data is accessed through a JCR provider declared in Spring, in which you set information such as the provider key, the mount point and the data source implementation:
<bean id="TMDBProvider" class="org.jahia.modules.external.ExternalContentStoreProvider" parent="AbstractJCRStoreProvider">
    <property name="key" value="TMDBProvider"/>
    <property name="mountPoint" value="/sites/movies/contents/tmdb"/>
    <property name="externalProviderInitializerService" ref="ExternalProviderInitializerService"/>
    <property name="extendableTypes">
        <list>
            <value>nt:base</value>
        </list>
    </property>
    <property name="dataSource" ref="TMDBDataSource"/>
</bean>
<bean name="TMDBDataSource" class="org.jahia.modules.tmdbprovider.TMDBDataSource" init-method="start">
    <property name="cacheProvider" ref="ehCacheProvider"/>
    <property name="apiKeyValue" value="${com.jahia.tmdb.apiKeyValue}"/>
</bean>
The provider then accesses the underlying data source, which implements ExternalDataSource (and other optional interfaces if needed) to read and save the data.
Your implementation of ExternalDataSource must also list the node types it handles, so that Jahia knows which node types this data source supports. This can be done programmatically or in your Spring file. Here is an example of declarative node type support from the ModulesDataSource:
<bean id="ModulesDataSourcePrototype" class="org.jahia.modules.external.modules.ModulesDataSource" scope="prototype">
    <property name="supportedNodeTypes">
        <set>
            <value>jnt:cssFolder</value>
            <value>jnt:cssFile</value>
            <value>jnt:javascriptFolder</value>
            <value>jnt:javascriptFile</value>
        </set>
    </property>
</bean>
Implementation
Providing and reading content
The main step in defining a new provider is to implement the ExternalDataSource interface provided by the external-provider module (org.jahia.modules.external.ExternalDataSource).
This interface requires you to implement seven methods to mount and browse your data as if it were part of the Jahia content tree:
getItemByPath
getChildren
getItemByIdentifier
itemExists
getSupportedNodeTypes
isSupportsUuid
isSupportsHierarchicalIdentifiers
The first method, getItemByPath(), is the entry point for the external data. It has to return an ExternalData node for every valid path, including the root path (/). ExternalData is a simple Java object that represents an entry of your external data. It contains the id, path, node types and properties, encoded as strings (or Binary objects for binary properties).
The getChildren() method also needs to be implemented for all valid paths: it has to return the names of all subnodes, as you want them to appear in the Jahia repository. For example, if you map a table or the result of an SQL query, this is the method that returns all the results. Note that not all valid nodes have to be listed here. Nodes that do not appear here are not visible in the repository tree, but you may still be able to access them directly by path or through a search. This is especially useful if you have thousands of nodes at the same level.
These two methods reflect the hierarchy you will give to Jahia.
The getItemByIdentifier() method returns the same ExternalData node, but based on the internal identifier you want to use.
The getSupportedNodeTypes() method simply returns the list of node types that your data source may contain.
isSupportsUuid() tells the External Data Provider that your external data has identifiers in UUID format. This prevents Jahia from creating its own identifiers and maintaining a mapping between its UUIDs and your identifiers. In most cases, return false.
isSupportsHierarchicalIdentifiers() tells whether your identifiers actually look like node paths, which allows the provider to optimize some operations such as move, where your identifier is “updated”. This is useful, for example, in a file system provider, if you want to use the path of the file as its identifier.
itemExists() simply tests whether an item exists at the given path.
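To show how the seven methods fit together, here is a self-contained sketch for a flat catalog of airlines mounted as / and /<code>. It codes against a hypothetical, simplified SimpleDataSource interface whose method names mirror ExternalDataSource but whose signatures do not: there are no JCR exceptions, no Binary values, and a trimmed-down SimpleData class stands in for ExternalData. It is meant to show how the methods relate to one another, not to be dropped into a module.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for ExternalDataSource (not the real signatures).
interface SimpleDataSource {
    SimpleData getItemByPath(String path);   // null when the path is not valid
    List<String> getChildren(String path);   // child names, not full paths
    SimpleData getItemByIdentifier(String id);
    boolean itemExists(String path);
    Set<String> getSupportedNodeTypes();
    boolean isSupportsUuid();
    boolean isSupportsHierarchicalIdentifiers();
}

// Hypothetical stand-in for ExternalData: id, path and node type only.
final class SimpleData {
    final String id, path, type;
    SimpleData(String id, String path, String type) { this.id = id; this.path = path; this.type = type; }
}

/** Flat airline catalog: "/" is the root, "/<code>" is one airline. */
final class AirlineDataSource implements SimpleDataSource {
    private final Map<String, String> airlinesByCode = new LinkedHashMap<>();

    AirlineDataSource(Map<String, String> airlines) { airlinesByCode.putAll(airlines); }

    @Override public SimpleData getItemByPath(String path) {
        if ("/".equals(path)) {
            return new SimpleData("root", "/", "jnt:contentFolder"); // the root must always resolve
        }
        if (path.startsWith("/")) {
            String code = path.substring(1);
            if (!code.contains("/") && airlinesByCode.containsKey(code)) {
                return new SimpleData(code, path, "jtestnt:airline");
            }
        }
        return null;
    }

    @Override public List<String> getChildren(String path) {
        return "/".equals(path) ? new ArrayList<>(airlinesByCode.keySet()) : Collections.emptyList();
    }

    @Override public SimpleData getItemByIdentifier(String id) {
        // The identifier is the airline code, so it maps straight back to a path.
        return "root".equals(id) ? getItemByPath("/") : getItemByPath("/" + id);
    }

    @Override public boolean itemExists(String path) { return getItemByPath(path) != null; }

    @Override public Set<String> getSupportedNodeTypes() { return Set.of("jnt:contentFolder", "jtestnt:airline"); }

    @Override public boolean isSupportsUuid() { return false; } // airline codes are not UUIDs

    @Override public boolean isSupportsHierarchicalIdentifiers() { return false; }
}
```

Note how itemExists() and getItemByIdentifier() are both derived from getItemByPath(), which keeps the three lookups consistent with one another.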
Identifier Mapping
The first time an external node is read, Jahia generates a unique identifier for it. These mapped identifiers are stored in a table called jahia_external_mapping.
This table maps the internal id to a pair made of the provider key and the external id returned by the ExternalData.getIdentifier method.
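The behavior of this mapping can be modeled in memory as follows. This is a hypothetical sketch of the idea only: it generates a random UUID the first time a (provider key, external id) pair is seen and returns the same value afterwards, whereas Jahia persists the mapping in jahia_external_mapping and its actual generation strategy may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical in-memory model of jahia_external_mapping (nothing is persisted here).
final class IdentifierMapping {
    /** Composite key: the same external id may exist in several providers. */
    private static final class Key {
        final String providerKey;
        final String externalId;

        Key(String providerKey, String externalId) {
            this.providerKey = providerKey;
            this.externalId = externalId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return providerKey.equals(k.providerKey) && externalId.equals(k.externalId);
        }

        @Override public int hashCode() { return Objects.hash(providerKey, externalId); }
    }

    private final Map<Key, String> internalIds = new HashMap<>();
    private final Map<String, Key> externalIds = new HashMap<>();

    /** Returns the internal id for an external item, generating it on first access. */
    String internalIdFor(String providerKey, String externalId) {
        return internalIds.computeIfAbsent(new Key(providerKey, externalId), key -> {
            String uuid = UUID.randomUUID().toString();
            externalIds.put(uuid, key); // remember the reverse direction as well
            return uuid;
        });
    }

    /** Reverse lookup, needed when a node is requested by its internal identifier. */
    String externalIdFor(String internalId) {
        Key key = externalIds.get(internalId);
        return key == null ? null : key.externalId;
    }
}
```

Keying on the pair rather than on the external id alone is what lets two providers use overlapping identifiers without clashing.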
ExternalData
The External Data Source is responsible for mapping its content into ExternalData objects. ExternalData provides access to the properties of your content; these properties have to be converted to one of two types, String or Binary. String properties can be internationalized or not, as declared in the CND file.
Lazy Loading
If your external provider accesses data that is expensive to read (in terms of performance or memory), you can implement the ExternalDataSource.LazyProperty interface and fill the lazyProperties, lazyI18nProperties and lazyBinaryProperties sets in ExternalData. If somebody tries to get a property that is not in the properties map of ExternalData but is in one of those sets, the system calls one of these methods to get the values:
getBinaryPropertyValues
getI18nPropertyValues
getPropertyValues
For example, the ModulesDataSource retrieves the source code as a lazy property, so the source code is read from disk only when it is displayed, not when the file is shown in the tree while exploring.
You have to decide which type of loading to implement. For example, with a database it may be more efficient to read all the data at once (except binaries), depending on the number of rows and columns.
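The lazy-loading contract can be sketched in plain Java, assuming a hypothetical LazyItem class rather than the real ExternalData and LazyProperty types. Cheap properties are stored eagerly, expensive ones are only named in a lazy set, and a loader is invoked when one of them is first requested. Caching the loaded value so that the source is hit only once is a design choice of this sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical model of lazy properties; names do not match the real ExternalData API.
final class LazyItem {
    final String path;
    final Map<String, String[]> properties = new HashMap<>(); // loaded eagerly
    final Set<String> lazyProperties = new HashSet<>();       // only the names are known
    private final BiFunction<String, String, String[]> loader; // (path, propertyName) -> values

    LazyItem(String path, BiFunction<String, String, String[]> loader) {
        this.path = path;
        this.loader = loader;
    }

    String[] get(String name) {
        String[] values = properties.get(name);
        if (values != null) {
            return values;
        }
        if (lazyProperties.contains(name)) {
            values = loader.apply(path, name);
            // Sketch-specific choice: keep the values so later reads skip the source.
            properties.put(name, values);
            lazyProperties.remove(name);
            return values;
        }
        return null; // the property does not exist on this item
    }
}
```

Browsing a tree of such items only touches the eager map, so the expensive loader runs only for the items someone actually opens.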
Searching Content
Basic implementation
This capability requires you to implement the ExternalDataSource.Searchable interface, which defines a single method:
search(ExternalQuery query)
The query parameter is an ExternalQuery; more information here:
Your method should be able to handle the constraints contained in the query (AND, OR, NOT, DESCENDANTNODE, CHILDNODE, etc.).
You do not have to handle everything if it does not make sense in your case.
The QueryHelper class provides some helpful methods to parse the constraints:
getNodeType
getRootPath
includeSubChild
getSimpleAndConstraints
getSimpleOrConstraints
The getSimpleAndConstraints method returns a map of the properties and their expected values from the AND constraints of the query.
The getSimpleOrConstraints method returns a map of the properties and their expected values from the OR constraints of the query.
From these constraints, build a query that is meaningful for your external system (for example, for an SQL database, map them to AND conditions in the WHERE clause of your request).
Queries are expressed using the JCR-SQL2 language.
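For an SQL-backed source, turning the simple AND constraints into a WHERE clause might look like the following sketch. It assumes the constraints arrive as a map from property name to string value (the real helper may return JCR Value objects) and uses a hypothetical SqlTranslator with a whitelist mapping JCR properties to column names, so property names are never concatenated blindly into SQL and values are bound as parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical translator from simple AND constraints to a parameterized SQL query.
final class SqlTranslator {
    /** SQL text with '?' placeholders plus the values to bind, in order. */
    static final class SqlQuery {
        final String sql;
        final List<String> parameters;

        SqlQuery(String sql, List<String> parameters) {
            this.sql = sql;
            this.parameters = parameters;
        }
    }

    private final String table;
    private final Map<String, String> columnsByProperty; // whitelist: JCR property -> column

    SqlTranslator(String table, Map<String, String> columnsByProperty) {
        this.table = table;
        this.columnsByProperty = columnsByProperty;
    }

    SqlQuery translateAnd(Map<String, String> andConstraints) {
        StringBuilder sql = new StringBuilder("SELECT id FROM ").append(table);
        List<String> parameters = new ArrayList<>();
        String separator = " WHERE ";
        for (Map.Entry<String, String> constraint : andConstraints.entrySet()) {
            String column = columnsByProperty.get(constraint.getKey());
            if (column == null) {
                // Unknown property: refuse rather than guessing a column name.
                throw new IllegalArgumentException("Unsupported property: " + constraint.getKey());
            }
            sql.append(separator).append(column).append(" = ?");
            parameters.add(constraint.getValue());
            separator = " AND ";
        }
        return new SqlQuery(sql.toString(), parameters);
    }
}
```

The resulting SQL and parameters can be handed to a java.sql.PreparedStatement. OR constraints could be handled the same way, with " OR " as the separator.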
Offset and Limit support
The external data queries support offset and limit query parameters.
When several providers are involved, each provider is queried in turn, in no specific order; however, once the providers are mounted, the same order is always used.
This means that, for a given query, limit and offset can be used to paginate the results.
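The reason a stable provider order makes pagination possible can be shown with a small sketch: results are concatenated provider by provider in mount order, and the offset/limit window is applied over that concatenation. MultiProviderPager is a hypothetical helper working on already-fetched lists; it illustrates the ordering guarantee, not how Jahia actually dispatches offset and limit to each provider.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: applies offset/limit over providers visited in a fixed order.
final class MultiProviderPager {
    private MultiProviderPager() {}

    static List<String> page(List<List<String>> resultsPerProvider, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be >= 0");
        }
        List<String> page = new ArrayList<>();
        int toSkip = offset;
        for (List<String> providerResults : resultsPerProvider) { // always the same order
            if (toSkip >= providerResults.size()) {
                toSkip -= providerResults.size(); // this provider lies entirely before the window
                continue;
            }
            for (int i = toSkip; i < providerResults.size() && page.size() < limit; i++) {
                page.add(providerResults.get(i));
            }
            toSkip = 0;
            if (page.size() == limit) {
                break; // window is full, later providers are not needed
            }
        }
        return page;
    }
}
```

If the provider order changed between two requests, the same (offset, limit) pair would designate different rows, which is why the order must stay fixed once the providers are mounted.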
Count
You can provide your own count capability by implementing ExternalDataSource.SupportCount and its method:
count(ExternalQuery query)
It should return the number of results for the given query. The query can be parsed the same way as in the search method.
Whether one or several providers are involved, count() always returns a single row containing the number of rows matching the query.
Enhancing and merging external content with Jahia content
Jahia allows you to extend your external content with some of its own mixins, or to override some properties of your nodes from Jahia. This lets you mix, for example, external data and data defined in Jahia in your definitions.
In your Spring file you can declare two things: which of your nodes can be extended with additional mixins and properties, and which properties from your definitions can be overridden or merged. Here is how:
<property name="extendableTypes">
    <list>
        <value>nt:base</value>
    </list>
</property>
This declares that all your types are extendable; you can restrict this to certain nodes by listing their node types instead. Any mixin can be added to extendable nodes.
<property name="overridableItems">
    <list>
        <value>jtestnt:directory.*</value>
        <value>jtestnt:airline.firstclass_seats</value>
    </list>
</property>
The first entry declares that all properties of jtestnt:directory can be overridden in Jahia; the second declares that only the firstclass_seats property of the jtestnt:airline definition can be overridden.
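One way to read these entries is sketched below: each entry is split into a node type and either a property name or the * wildcard. OverridableItems is a hypothetical class, and it compares node type names literally; it does not resolve node type inheritance, which the real implementation may take into account.

```java
import java.util.List;

// Hypothetical reading of overridableItems entries: "<nodeType>.<property>" or "<nodeType>.*".
final class OverridableItems {
    private final List<String> entries;

    OverridableItems(List<String> entries) {
        this.entries = entries;
    }

    boolean isOverridable(String nodeType, String propertyName) {
        for (String entry : entries) {
            int dot = entry.lastIndexOf('.');
            if (dot < 0) {
                continue; // malformed entry, ignored by this sketch
            }
            String type = entry.substring(0, dot);
            String property = entry.substring(dot + 1);
            // Literal node type comparison: no inheritance resolution here.
            if (type.equals(nodeType) && (property.equals("*") || property.equals(propertyName))) {
                return true;
            }
        }
        return false;
    }
}
```

With the configuration above, any property of jtestnt:directory matches the wildcard entry, while on jtestnt:airline only firstclass_seats matches.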
In normal use, these nodes are only available to end users and editors while the external provider is mounted. If you unmount the external provider, this data is only accessible from the Jahia tools, for administrative purposes.
Like all content coming from the external provider, extensions are not subject to publication: any extension is immediately visible in both the default and live workspaces.
Writing and updating content
An external provider can be writable, meaning that you can create new content or update existing content from within Jahia.
This capability requires you to implement the ExternalDataSource.Writable interface, which defines four methods:
move
order
removeItemByPath
saveItem
Your provider should at least implement saveItem, which receives an ExternalData with all modified properties. Note that if you use lazy properties, modified properties are moved from the set of lazy properties to the map of properties. Removed properties are removed from both the properties map and the lazy properties set.
If content can be deleted, you should implement removeItemByPath.
The other two methods, move and order, are optional and only need to be implemented if your provider supports them (for example, the VFSDataSource implements move but not order, as files are not ordered on a file system).
Here is an example, taken from the VFSDataSource, of how to access binary data from ExternalData and save it to the file system using the VFS API:
public void saveItem(ExternalData data) throws RepositoryException {
    try {
        ExtendedNodeType nodeType = NodeTypeRegistry.getInstance().getNodeType(data.getType());
        if (nodeType.isNodeType(Constants.NT_RESOURCE)) {
            // nt:resource node: write its jcr:data binaries into the parent file
            OutputStream outputStream = null;
            try {
                final Binary[] binaries = data.getBinaryProperties().get(Constants.JCR_DATA);
                if (binaries != null && binaries.length > 0) {
                    // ".../file.txt/jcr:content" -> ".../file.txt"
                    outputStream = getFile(data.getPath().substring(0, data.getPath().indexOf("/" + Constants.JCR_CONTENT))).getContent().getOutputStream();
                    for (Binary binary : binaries) {
                        InputStream stream = null;
                        try {
                            stream = binary.getStream();
                            IOUtils.copy(stream, outputStream);
                        } finally {
                            IOUtils.closeQuietly(stream);
                            binary.dispose();
                        }
                    }
                }
            } catch (IOException e) {
                throw new PathNotFoundException("I/O on file : " + data.getPath(), e);
            } catch (RepositoryException e) {
                throw new PathNotFoundException("unable to get outputStream of : " + data.getPath(), e);
            } finally {
                IOUtils.closeQuietly(outputStream);
            }
        } else if (nodeType.isNodeType("jnt:folder")) {
            // folder node: create the matching directory
            try {
                getFile(data.getPath()).createFolder();
            } catch (FileSystemException e) {
                throw new PathNotFoundException(e);
            }
        }
    } catch (NoSuchNodeTypeException e) {
        throw new PathNotFoundException(e);
    }
}
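If your data source writes to a local directory instead of a VFS URL, the same steps can be expressed with java.nio. This is a hypothetical variant, not taken from the VFSDataSource. LocalFileWriter receives plain InputStreams instead of javax.jcr.Binary values, strips the /jcr:content suffix to find the file, rejects paths that would escape the root directory, and writes the streams one after the other, as the VFS code does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical java.nio counterpart of the VFS saveItem logic; not part of the module.
final class LocalFileWriter {
    private static final String CONTENT_SUFFIX = "/jcr:content";
    private final Path root;

    LocalFileWriter(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Maps a provider-local path such as "/docs/a.txt" to a file under the root. */
    Path resolve(String localPath) {
        Path file = root.resolve(localPath.replaceFirst("^/+", "")).normalize();
        if (!file.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the data source root: " + localPath);
        }
        return file;
    }

    /** Writes the binaries of a ".../file/jcr:content" node into ".../file". */
    void saveResource(String resourcePath, List<InputStream> binaries) throws IOException {
        int suffix = resourcePath.indexOf(CONTENT_SUFFIX);
        if (suffix < 0) {
            throw new IllegalArgumentException("Not a jcr:content path: " + resourcePath);
        }
        if (binaries.isEmpty()) {
            return; // nothing to write, like the binaries.length > 0 check in the VFS code
        }
        Path file = resolve(resourcePath.substring(0, suffix));
        Files.createDirectories(file.getParent());
        try (OutputStream out = Files.newOutputStream(file)) {
            for (InputStream binary : binaries) {
                try (InputStream in = binary) {
                    in.transferTo(out); // streams are appended in order, then closed
                }
            }
        }
    }

    /** Creates the directory backing a jnt:folder node. */
    void createFolder(String folderPath) throws IOException {
        Files.createDirectories(resolve(folderPath));
    }
}
```

The root check matters because node paths come from user-editable content, so a path such as /../outside must not be allowed to reach files outside the mounted directory.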
Provider factories
It is possible to create a configurable external data source that the server administrator can mount and unmount on demand. Instead of declaring a mount point in the Spring declaration, you can add a bean implementing the ProviderFactory interface, which is responsible for mounting the provider.
The factory needs to be associated with a node type that inherits from jnt:mountPoint and defines all the properties required to correctly initialize the data source. The mountProvider method then instantiates the External Data Provider from a prototype and initializes the data source. Here is the definition of a mount point from the VFS provider:
[jnt:vfsMountPoint] > jnt:mountPoint
 - j:rootPath (string) nofulltext
And the associated code, which creates the provider by mounting the VFS URL passed in j:rootPath:
public JCRStoreProvider mountProvider(JCRNodeWrapper mountPoint) throws RepositoryException {
    // Start from the provider prototype declared in Spring; the mount point node supplies key and path
    ExternalContentStoreProvider provider = (ExternalContentStoreProvider) SpringContextSingleton.getBean("ExternalStoreProviderPrototype");
    provider.setKey(mountPoint.getIdentifier());
    provider.setMountPoint(mountPoint.getPath());
    // Initialize the data source from the properties defined on jnt:vfsMountPoint
    VFSDataSource dataSource = new VFSDataSource();
    dataSource.setRoot(mountPoint.getProperty("j:rootPath").getString());
    provider.setDataSource(dataSource);
    provider.setDynamicallyMounted(true);
    provider.setSessionFactory(JCRSessionFactory.getInstance());
    try {
        provider.start();
    } catch (JahiaInitializationException e) {
        throw new RepositoryException(e);
    }
    return provider;
}
Once the provider factory is declared, the “w” button in document manager will display the new node type, allowing the administrator to create a new mount point with this Data Source.
External data ACL implementation
Since version 4.0 of the external provider, ACLs are supported on external content. They can either be provided by the external provider or be managed entirely by Jahia.
Default behavior
By default, you can use the Jahia ACL directly on external nodes the same way as other nodes. The ACL will be stored as extensions.
ACL or privileges?
Some external sources can provide the full ACL of a resource, while others can only provide the allowed operations (privileges) on it. Depending on what your source offers, you can implement either ACL support or privileges support for the provider.
ACL from the provider
You can let the provider get the ACL from the external source.
To do so, the data source has to implement ExternalDataSource.AccessControllable and set an ExternalDataAcl on the ExternalData objects it returns.
ExternalDataAcl
An ExternalDataAcl contains a list of roles granted to or denied for a user or a group.
The provider can supply new roles with custom permissions. As these roles are not exported and there is no way to save modifications on a running server, they should not be edited in the role manager (in a future version, the role editor will be read-only for such roles).
First, create an ExternalDataAcl:
new ExternalDataAcl()
Then fill it with access control entries:
ExternalDataAcl.addAce(type, principal, roles)
- type is either ExternalDataAce.Type.GRANT or ExternalDataAce.Type.DENY
- principal is a user or a group, in the format u:userKey or g:groupKey
- roles is a list of role names
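The shape of these access control entries can be sketched as below. AclSketch is a hypothetical stand-in for ExternalDataAcl, used only to show a GRANT or DENY type, a principal validated against the u:/g: format, and a set of role names; in your data source, use the real ExternalDataAcl class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ExternalDataAcl; not the external-provider module's class.
final class AclSketch {
    enum Type { GRANT, DENY }

    /** One access control entry: type, principal ("u:..." or "g:...") and role names. */
    static final class Ace {
        final Type type;
        final String principal;
        final Set<String> roles;

        Ace(Type type, String principal, Set<String> roles) {
            this.type = type;
            this.principal = principal;
            this.roles = Collections.unmodifiableSet(new LinkedHashSet<>(roles));
        }
    }

    private final List<Ace> aces = new ArrayList<>();

    void addAce(Type type, String principal, Set<String> roles) {
        // Principals are "u:<userKey>" for users or "g:<groupKey>" for groups.
        if (!principal.matches("[ug]:.+")) {
            throw new IllegalArgumentException("Expected u:<userKey> or g:<groupKey>, got " + principal);
        }
        aces.add(new Ace(type, principal, roles));
    }

    List<Ace> entries() {
        return Collections.unmodifiableList(aces);
    }
}
```

Validating the prefix when the entry is added surfaces a malformed principal in the data source itself, rather than as a silently ignored entry later.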
Note that the ExternalDataSource.AccessControllable interface has been updated: the method String[] getPrivilegesNames(String username, String path) has been removed.
ExternalData
To support ACLs, set the ExternalDataAcl on the ExternalData using the method:
ExternalData.setExternalDataAcl(ExternalDataAcl acl)
Example:
// ACL granting the "owner" role to the user this node represents
ExternalDataAcl userNodeAcl = new ExternalDataAcl();
userNodeAcl.addAce(ExternalDataAce.Type.GRANT, "u:" + user.getUsername(), Collections.singleton("owner"));
userExternalData.setExternalDataAcl(userNodeAcl);
Note that ACLs on an external node are read-only when they are provided by the data source.
Privileges support
If your data source only provides the allowed actions on a resource, implement ExternalDataSource.SupportPrivileges on your data source. Its getPrivilegesNames method returns, for a given user and path, the list of Jahia privilege names as strings. A Jahia privilege name is the concatenation of a privilege from javax.jcr.security.Privilege and, if necessary, the workspace where it applies. Privilege names are structured like this:
privilegeName[_(live | default)]
Examples:
Privilege.JCR_READ
Privilege.JCR_ADD_CHILD_NODES + "_default"
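Building these names is plain string concatenation, as this hypothetical helper shows. It takes the privilege name as a string so that it compiles without the JCR API; in a real data source you would pass the javax.jcr.security.Privilege constants (for example Privilege.JCR_READ).

```java
import java.util.Locale;

// Hypothetical helper building privilegeName[_(live | default)] strings.
final class PrivilegeNames {
    enum Workspace { DEFAULT, LIVE }

    private PrivilegeNames() {}

    /** Privilege not tied to a workspace: the name is used unchanged. */
    static String of(String privilege) {
        return privilege;
    }

    /** Privilege restricted to one workspace: "_default" or "_live" is appended. */
    static String of(String privilege, Workspace workspace) {
        return privilege + "_" + workspace.name().toLowerCase(Locale.ROOT);
    }
}
```

For instance, PrivilegeNames.of(Privilege.JCR_ADD_CHILD_NODES, PrivilegeNames.Workspace.DEFAULT) produces the same string as the second example above.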
Note that the role tabs in the edit engine and managers are not accurate, because they display roles inherited in Jahia, which are meaningless for the external source since it does not use them. Also, as with the ACL implementation, the role tab is read-only, so no operation can be done on it.
Disabling
ACLs on external content are enabled by default. To disable them completely, set the aclSupport property of your ExternalContentStoreProvider instance to false.
Note that some permissions are removed so that the edit engine behaves consistently when the ACLs or the content cannot be edited.
Edit engine
If the external source is neither writable nor extendable and has no ACL support, the content is not editable and the Edit menu entry is not available.
If the external source does not support ACLs but can be overridden or is writable, the roles panel is displayed read-only in the edit engine.
Warnings
- When the external data contains ACLs, you cannot update the ACL of the corresponding node (the roles tab in the edit engine is read-only).
- If a module defines roles, they are imported each time the module is deployed, so do not edit them from the settings panel: your changes would be lost.
Comparison of data management between JCR, EDP and User Provider
This summary table lists, for key data types, the differences between their management within the JCR, which has no such limitations, and the available actions and expected results when they are managed by an External Data Provider or a User Provider connected to Jahia.
Within the JCR | What you can do with the External Data Provider | What you can do with the Users provider |
Identifier | Can provide its own UUID, or let the EDP generate one. | Users/groups are identified by their name only. A UUID is generated by the EDP for every user/group/member node. |
Property types | All types, i18n and multiple values supported. | No i18n or binary properties. Multiple values supported. |
Reference properties | Internal or external references supported. | No reference properties for users/groups. Group members internally reference users and groups from the same provider. |
Search - JCR-SQL2 queries | The QOM model is passed to the EDP; it is up to the implementation to parse and execute the query as best it can (ISO-37). QueryHelper is provided to help parse simple queries, but does not support all types of constraints (a single type of boolean operator, only = comparison). Query results are aggregated sequentially for each provider, so global ordering may not be consistent (https://jira.jahia.org/browse/IDEAS-802). | The QOM is interpreted and the query is transformed into simple key-value criteria, on user and group nodes only. Only simple AND or OR searches can be done, not a combination of both (fix on and/or selection: QA-9046). Complex queries cannot be implemented in a users provider. Queries on member nodes are not possible. Ordering is not supported (MAE-40). |
Listeners and rules | Not supported yet - see https://jira.jahia.org/browse/BACKLOG-5678 | - |
Publication | Publication not supported: content is visible in default and live. This applies to external nodes, but also to extensions. | Same as EDP, but it can be confusing that extension nodes (content stored under users and groups) do not support publication. |
ACL and permissions | ACLs can be set on any node as an extension (stored in JR) if the provider does not supply its own ACL. | User nodes have extension nodes (content added under users and groups), which can have custom ACLs set by the user. |
Write operations | A writable provider can be defined. | Not supported |
Sending events to Jahia
Goals
In some cases, it can be useful to notify Jahia that an item mounted through an external provider has been modified externally. This allows Jahia to execute listeners, which can trigger indexing and cache flushes. The external system sends these events to Jahia through a dedicated REST API.
Listening to events
By default, event listeners do not receive events sent through the API: a listener must implement the ApiEventListener interface to receive them like any other event.
The EventWrapper class provides an “isApiEvent()” method that can be used to check whether an event comes from the API.
Sending events
The REST entry point is a URL of the form:
http://<server>/<context>/modules/external-provider/events/<provider-key>
To find the <provider-key>, go to Administration -> System components -> external provider: your mount point is displayed in the table, and the first column contains the <provider-key> of your provider.
This URL accepts a JSON formatted input defining the list of events to trigger in Jahia.
Events format
The events are a JSON serialization of javax.jcr.observation.Event ( see https://docs.adobe.com/docs/en/spec/javax.jcr/javadocs/jcr-2.0/javax/jcr/observation/Event.html ), and so contain the following entries:
{
  "type": string,
  "path": string,
  "identifier": string,
  "userID": string,
  "info": object,
  "date": string
}
The event type is one of the values defined in javax.jcr.observation.Event:
- NODE_ADDED
- NODE_REMOVED
- PROPERTY_ADDED
- PROPERTY_REMOVED
- PROPERTY_CHANGED
- NODE_MOVED
If not specified, the event type is “NODE_ADDED” by default.
- Path is mandatory and points to the node or property on which the event happened. Note that path is the local path in the external system.
- Identifier is optional; it is the id as known by the external system.
- UserID is the username of the user who originally triggered the event.
- Info contains optional data related to the event. For the “node moved” event, it contains the source and target of the move.
- Date is the timestamp of the event, in an ISO 8601 compatible format. If not specified, it defaults to the current time.
Example
curl --header "Content-Type: application/json" \
  --request POST \
  --data '[
    { "path":"/deadpool.jpg", "userID":"root" },
    { "type":"NODE_REMOVED", "path":"/oldDeadpool.jpg", "userID":"root" }
  ]' \
  http://localhost:8080/modules/external-provider/events/2dbc3549-15ff-4b08-92b9-94fc78beeba1
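The same request can be built from Java 11 or later with java.net.http. In this sketch, EventSender is a hypothetical helper that assembles the JSON by hand to stay dependency-free. It only escapes quotes and backslashes, so it assumes paths and user names contain no control characters. As in the curl example, no API key is attached; the REST API Security section describes how keys are configured.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper mirroring the curl example above.
final class EventSender {
    private EventSender() {}

    /** One event object; a null type is omitted, so the server default NODE_ADDED applies. */
    static String event(String type, String path, String userId) {
        StringBuilder json = new StringBuilder("{");
        if (type != null) {
            json.append("\"type\":\"").append(escape(type)).append("\",");
        }
        json.append("\"path\":\"").append(escape(path)).append("\",")
            .append("\"userID\":\"").append(escape(userId)).append("\"}");
        return json.toString();
    }

    /** POST request carrying a JSON array of events for one provider key. */
    static HttpRequest request(String baseUrl, String providerKey, List<String> events) {
        String body = "[" + String.join(",", events) + "]";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/modules/external-provider/events/" + providerKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). For anything beyond simple strings, a real JSON library is a safer choice than hand-built JSON.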
Passing external data
In addition, the “info” field can also contain an “externalData” object, a serialized version of the ExternalData object. This data is loaded into the session, so that listeners can access the external node without requesting the data again from the provider. This avoids a complete round trip to the external data provider.
For example, sending events without data could give this sequence:
If the externalData is provided for both events, this would lead instead to this sequence:
External data format
External data can contain the following entries:
{
  "id": string,
  "path": string,
  "type": string,
  "properties": object,
  "i18nProperties": object,
  "binaryProperties": object,
  "mixin": string[]
}
The fields id, path and type are mandatory.
- properties is an object with the property names as keys and arrays of serialized values as values (an array of one value for single-valued properties)
- i18nProperties is an object with the language as key and a structure like properties as value
- binaryProperties is an object with the property names as keys and arrays of Base64-encoded values as values
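Producing the Base64 values takes a single call to java.util.Base64. The helper below is a hypothetical sketch that also wraps the encoded values into a binaryProperties object. It writes the property name unescaped, so it assumes names without quotes or backslashes. Encoding the UTF-8 text "extract content bubblegum" this way yields the ZXh0cmFjdCBjb250ZW50IGJ1YmJsZWd1bQ== value used in the example that follows.

```java
import java.util.Base64;
import java.util.List;

// Hypothetical helper building the "binaryProperties" entry of a serialized externalData.
final class BinaryPropertyEncoder {
    private BinaryPropertyEncoder() {}

    /** Standard Base64 (RFC 4648) with padding, as in the example payload. */
    static String encode(byte[] value) {
        return Base64.getEncoder().encodeToString(value);
    }

    /** {"<name>":["<base64>", ...]}; the property name is assumed not to need escaping. */
    static String binaryPropertiesJson(String propertyName, List<byte[]> values) {
        StringBuilder json = new StringBuilder("{\"").append(propertyName).append("\":[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(encode(values.get(i))).append('"');
        }
        return json.append("]}").toString();
    }
}
```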
Example
curl --header "Content-Type: application/json" \
  --request POST \
  --data '[
    {
      "path":"/deathstroke.jpg",
      "userID":"root",
      "info": {
        "externalData":{
          "id":"/deathstroke.jpg",
          "path":"/deathstroke.jpg",
          "type":"jnt:file",
          "properties": {
            "jcr:created": ["2017-10-10T10:50:43.000+02:00"],
            "jcr:lastModified": ["2017-10-10T10:50:43.000+02:00"]
          },
          "i18nProperties":{
            "en":{ "jcr:title":["my title"] },
            "fr":{ "jcr:title":["my title"] }
          },
          "mixin":["jmix:image"]
        }
      }
    },
    {
      "path":"/deathstroke.jpg/jcr:content",
      "userID":"root",
      "info": {
        "externalData":{
          "id":"/deathstroke.jpg/jcr:content",
          "path":"/deathstroke.jpg/jcr:content",
          "type":"jnt:resource",
          "binaryProperties": {
            "j:extractedText":["ZXh0cmFjdCBjb250ZW50IGJ1YmJsZWd1bQ=="]
          }
        }
      }
    }
  ]' \
  http://localhost:8080/modules/external-provider/events/2dbc3549-15ff-4b08-92b9-94fc78beeba1
REST API Security
By default, the REST API is disabled and every request is denied: you must provide an API key.
API keys are not generated automatically. You have to create one manually and declare it in a new configuration file in /digital-factory-data/karaf/etc, named:
org.jahia.modules.api.external_provider.event.cfg
Key declaration
<name>.event.api.key=<apiKey>
- <apiKey>: the API key
- <name>: any name you choose; it is only used to group the other configuration options for the same key
Example:
global.event.api.key=42267ebc-f8d0-4f4d-ac98-21fb8eeda653
Restrict apiKey to some providers
By default, an API key protects all providers, but you can restrict the providers that a given API key gives access to:
<name>.event.api.providers=<providerKeys>
- <apiKey>: the API key
- <name>: the name used to declare the key
- <providerKeys>: comma-separated list of provider keys
Example:
providers.event.api.key=42267ebc-f8d0-4f4d-ac98-21fb8eeda653
providers.event.api.providers=provider1,provider2,provider3
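The rules above can be sketched as a small checker over the configuration file, using java.util.Properties. ApiKeyConfig is hypothetical and only mirrors the documented behavior: a key is accepted for any provider unless the same <name> also declares an event.api.providers list, in which case the target provider must appear in that list. It is not the module's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical checker for org.jahia.modules.api.external_provider.event.cfg entries.
final class ApiKeyConfig {
    private static final String KEY_SUFFIX = ".event.api.key";
    private static final String PROVIDERS_SUFFIX = ".event.api.providers";

    private final Properties properties = new Properties();

    ApiKeyConfig(String cfgContent) throws IOException {
        properties.load(new StringReader(cfgContent));
    }

    /** True if apiKey is declared and, when restricted, its provider list contains providerKey. */
    boolean isAllowed(String apiKey, String providerKey) {
        for (String property : properties.stringPropertyNames()) {
            if (!property.endsWith(KEY_SUFFIX) || !properties.getProperty(property).trim().equals(apiKey)) {
                continue; // not a key declaration, or a different key
            }
            String name = property.substring(0, property.length() - KEY_SUFFIX.length());
            String providers = properties.getProperty(name + PROVIDERS_SUFFIX);
            if (providers == null) {
                return true; // no restriction: this key protects all providers
            }
            Set<String> allowed = new HashSet<>();
            for (String provider : providers.split(",")) {
                allowed.add(provider.trim());
            }
            if (allowed.contains(providerKey)) {
                return true;
            }
        }
        return false; // unknown key, or key restricted to other providers
    }
}
```

Because the <name> prefix is what links a key to its provider list, two declarations with different names are independent. In this sketch, a key is accepted if any of its declarations allows the provider.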