ACL implementation in Jena Fuseki
Initial permissions mechanism
A class called ShiroEvaluator contains the logic of the permissions mechanism in Jena/Fuseki/Shiro. It is an implementation of the Jena permissions mechanism.
HTTP SPARQL requests made on the Jena endpoint must be authenticated as the user admin, and a special header, X-SemappsUser, should contain either the URI of the currently logged-in user or the value system if the query should bypass all ACLs (the absence of this header is also treated as a system access level).
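As an illustration of the request shape described above, here is a minimal sketch of how a client could build the headers. The function name buildSparqlHeaders, the password and the example user URI are assumptions made for this example; only Basic Authentication as admin and the X-SemappsUser header come from the text.

```javascript
// Sketch only: builds the HTTP headers for a SPARQL query sent to Fuseki.
// `webId` is the URI of the logged-in user, or 'system' to bypass all ACLs.
function buildSparqlHeaders(adminUser, adminPassword, webId = 'system') {
  // Basic Authentication: base64 of "user:password"
  const credentials = Buffer.from(`${adminUser}:${adminPassword}`).toString('base64');
  return {
    Authorization: `Basic ${credentials}`,
    'Content-Type': 'application/sparql-query',
    'X-SemappsUser': webId
  };
}

// Hypothetical credentials and user URI, for illustration only.
const userHeaders = buildSparqlHeaders('admin', 'secret', 'http://localhost:3000/users/alice');
const systemHeaders = buildSparqlHeaders('admin', 'secret');
```

Defaulting webId to system mirrors the behaviour described for a missing header, but sending the value explicitly makes the intent visible in the calling code.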
The shiro.ini file has been slightly modified in order to force Basic Authentication on all endpoints (it was previously too permissive).
The Jena storage backend is now TDB2 and the Jena version is 3.17 (the latest as of February 2021).
The whole localData dataset is protected by the Jena permission mechanism, which uses the class ShiroEvaluator to check ACLs.
See the configuration file in configuration/localData.ttl for more details.
To compile this class, please refer to this README.
Dataset, Model and Graph
A short explanation on how the Dataset is configured and why.
See the file src/jena/fuseki-docker/configuration/localData.ttl to follow the names given here.
We will start from the top of the application stack: what the fuseki service exposes to the endpoints and web interface. In the localData file, this means reading from the end of the file and going up.
- the fuseki service exposes a dataset called localData (and also another similar one called testData).
- this localData dataset is called securedDataset in the config file. It is composed of 2 graphs:
  - the defaultGraph, which is accessible if no GRAPH keyword is used in the SPARQL.
  - a named graph called <http://semapps.org/webacl>, which will contain all the WebACL tuples.
- the fact that those 2 graphs are joined in the same localData dataset enables us to query both graphs in one request. This is especially useful when we have to retrieve the list of users that belong to a group, if the group is outside the scope of the webacl model (if it belongs to the application model, in the defaultGraph).
Examples of queries to access both graphs (see the Jena documentation about Datasets):
- a query of the form SELECT * { ?s ?p ?o } will return only the results of the defaultGraph, without the ACL tuples or any other named graph.
- a query of the form SELECT * FROM <http://semapps.org/webacl> { ?s ?p ?o } will return all the ACL tuples, and only those. The ACL graph behaves as if it were the default graph in this case.
- a query of the form SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } will return the union of the 2 graphs. We can see that g gets a value only for the ACL tuples, as the defaultGraph has no graph name.
- a query of the form SELECT * { GRAPH ?g { ?s ?p ?o } } will return only the ACL tuples, because the default graph tuples do not have a graph name.
- similarly, SELECT * { GRAPH <http://semapps.org/webacl> { ?s ?p ?o } } will only return the ACL tuples.
- SELECT * FROM NAMED <http://semapps.org/webacl> { ?s ?p ?o } will return nothing, because it says we should use the named graph webacl WITHOUT importing it into the default graph. The query does not say that we should look for tuples in named graphs, so it looks in the default graph instead. Therefore the result is empty.
- the correct way to use FROM NAMED is to specify that we want to search the named graphs: SELECT * FROM NAMED <http://semapps.org/webacl> { GRAPH ?g { ?s ?p ?o } }
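The behaviour of the queries listed above follows from how SPARQL builds the dataset of a query from its FROM and FROM NAMED clauses. The toy model below is only a sketch of those rules in plain JavaScript. It is not Jena code, and the sample tuples and the names toyStore, buildDataset, matchDefault and matchNamed are invented for the illustration.

```javascript
// Toy model of a SPARQL dataset: one default graph plus named graphs.
const ACL_GRAPH = 'http://semapps.org/webacl';
const toyStore = {
  defaultGraph: [['ex:alice', 'rdf:type', 'foaf:Person']],
  named: { [ACL_GRAPH]: [['ex:auth1', 'acl:mode', 'acl:Read']] }
};

// Build the dataset seen by a query from its FROM / FROM NAMED clauses.
// With no clause at all, the store's own default and named graphs are used.
function buildDataset(store, { from = [], fromNamed = [] } = {}) {
  if (from.length === 0 && fromNamed.length === 0) {
    return { defaultGraph: store.defaultGraph, named: store.named };
  }
  // FROM <g> merges g into the query's default graph.
  const defaultGraph = from.flatMap(g => store.named[g] || []);
  // FROM NAMED <g> makes g available to GRAPH patterns only.
  const named = Object.fromEntries(fromNamed.map(g => [g, store.named[g] || []]));
  return { defaultGraph, named };
}

// { ?s ?p ?o } matches the dataset's default graph only.
const matchDefault = ds => ds.defaultGraph.map(t => ({ g: undefined, t }));

// { GRAPH ?g { ?s ?p ?o } } matches the named graphs only.
const matchNamed = ds =>
  Object.entries(ds.named).flatMap(([g, triples]) => triples.map(t => ({ g, t })));
```

In this model, matchDefault(buildDataset(toyStore, { fromNamed: [ACL_GRAPH] })) is empty, which corresponds to the FROM NAMED query without a GRAPH pattern, while matchNamed on the same dataset returns the ACL tuple.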
We continue the explanation of the configuration file:
- each graph in our fuseki localData dataset uses a secured model to access the data. The defaultGraph uses securedModel and the ACL graph uses securedACLModel.
- a model is just a wrapper on a graph; it essentially adds a few methods for list manipulation. So here we have a fuseki Dataset called localData that unites 2 logical graphs (one being the defaultGraph and the other being called http://semapps.org/webacl). Each logical graph is managed under the hood by a secured model. Each secured model is based on a plain TDB graph. Finally, a TDB graph needs a storage definition, which determines where the tuples eventually go on the hard disk. The storage is called DatasetTDB2 (not to be confused with the high-level Dataset we talked about earlier).
- the secured models get a name of their own, but this name is only used internally (it appears in the Java code of the evaluator); you cannot query those graph names. The names of the 2 secured models are http://semapps.org/securedModel and http://semapps.org/securedWebacl. Each secured model is configured with a security evaluator, which is the configuration for the instantiation of our ShiroEvaluator class. You can see that it gets 3 arguments when it is initialized. Those arguments are pointers to 2 graphs and one dataset used by the evaluator to make queries. The 2 graphs are the unprotected "plain TDB graphs" that contain the tuples of the defaultGraph and of the ACL graph (the defaultGraph will probably not be used from the evaluator, but I pass it nevertheless). The dataset passed as 3rd argument is a special dataset that we build here only for the sake of passing it to the evaluator. It is an unprotected dataset that is the UNION of the 2 graphs (defaultGraph and ACL graph), so we can query the ACL tuples together with the group tuples that are stored in the defaultGraph. This unprotected dataset is not exposed to the fuseki endpoints and cannot be queried from anywhere else.
- finally, we can see that the 2 plain TDB graphs that hold our tuples are stored in 2 separate locations on the disk. The tuples of the WebACL go to the database called aclData, while the tuples of the defaultGraph go to the database localData. Initially I tried to have both graphs stored in the same database. When we remove the security layers (the secured models and their evaluators), Jena accepts storing the 2 graphs in the same database. But there is a bug: the transaction is not propagated down the chain of wrapped models and graphs, and it breaks Jena. This bug report seems to be of interest, but it concerns the use of a reasoner, not of a security evaluator. The bug was fixed 2 years ago, but I suspect the wrapping of models and graphs in the SecuredAssembler is not done well, and the transaction is not forwarded down the line. Here is a SO post that confirms that several named graphs and a defaultGraph could all be stored in the same database/DatasetTDB. But in our case, there is this problem of TransactionException: Not in a transaction! So for the moment, the 2 graphs are stored in 2 separate databases.
- please note that if you use additional named graphs in your system, they will be automatically created by Jena on the fly, and they will therefore be "in memory" graphs, hence not persisted. Furthermore, they will not have any security or ACL enforcement mechanism. If you need some business-related named graphs, you will have to add them to the localData.ttl configuration file, much like the ACL graph has been added, and for the reason explained above, each of them will be stored in a separate database. That does not prevent them from being accessed under the same localData endpoint!
If the end-user adds some ACL tuples in the defaultGraph (via the sparql endpoint offered by the middleware, or as admin in the web interface of jena) then those ACL tuples will just be useless. We never use the defaultGraph to read out ACLs. This way, there is no way to inject malicious ACL tuples in our permission system.
Web interface for SPARQL queries
Fuseki offers a web interface to query your dataset.
This interface is protected by the admin user and password. If you expose this web interface to a public IP, the password should be strong and the HTTP connection should be secured with a TLS certificate.
The localData dataset is accessible there as if you were a system user, and all the tuples can be queried and modified, both on the defaultGraph and on the ACL named graph http://semapps.org/webacl.
Web ACL
The Java class ShiroEvaluator checks, for every SPARQL request, the subject of each tuple that the SPARQL engine has asked to receive from the underlying storage.
It also prevents a SemAppsUser from accessing the ACL graph.
Implementation of the Web ACL specs in Java.
We chose to implement the algorithm with a succession of small steps and SPARQL queries. Indeed, at any moment, the algorithm can stop if it finds a matching permission. There is no need then to continue processing all the other cases, and we save some processing time. Implementing a big and unique SPARQL query that would contain all the cases would be unrealistic, and not efficient.
Having a fine-grained set of small SPARQL queries is also more efficient because we can choose which graph each part should query. Most of the queries are done on the webACL graph. We use the defaultGraph on only 2 occasions: to fetch the group members and the containers.
The algorithm proceeds as follows:
- checks permissions on the resource itself
- checks permissions on the containers of that resource
- checks permissions on all the parent containers, recursively.
For each step, we have 2 cases:
- the user is anonymous, in which case we need to check for an acl:agentClass with value foaf:Agent
- the user is a well-known WebID, in which case we need to check:
  - acl:agentClass with value foaf:Agent (public access)
  - acl:agentClass with value acl:AuthenticatedAgent (access for all known registered users)
  - acl:agent with the user's WebID
  - acl:agentGroup with each of the groups the user belongs to (we retrieve the list separately, so we can reuse it later on, for the containers)
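The steps above can be sketched in a few lines. This is not the code of ShiroEvaluator (which is written in Java and runs SPARQL queries); it is a simplified in-memory model in JavaScript. The data, the marker 'anon' for anonymous users, the assumption that container-level authorizations are the ones inherited by contained resources, and all function names are hypothetical.

```javascript
// In-memory stand-ins for the ACL graph and the defaultGraph (hypothetical data).
const authorizations = [
  { target: 'ex:container29', inherited: true, mode: 'Read', agentGroup: 'ex:group4' },
  { target: 'ex:cheznous', inherited: false, mode: 'Write', agent: 'ex:sebastien' }
];
const groupMembers = { 'ex:group4': ['ex:sebastien'] };
const parentContainers = { 'ex:cheznous': ['ex:container28'], 'ex:container28': ['ex:container29'] };

// Does this authorization apply to this user?
function agentMatches(auth, webId, groups) {
  if (auth.agentClass === 'foaf:Agent') return true; // public access
  if (webId === 'anon') return false; // anonymous users only get public access
  if (auth.agentClass === 'acl:AuthenticatedAgent') return true;
  if (auth.agent === webId) return true;
  return groups.includes(auth.agentGroup);
}

function hasPermission(resource, webId, mode) {
  // The group list is fetched once and reused for the containers.
  const groups = Object.keys(groupMembers).filter(g => groupMembers[g].includes(webId));
  const matches = (target, inherited) =>
    authorizations.some(a =>
      a.target === target && a.inherited === inherited && a.mode === mode &&
      agentMatches(a, webId, groups));

  // Step 1: the resource itself. Stop as soon as a permission matches.
  if (matches(resource, false)) return true;

  // Steps 2 and 3: the containers, then their parents, level by level.
  const queue = [...(parentContainers[resource] || [])];
  const seen = new Set();
  while (queue.length > 0) {
    const container = queue.shift();
    if (seen.has(container)) continue; // guard against cycles
    seen.add(container);
    if (matches(container, true)) return true;
    queue.push(...(parentContainers[container] || []));
  }
  return false;
}
```

The early returns reflect the design choice described earlier: the evaluation stops at the first matching permission instead of evaluating every case.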
For the access modes, here is the correspondence table between Jena access types and WebACL modes. mode2 is an additional, complementary, optional mode that needs to be checked in some cases.
JENA | WebACL mode1 | WebACL mode2
--- | --- | ---
Create | Append | Write
Read | Read | Write
Update | Write |
Delete | Write |
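The table can be expressed as a simple lookup. The sketch below assumes that a request is allowed when either mode1 or mode2 is granted; the text only says that mode2 must be checked "in some cases", so this reading is an assumption, as are the names JENA_TO_WEBACL and acceptedModes.

```javascript
// Jena action -> WebACL modes, in the order [mode1, mode2] from the table.
const JENA_TO_WEBACL = {
  Create: ['Append', 'Write'],
  Read: ['Read', 'Write'],
  Update: ['Write'],
  Delete: ['Write']
};

// Returns the WebACL modes to look for, as prefixed names.
function acceptedModes(jenaAction) {
  const modes = JENA_TO_WEBACL[jenaAction];
  if (!modes) throw new Error(`Unknown Jena action: ${jenaAction}`);
  return modes.map(mode => `acl:${mode}`);
}
```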
The evaluator stores in a cache the access control result that it finds for each resource. The cache is used for subsequent evaluate calls within the same transaction/SPARQL request. The cache is then emptied before the next query.
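A minimal sketch of such a request-scoped cache is shown below, in JavaScript rather than the Java of the actual evaluator. The class name RequestScopedCache, the cache key layout and the evaluate callback are assumptions made for the illustration.

```javascript
// Sketch of a per-request cache of access control results.
class RequestScopedCache {
  constructor(evaluate) {
    this.evaluate = evaluate; // function (resource, webId, mode) => boolean
    this.results = new Map();
  }

  // Returns the cached decision, computing it only on the first call.
  check(resource, webId, mode) {
    const key = `${webId}|${mode}|${resource}`;
    if (!this.results.has(key)) {
      this.results.set(key, this.evaluate(resource, webId, mode));
    }
    return this.results.get(key);
  }

  // To be called before the next SPARQL request is processed.
  clear() {
    this.results.clear();
  }
}
```

Clearing the cache between requests avoids serving a stale decision after the ACL tuples have been modified.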
Blank nodes that are not part of a resource, meaning orphan blank nodes that linger at the root of the graph, can be created with SPARQL but cannot be retrieved afterwards. This is because the security mechanism cannot find the resource they belong to and therefore denies access to them.
Tests
Some unit tests have been set up for the ACL mechanism.
In order for the tests to work properly, you first need to load 2 files into the Jena testData dataset:
- file src/middleware/tests/fusekiAcl/testData.ttl
- file src/middleware/tests/fusekiAcl/ACL_test_data.ttl
Use this command to load the testData file:
$FUSEKI_HOME/tdbloader --loc=$FUSEKI_BASE/databases/testData testData.ttl
For the file ACL_test_data.ttl, the easiest way is to go to the web interface of fuseki, enter the username admin and its password, go to the dataset testData and enter the endpoint /testData/update. Then copy and paste the content of ACL_test_data.ttl into the textarea for the query, and press the play button.
To run the tests, go to src/middleware/tests/ and launch the command:
npm test -- --testPathPattern=src/middleware/tests/fusekiAcl
During the tests, if you encounter problems with the port Fuseki is running on, or with the username and password, please change the values in src/middleware/tests/.env
TODO: test membership of groups that are in the defaultGraph (with inference of rdfs:subPropertyOf vcard:hasMember).
Code guidelines
When coding in moleculer, it is important to always respect these rules:
- when calling an action directly from the same service with this.actions.nameOfAction(params), it is important to add a second argument to pass the context: nameOfAction(params, { defaultCtx: ctx });
- when inside an action and calling another action (of another service), always use the form ctx.call() and not the form this.broker.call(), as the latter will lose the context.
- when you need to make a system call to the triplestore, you have to state it explicitly in the call by adding the option { meta: { webId: 'system' } } in the 3rd argument (2nd if using this.actions), like this: ctx.call('action.name', { param }, { meta: { webId: 'system' } })
- when programming an action in moleculer, if you want to offer the user a parameter called webId in your ctx.params, then be careful of 2 points:
  - you don't need to set this param webId: ctx.meta.webId when you call your action from the "API action". Indeed, the webId will come automatically from the context meta.
  - in the moleculer action, you have to check whether you received a webId from params; otherwise, use ctx.meta.webId. Something like const webId = ctx.params.webId || ctx.meta.webId. Most importantly, in all your subsequent ctx.call invocations inside the action code, always pass this webId explicitly! Pass it either in the params, for actions that take a webId in their params, or, for actions that don't, in the meta given as 3rd argument: ctx.call('an.action', { ...myparams }, { meta: { webId } });
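The webId rules above can be summarized in a short sketch. The functions below are plain JavaScript that only relies on the shape of a moleculer context (ctx.params, ctx.meta, ctx.call). The action name triplestore.query and the function names are hypothetical and chosen for the example.

```javascript
// An explicit webId param wins; otherwise fall back to the context meta.
function resolveWebId(ctx) {
  return ctx.params.webId || ctx.meta.webId;
}

// Body of a hypothetical action handler: the resolved webId is passed
// explicitly, through the meta, to an action that takes no webId param.
function queryAsCaller(ctx) {
  const webId = resolveWebId(ctx);
  return ctx.call('triplestore.query', { query: ctx.params.query }, { meta: { webId } });
}

// A system call has to be stated explicitly in the meta.
function queryAsSystem(ctx, query) {
  return ctx.call('triplestore.query', { query }, { meta: { webId: 'system' } });
}
```

Because every call goes through ctx.call and carries the webId in its meta, the identity of the caller is not lost along the chain of actions.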
Example of usage
The user https://data.virtual-assembly.org/users/sebastien.rosset is a member of the group http://localhost:3000/_groups/group4.
The resource http://localhost:3000/organizations/cheznous is located, among others, in the container http://localhost:3000/container28/, which itself is inside the parent container http://localhost:3000/container29/.
Because group4 has been granted Read permission on container29, and because the user sebastien.rosset has been granted an individual Write permission on the resource organizations/cheznous, the following reply is given to this API call for this resource:
In Turtle:
@prefix acl: <http://www.w3.org/ns/auth/acl#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix : <http://localhost:3000/_acl/organizations/cheznous#>.
:Write a acl:Authorization;
    acl:accessTo <http://localhost:3000/organizations/cheznous>;
    acl:mode acl:Write;
    acl:agent <https://data.virtual-assembly.org/users/sebastien.rosset>.
<http://localhost:3000/_acl/container29#DefaultRead> a acl:Authorization;
    acl:mode acl:Read;
    acl:default <http://localhost:3000/container29>;
    acl:agentGroup <http://localhost:3000/_groups/group4>.
In JSON-LD:
{
  "@context": {
    "acl": "http://www.w3.org/ns/auth/acl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "@base": "http://localhost:3000/_acl/organizations/cheznous"
  },
  "@graph": [
    {
      "@id": "#Write",
      "@type": "acl:Authorization",
      "acl:accessTo": "http://localhost:3000/organizations/cheznous",
      "acl:agent": "https://data.virtual-assembly.org/users/sebastien.rosset",
      "acl:mode": "acl:Write"
    },
    {
      "@id": "http://localhost:3000/_acl/container29/#DefaultRead",
      "@type": "acl:Authorization",
      "acl:agentGroup": "http://localhost:3000/_groups/group4",
      "acl:default": "http://localhost:3000/container29",
      "acl:mode": "acl:Read"
    }
  ]
}
Furthermore, the same user happens to have Control permission on container29. If we ask for the permissions on that container, this is what we get:
@prefix acl: <http://www.w3.org/ns/auth/acl#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix : <http://localhost:3000/_acl/container29#>.
:Control a acl:Authorization;
    acl:accessTo <http://localhost:3000/container29>;
    acl:mode acl:Control;
    acl:agent <https://data.virtual-assembly.org/users/sebastien.rosset>.
:DefaultRead a acl:Authorization;
    acl:mode acl:Read;
    acl:agentGroup <http://localhost:3000/_groups/group4>;
    acl:default <http://localhost:3000/container29>.
Security
In general it is possible to obtain the list of all resources inside a container if the user has Read access to the container, even if they have no access to the resources themselves.
The LDP API will not show those resources though, because of the way it is implemented internally. But a SPARQL query on the public SPARQL endpoint will return them. It will return only the URIs of those resources, but this could still be considered an information leak.
The only way to deal with this problem is to uncomment line 602 of ShiroEvaluator so that it also checks the permissions of the object of every triple. We do not do this for now, for performance reasons and for compliance with the LDP protocol, which deals with resources as the unit of data and ACL.
Future
- If one day you program an action to delete a user profile, after deleting the user resource, please also call the removeAgentGroupOrAgentFromAuthorizations method, with an isGroup=false parameter.
- When creation of arbitrary containers at the root becomes possible, please prevent the user from choosing these slugs, which must be reserved for system paths:
  - /_acl
  - /_groups
  - /_rights
- For performance improvement, switch the code of ShiroEvaluator to use the Java API for querying the model, instead of SPARQL queries.
- Root container: https://github.com/assemblee-virtuelle/semapps/issues/429
- PATCH of a resource: do the DELETE and INSERT in one call/transaction.
- Default permissions for containers as a parameter in the code?
- Create a system named graph to store the semapps config as triples. It should be protected like the webacl graph (no access except by system).
- Inference and groups defined in the business data model: https://github.com/assemblee-virtuelle/semapps/issues/590. In this case, how do we call removeAgentGroupOrAgentFromAuthorizations when a group is deleted? Listen to some message? We would need to configure which resource types should be listened to (Organization, Role...).
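The reserved system paths listed above (/_acl, /_groups, /_rights) could be enforced with a small check when a root container is created. The sketch below is only a suggestion; the names RESERVED_SLUGS and isReservedSlug are hypothetical, and the check is an exact, case-sensitive comparison.

```javascript
// Slugs reserved for system paths at the root.
const RESERVED_SLUGS = ['_acl', '_groups', '_rights'];

// True if the requested slug collides with a system path.
// Leading and trailing slashes are ignored before comparing.
function isReservedSlug(slug) {
  const normalized = slug.replace(/^\/+|\/+$/g, '');
  return RESERVED_SLUGS.includes(normalized);
}
```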