Attributes of Policy manifest
Structure of an Access Policy manifest
```yaml
name: ${my_policy}
version: v1
type: policy
tags:
  - ${policy}
  - ${access}
description: ${policy manifest}
owner: ${iamgroot}
layer: users
policy:
  access:
    subjects:
      tags:
        - - ${roles:id:**}
        - - ${users:id:**}
    predicates:
      - ${create}
      - ${read}
      - ${write}
      - ${put}
      - ${update}
      - ${delete}
      - ${post}
      - ${access}
    objects:
      paths:
        - ${dataos://icebase:retail/city}
    allow: ${true}
    collection: default
    name: ${test_access_policy}
    description: ${description of policy}
```
Configuration Attributes
policy
Description: configuration for the policy.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
access
Description: mapping for access policy attributes
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
subjects
Description: a subject is a user that wants to perform a specific predicate on a specific object. It refers to persons or applications/services that make the request to perform an action. Attributes of the subject might include tags or groups of tags.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
objects
Description: the target on which the subject wants to perform the predicate. The object is the resource (data or service) on which the action is to be performed; it can be any target, such as an API path or a column.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | any target, API path, tags of the requested object, column, etc. |
paths
Description: object addresses, given as paths
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | optional | none | valid paths |
tags
Description: tags that identify the subjects or objects to which the policy applies
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | none |
Additional Information: in both subjects and objects, the tags field is an array of string arrays. Refer to the rules page to define the required expressions.
predicates
Description: the action or verb that the subject wants to perform on the specific object.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | CRUD operations such as read, write, update, delete, or HTTP operations such as get, put, post, delete, options |
Additional Information: predicates are combined only with an 'OR' relationship, since the PEP (Policy Enforcement Point) authorizes one action at a time.
Example Usage: in this example policy, the predicate sent by the PEP must be read OR write for this policy to apply.
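Because the PEP authorizes one action at a time, checking a request against a policy's predicates reduces to a membership test. The Python sketch below illustrates that OR relationship only; the helper name predicate_matches is hypothetical and is not a DataOS API.

```python
from typing import Sequence


def predicate_matches(requested: str, policy_predicates: Sequence[str]) -> bool:
    """Return True if the single requested action is any of the policy's predicates.

    Hypothetical helper: the PEP sends one predicate per request, so the
    policy's predicate list behaves as an OR (a plain membership test).
    """
    return requested in policy_predicates


# A policy listing read and write applies to either action, but not to delete.
policy_predicates = ["read", "write"]
print(predicate_matches("read", policy_predicates))    # True
print(predicate_matches("delete", policy_predicates))  # False
```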
allow
Description: whether the action is allowed (true) or denied (false)
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
boolean | optional | false | true/false |
name
Description: name of the access policy
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | none |
description
Description: description of the access policy
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | none |
collection
Description: name of the collection
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | default | none |
Structure of a Data Policy manifest
```yaml
name: ${my_policy}
version: v1
type: policy
tags:
  - policy
  - data
policy:
  data:
    type: ${filter/mask}
    depot: ${icebase}
    collection: ${data_uber}
    dataset: ${sample_driver}
    priority: ${90}
    selector:
      user:
        match: any
        tags:
          - "roles:id:testuser"
      column:
        tags:
          - "PII.email"
          - "PII.income"
    ${filter/mask}:
```
Configuration Attributes
data
Description: data policy specific section
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
priority
Description: a policy with a lower priority value takes precedence over all other policies associated with the same resources. Consequently, a policy with a priority of 1 supersedes any conflicting policy with a priority of 90.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
number | optional | none | 1 to 100 (inclusive) |
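The precedence rule for priority can be sketched in a few lines of Python. This is a minimal model, not DataOS code: DataPolicy and winning_policy are hypothetical names, the policy is reduced to a name and a priority, and the documentation does not say how ties are broken (min() here simply keeps the first policy it sees).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataPolicy:
    """Hypothetical, minimal stand-in for a data policy: only name and priority."""
    name: str
    priority: int

    def __post_init__(self) -> None:
        # The documented range for priority is 1 to 100 (inclusive).
        if not 1 <= self.priority <= 100:
            raise ValueError(f"priority must be between 1 and 100, got {self.priority}")


def winning_policy(conflicting: Iterable[DataPolicy]) -> DataPolicy:
    """Return the policy that takes precedence: the lowest priority value wins."""
    return min(conflicting, key=lambda p: p.priority)


policies = [DataPolicy("mask_email", 90), DataPolicy("pass_email", 1)]
print(winning_policy(policies).name)  # pass_email
```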
depot
Description: name of depot
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | ** |
collection
Description: name of the collection, given as a glob pattern
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | default | valid collection name, ** (for all collections) |
dataset
Description: name of dataset
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | optional | none | ** |
selector
Description: selector section
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
user
Description: section for defining the user
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
match
Description: the match attribute specifies the match condition for the user tags.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | any/all |
Additional Information:
- any: must match at least one tag
- all: must match all tags
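The difference between any and all can be shown with a small Python sketch. It assumes tags are compared as exact strings; DataOS tag expressions may also support patterns such as **, which this sketch does not model. The function name user_matches and the tag roles:id:data-dev are hypothetical.

```python
from typing import Iterable


def user_matches(user_tags: Iterable[str], selector_tags: Iterable[str], match: str) -> bool:
    """Evaluate a user selector's match condition against a user's tags.

    Sketch only: tags are compared as exact strings (no glob patterns).
    """
    owned = set(user_tags)
    required = list(selector_tags)
    if match == "any":
        # At least one selector tag must be present on the user.
        return any(tag in owned for tag in required)
    if match == "all":
        # Every selector tag must be present on the user.
        return all(tag in owned for tag in required)
    raise ValueError(f"match must be 'any' or 'all', got {match!r}")


user = ["roles:id:testuser", "roles:id:data-dev"]
selector = ["roles:id:testuser", "roles:id:operator"]
print(user_matches(user, selector, "any"))  # True
print(user_matches(user, selector, "all"))  # False
```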
column
Description: column section
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | mandatory | none | none |
names
Description: list of column names
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | valid column name |
tags
Description: list of tags assigned to columns
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of strings | mandatory | none | valid column tags defined under a tag group |
mask
Description: field for defining the data masking strategy
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
mapping | optional | none | depends on the masking strategy utilized |
operator
Description: operator defines the data masking strategy
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | depends on the masking strategy utilized |
Additional Information:
Masking strategies help preserve data privacy and ensure information security. Each strategy is implemented by an operator whose behavior can be configured to suit your requirements.
The following table lists the data masking strategies and the data types each one supports:
Masking Type | Operator | Text | Number | Date | Object |
---|---|---|---|---|---|
Hashing | hash | Y | N | N | N |
Bucketing | bucket_number, bucket_date | N | Y | Y | N |
Regex replace | regex_replace | Y | N | N | N |
Format preservation (Random Pattern) | rand_pattern | Y | N | N | N |
Redaction | redact | Y | Y | Y | N |
Pass Through | pass_through | Y | Y | Y | Y |
The following sections describe each of these data masking strategies in detail.
bucket_number
The bucket_number operator categorizes numerical data into defined ranges, or 'buckets'. Each value is then replaced by the lower boundary of the bucket it falls into.
To use the bucket_number operator, add the following YAML configuration to your data masking definition. ${bucket_list} is a placeholder for your list of bucket ranges.
In this example, numerical data would be segmented into the indicated ranges: a value of 27 would be bucketed to the 20 range, whereas a value of 77 would fall into the 60 range.
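The lower-boundary rule can be sketched in Python with a binary search over sorted boundaries. The bucket list [0, 20, 40, 60, 80] is a hypothetical stand-in for ${bucket_list}, chosen so that 27 maps to 20 and 77 maps to 60 as described above; how DataOS treats values below the first boundary is not documented, so this sketch returns None for them.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def bucket_number(value: float, boundaries: Sequence[float]) -> Optional[float]:
    """Replace value with the lower boundary of the bucket it falls into.

    boundaries must be sorted ascending; each entry is the lower bound of a
    bucket. Values below the first boundary fall into no bucket here and map
    to None (a choice made for this sketch only).
    """
    idx = bisect_right(boundaries, value)
    return boundaries[idx - 1] if idx > 0 else None


buckets = [0, 20, 40, 60, 80]  # hypothetical stand-in for ${bucket_list}
print(bucket_number(27, buckets))  # 20
print(bucket_number(77, buckets))  # 60
```

bisect_right places a value equal to a boundary in the bucket that starts at that boundary, so 20 maps to 20 rather than 0.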
bucket_date
The bucket_date operator works like bucket_number but is designed for date data types. It groups dates at a chosen level of precision, such as hour, day, week, or month.
In this example, the precision field sets the granularity of date bucketing to 'month'.
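As a rough model of bucket_date, the Python sketch below truncates a timestamp to the start of its hour, day, week, or month. It is an illustration under assumptions, not DataOS code: the function name is hypothetical, and treating Monday as the first day of the week is an assumption the documentation does not confirm.

```python
from datetime import datetime, timedelta


def bucket_date(ts: datetime, precision: str) -> datetime:
    """Truncate a timestamp to the start of its hour, day, week, or month bucket."""
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return day_start
    if precision == "week":
        # Assumption: weeks start on Monday (datetime.weekday() returns 0 for Monday).
        return day_start - timedelta(days=ts.weekday())
    if precision == "month":
        return day_start.replace(day=1)
    raise ValueError(f"unsupported precision: {precision!r}")


ts = datetime(2023, 7, 19, 14, 35)  # a Wednesday
print(bucket_date(ts, "month"))  # 2023-07-01 00:00:00
print(bucket_date(ts, "week"))   # 2023-07-17 00:00:00
```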
hash
Hashing is a data masking technique in which a given input always produces the same fixed-size output, commonly called a 'hash code' or hash value. Hashing is also highly sensitive to changes in input: even the slightest modification to the input yields a very different hash.
Cryptographic hashing is one-way: once data is hashed, the original value cannot be computed from the hash. This makes hashing useful for masking sensitive textual data, such as passwords, or personally identifiable information (PII), such as names and email addresses.
Hashing uses a specific algorithm to convert the original data into hashed data, so to implement hashing you need to specify the algorithm to use. The general syntax structure is as follows:
In this YAML configuration, the hash operator is specified along with the SHA-256 algorithm (algo: sha256). SHA-256 is a popular choice because of its strong security properties, but other algorithms can be used as required.
Note that the hash operator applies only to textual data types. Using it on non-textual data types may lead to unintended results or errors, so always make sure the data you want to mask is compatible with the masking operator you choose.
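The two properties described above, determinism and sensitivity to small changes, can be demonstrated with Python's standard hashlib module. This is only an illustration of SHA-256 hashing; the helper name hash_mask is hypothetical, and the exact output encoding DataOS uses for masked values (for example hex) is an assumption here.

```python
import hashlib


def hash_mask(value: str, algo: str = "sha256") -> str:
    """Return the hex digest of value under the named hashlib algorithm."""
    return hashlib.new(algo, value.encode("utf-8")).hexdigest()


print(hash_mask("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(hash_mask("alice@example.com") == hash_mask("alice@example.com"))  # True: deterministic
print(hash_mask("alice@example.com") == hash_mask("alice@example.con"))  # False: tiny change, new hash
```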
redact
Redaction obscures or completely removes portions of data, much like blacking out sections of a document to keep sensitive information from being disclosed. In data masking, redaction might replace certain elements of a data field (such as characters in an email address or digits in a Social Security number) with a placeholder string, e.g., "[REDACTED]".
For instance, the gender of every individual could be redacted and replaced with a constant value, 'REDACTED'. Similarly, an individual's location information (address, zip code, state, or country) could be redacted and replaced with 'REDACTED'.
The replacement field determines the string that replaces the redacted data.
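A whole-value redaction, as in the gender example above, can be sketched in Python as replacing every value in a column with the configured replacement string. The function name redact is hypothetical, and this sketch does not cover partial redaction of characters inside a value.

```python
from typing import Any, List, Sequence


def redact(values: Sequence[Any], replacement: str = "REDACTED") -> List[str]:
    """Replace every value in a column with the same replacement string.

    The default mirrors the 'REDACTED' example; in a policy, the string would
    come from the replacement field.
    """
    return [replacement for _ in values]


genders = ["female", "male", "non-binary"]
print(redact(genders))  # ['REDACTED', 'REDACTED', 'REDACTED']
```

Because the output is the same string regardless of input type, the same idea applies to text, number, and date columns, which matches the data types listed for redact in the masking table.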
rand_pattern
Random pattern masking substitutes sensitive data with randomly generated values that keep the original data's format or structure. The goal is for the masked data to stay realistic and operationally useful while protecting sensitive information. For example, it can replace personal names with random strings or turn real addresses into plausible but entirely fictitious ones.
To implement the rand_pattern operator, the following YAML configuration can be used:
In this instance, the pattern '####-####-####' generates random digits in a format resembling a credit card number, preserving the structure of the original data while replacing its content with randomly generated values.
Format Preserving Encryption (FPE): as the name suggests, this method encrypts data so that the output has the same format as the input. For example, encrypting a 16-digit credit card number with FPE produces another 16-digit number. This keeps the masked data realistic, so systems can operate normally with it.
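The rand_pattern behavior described above can be sketched in Python, under the assumption (taken from the '####-####-####' example) that each '#' stands for one random digit and every other character is copied through unchanged. The function name is hypothetical; DataOS may support other placeholder characters that this sketch ignores.

```python
import random
from typing import Optional


def rand_pattern(pattern: str, rng: Optional[random.Random] = None) -> str:
    """Generate a random value shaped like pattern.

    Assumption: '#' stands for one random digit; every other character
    (such as '-') is kept as-is, so the format is preserved.
    """
    if rng is None:
        rng = random.Random()
    return "".join(str(rng.randrange(10)) if ch == "#" else ch for ch in pattern)


# Prints three groups of four random digits separated by hyphens.
print(rand_pattern("####-####-####", random.Random(7)))
```

The seeded random.Random only makes the example reproducible; a masking job that must resist guessing would more likely draw from the secrets module. Unlike FPE, this output is random and cannot be decrypted back to the original value.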
regex_replace
The Regular Expression (Regex) Replacement strategy uses regular expressions to identify and mask patterns within the data. Matched patterns can be replaced with a predetermined value or random character(s). This strategy is particularly useful for masking data that follows a predictable pattern, such as email addresses, phone numbers, or credit card numbers.
The regex_replace operator requires a pattern field and a replacement field in its configuration. The pattern field expects a regular expression, while the replacement field expects the replacement string. The general syntax structure is as follows:
In the above example, the regular expression .{5}$ matches any five characters at the end of a string. These characters are replaced by 'xxxxx'. Here, the regular expression [0-9] matches any single digit, which is replaced with '#'.
In this case, the regex pattern [0-9](?=.*.{4}) matches any digit that is followed by at least four more characters, so every digit except those within the last four characters is replaced by '#'.
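The three patterns described above can be checked with Python's re.sub. This sketch only illustrates the regular expressions; the sample values are made up, the helper name regex_replace is hypothetical, and the regex dialect used by DataOS may differ in small ways from Python's.

```python
import re


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every match of pattern in value with replacement."""
    return re.sub(pattern, replacement, value)


# Mask the last five characters.
print(regex_replace("john.doe@example.com", r".{5}$", "xxxxx"))  # john.doe@examplxxxxx
# Mask every digit.
print(regex_replace("555-123-4567", r"[0-9]", "#"))  # ###-###-####
# Mask every digit except those within the last four characters.
print(regex_replace("4111-1111-1111-1111", r"[0-9](?=.*.{4})", "#"))  # ####-####-####-1111
```

The lookahead (?=.*.{4}) checks that at least four characters follow the digit without consuming them, which is why the final group stays visible.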
pass_through
The pass_through strategy is used when certain data elements should not be masked or altered. It lets data developers specify that certain data fields remain unchanged during the masking process. This approach suits data that contains no sensitive information or that is already anonymized.
To implement the pass_through operator, the following YAML configuration can be used:
filters
Description: filtering policies limit the data visible to end users. You can build a policy that removes results from a query's result set based on comparison operators applied to a column; for example, some users might not be allowed to see data from the 'Florida' area.
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
list of mappings | optional | none | depends on the filter pattern utilized |
column
Description: column name
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | valid column name |
operator
Description: filter operator name
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
string | mandatory | none | equals/not_equals |
value
Description: value on which the filter is applied
Data Type | Requirement | Default Value | Possible Value |
---|---|---|---|
depends on column data type | mandatory | none | any value within the column |
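The column, operator, and value attributes together describe a row filter. The Python sketch below models that under two assumptions the documentation does not state explicitly: each filter acts like a WHERE condition (only rows that satisfy it are returned), and multiple filters are combined with AND. The names apply_filters and _satisfies and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]
Filter = Dict[str, Any]  # keys: "column", "operator", "value"


def _satisfies(row: Row, flt: Filter) -> bool:
    """Check one filter (column, operator, value) against one row."""
    cell = row.get(flt["column"])
    if flt["operator"] == "equals":
        return cell == flt["value"]
    if flt["operator"] == "not_equals":
        return cell != flt["value"]
    raise ValueError(f"unsupported operator: {flt['operator']!r}")


def apply_filters(rows: Sequence[Row], filters: Sequence[Filter]) -> List[Row]:
    """Keep only rows that satisfy every filter (assumed AND semantics)."""
    return [row for row in rows if all(_satisfies(row, f) for f in filters)]


rows = [
    {"city": "Miami", "state": "Florida"},
    {"city": "Austin", "state": "Texas"},
]
hide_florida = [{"column": "state", "operator": "not_equals", "value": "Florida"}]
print(apply_filters(rows, hide_florida))  # [{'city': 'Austin', 'state': 'Texas'}]
```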