Attributes of Policy manifest¶

Structure of an Access Policy manifest¶

name: ${my_policy}
version: v1
type: policy
tags:
  - ${policy}
  - ${access}
description: ${policy manifest}
owner: ${iamgroot}
layer: user
policy:
  access:
    subjects:
      tags:
        - - ${roles:id:**}
        - - ${users:id:**}
    predicates:
      - ${create}
      - ${read}
      - ${write}
      - ${put}
      - ${update}
      - ${delete}
      - ${post}
      - ${access}
    objects:
      paths:
        - ${dataos://icebase:retail/city}
    allow: ${true}
    collection: default
    name: ${test_access_policy}
    description: ${description of policy}

Configuration Attributes¶

`policy`¶

Description: configuration for the policy.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

policy:
  data/access:
    {}

`access`¶

Description: mapping for access policy attributes

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

access:
  subjects:
    {}
  predicates:
    {}
  objects:
    {}
  allow: true

`subjects`¶

Description: the user, application, or service that requests to perform a specific predicate (action) on a specific object. A subject is identified by its attributes, which might include tags or groups of tags.

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

subjects:
  tags:
    - - "roles:id:testuser"

`objects`¶

Description: the target on which the subject would like to perform the predicate. The object is the resource (data or service) on which the action is to be performed; it can be, for example, an API path or a column.

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	any target, api path, tags of requested object, column etc.

Example Usage:

objects:
  paths:
    - /dataos/system/themis/**

`paths`¶

Description: object address in the form of paths

Data Type	Requirement	Default Value	Possible Value
list of strings	optional	none	valid paths

Example Usage:

paths:
  - /dataos/system/themis/**

`tags`¶

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	none

Additional Information: The manifest tags field in both subjects and objects is an array of string arrays. The rules page will help you to define required expressions.

`predicates`¶

Description: the action or the verb that the subject would like to perform on the specific object.

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	crud operations like read, write, update, delete or http operations like get, put, post, delete, options.

Additional Information: Predicates are ‘OR’ relationships only, since the PEP is authorizing one action at a time.

Example Usage: in this example, the policy applies only if the predicate sent by the PEP is read OR write.

predicates:
  - read
  - write

Sample Predicates OR Relationship

name: predicate-example2
version: v1
type: policy
layer: user
description: example policy
policy:
  access:
    subjects:
      tags:
        - - roles:id:pii-reader
          - roles:id:user
        - - roles:id:marketing-manager
    predicates:
      - read
      - write
    objects:
      tags:
        - - PII.Sensitive
          - dataos:type:column
    allow: true
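
The sample above can be read as a small decision procedure. The following Python sketch is an illustrative model only, not the DataOS policy engine. The source states only that predicates are OR-ed; the sketch additionally assumes that all tags inside one inner list must be present (AND) and that the inner lists are alternatives (OR). The function names and the exact-string tag comparison (no glob handling) are hypothetical.

```python
# Illustrative model only; not the DataOS implementation.

def tag_groups_match(tag_groups, held_tags):
    """Assumed semantics: AND within an inner list, OR across inner lists."""
    held = set(held_tags)
    return any(all(tag in held for tag in group) for group in tag_groups)


def policy_applies(policy, subject_tags, predicate, object_tags):
    """Return True when subject, predicate, and object all match the policy."""
    access = policy["access"]
    return (
        tag_groups_match(access["subjects"]["tags"], subject_tags)
        and predicate in access["predicates"]  # predicates are OR-ed: one match suffices
        and tag_groups_match(access["objects"]["tags"], object_tags)
    )


# Dictionary form of the sample manifest's `policy` section.
policy = {
    "access": {
        "subjects": {"tags": [["roles:id:pii-reader", "roles:id:user"],
                              ["roles:id:marketing-manager"]]},
        "predicates": ["read", "write"],
        "objects": {"tags": [["PII.Sensitive", "dataos:type:column"]]},
        "allow": True,
    }
}

column_tags = ["PII.Sensitive", "dataos:type:column"]
manager_can_read = policy_applies(policy, ["roles:id:marketing-manager"], "read", column_tags)
reader_only_can_read = policy_applies(policy, ["roles:id:pii-reader"], "read", column_tags)
reader_can_delete = policy_applies(
    policy, ["roles:id:pii-reader", "roles:id:user"], "delete", column_tags
)
```

Under these assumptions, a marketing manager matches for read, a subject holding only roles:id:pii-reader does not, and no subject matches for delete because that predicate is not listed.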

`allow`¶

Description: action to be allowed or denied

Data Type	Requirement	Default Value	Possible Value
boolean	optional	false	true/false

Example Usage:

allow: true

`name`¶

Description: name of the access policy

Data Type	Requirement	Default Value	Possible Value
string	optional	none	none

Example Usage:

name: my-first-access-policy

`description`¶

Description: description of the access policy

Data Type	Requirement	Default Value	Possible Value
string	optional	none	none

Example Usage:

description: this is an access policy

`collection`¶

Description: collection with which the access policy is associated

Data Type	Requirement	Default Value	Possible Value
string	optional	default	none

Example Usage:

collection: default

Structure of a Data Policy manifest¶

name: ${my_policy}
version: v1
type: policy
tags:
  - policy
  - data
policy:
  data:
    type: ${filter/mask}
    depot: ${icebase}
    collection: ${data_uber}
    dataset: ${sample_driver}
    priority: ${90}
    selector:
      user:
        match: any
        tags:
          - "roles:id:testuser"
      column:
        tags:
          - "PII.email"
          - "PII.income"
    ${filter/mask}:

Configuration Attributes¶

`data`¶

Description: data policy specific section

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

data:
  {}

`priority`¶

Description: a policy with a lower priority value takes precedence over all other policies associated with the same resources. For example, a policy assigned a priority of 1 supersedes any conflicting policy assigned a priority of 90.

Data Type	Requirement	Default Value	Possible Value
number	optional	none	1 to 100 (inclusive)

Example Usage:

priority: 80
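
The precedence rule can be sketched in a few lines of Python. This is an illustrative model, not DataOS code: the policy names are hypothetical, and tie-breaking between equal priorities is not specified in the source (here `min` simply returns the first of the tied entries).

```python
# Illustrative model only: the lowest priority value wins.

def winning_policy(policies):
    """Return the conflicting policy that takes precedence."""
    return min(policies, key=lambda p: p["priority"])


conflicting = [
    {"name": "mask-email-default", "priority": 90},   # hypothetical policy
    {"name": "show-email-to-admins", "priority": 1},  # hypothetical policy
]
winner = winning_policy(conflicting)
```

Here `winner` is the policy with priority 1, matching the description above.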

`depot`¶

Description: name of depot

Data Type	Requirement	Default Value	Possible Value
string	optional	none	**

Example Usage:

depot: icebase

`collection`¶

Description: name of the collection in the glob pattern

Data Type	Requirement	Default Value	Possible Value
string	optional	default	valid collection name, ** (for all collections)

Example Usage:

collection: retail

`dataset`¶

Description: name of dataset

Data Type	Requirement	Default Value	Possible Value
string	optional	none	**

Example Usage:

dataset: city

`selector`¶

Description: selector section

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

selector:
  user:
    {}
  column:
    {}

`user`¶

Description: section for defining the user

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

user:
  match: any
  tags:
    - "roles:id:testuser"

`match`¶

Description: The match attribute specifies the match condition.

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	any/all

Additional Information:
- any: must match at least one tag
- all: must match all tags

Example Usage:

match: any
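
The difference between the two match conditions can be shown with a short Python sketch. It is an illustrative model with a hypothetical function name, and it compares tags as exact strings.

```python
# Illustrative model of the `match` condition; not DataOS code.

def user_matches(match, selector_tags, user_tags):
    """Check a user's tags against the selector's tags."""
    held = set(user_tags)
    if match == "any":
        return any(tag in held for tag in selector_tags)  # at least one tag
    if match == "all":
        return all(tag in held for tag in selector_tags)  # every tag
    raise ValueError(f"unsupported match condition: {match!r}")


selector_tags = ["roles:id:testuser", "roles:id:user"]
user_tags = ["roles:id:testuser"]

matches_any = user_matches("any", selector_tags, user_tags)
matches_all = user_matches("all", selector_tags, user_tags)
```

With these inputs the user satisfies `any` but not `all`, because only one of the two selector tags is held.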

`column`¶

Description: column section

Data Type	Requirement	Default Value	Possible Value
mapping	mandatory	none	none

Example Usage:

column:
  {}

`names`¶

Description: list of column names

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	valid column name

Example Usage:

names:
  - "first_name"
  - "last_name"

`tags`¶

Description: list of tags given to columns

Data Type	Requirement	Default Value	Possible Value
list of strings	mandatory	none	valid column tags defined under some tag group

Example Usage:

tags:
  - "PII.age"
  - "dataos.city"

`mask`¶

Description: field for defining the data masking strategy

Data Type	Requirement	Default Value	Possible Value
mapping	optional	none	depends on the masking strategy utilized

Example Usage:

mask:
  operator: hash
  hash:
    algo: sha256

`operator`¶

Description: operator defines the data masking strategy

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	depends on the masking strategy utilized

Example Usage:

mask:
  operator: hash
  hash:
    algo: sha256

Additional Information:

Masking strategies preserve data privacy by applying operators, or rules, that can be tailored to user requirements.

The following table lists the data masking strategies and the data types each one supports:

Masking Type	Operator	Text	Number	Date	Object
Hashing	`hash`	Y	N	N	N
Bucketing	`bucket_number`, `bucket_date`	N	Y	Y	N
Regex replace	`regex_replace`	Y	N	N	N
Format preservation (Random Pattern)	`rand_pattern`	Y	N	N	N
Redaction	`redact`	Y	Y	Y	N
Pass Through	`pass_through`	Y	Y	Y	Y

In the following section, we delve into comprehensive explanations and syntax examples for each of these data masking strategies.

`bucket_number`¶

Using the bucket_number operator, numerical data can be categorized into defined range 'buckets'. Each data point is then replaced by the lower boundary of the bucket it falls into.

Configuration Syntax

To leverage the bucket_number operator, incorporate the following YAML configuration in your data masking definition. ${bucket-list} is a placeholder for your list of bucket ranges.

mask:
  operator: bucket_number
  bucket_number:
    buckets:
      ${bucket-list}

Example

mask:
  operator: bucket_number 
  bucket_number:
    buckets:
      - 20
      - 40
      - 60
      - 80
      - 100

In this example, numerical data would be segmented into the indicated ranges. A value of 27, for example, would be bucketed to the 20 range, whereas a value of 77 would fall into the 60 range.
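
The lower-boundary rule described above can be sketched in Python. This is an illustrative model, not the DataOS implementation; the function name is hypothetical, and returning `None` for values below the first bucket is an assumption, since the source does not say how such values are handled.

```python
# Illustrative model of bucket_number: replace a value with the lower
# boundary of the bucket it falls into.
from bisect import bisect_right


def bucket_number(value, buckets):
    """Return the largest bucket boundary that is <= value."""
    boundaries = sorted(buckets)
    index = bisect_right(boundaries, value)
    if index == 0:
        return None  # assumption: behaviour below the first boundary is unspecified
    return boundaries[index - 1]


buckets = [20, 40, 60, 80, 100]
masked_27 = bucket_number(27, buckets)  # 20
masked_77 = bucket_number(77, buckets)  # 60
```

The two results agree with the worked values in the text: 27 maps to 20 and 77 maps to 60.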

`bucket_date`¶

The bucket_date operator functions similarly to ‘bucket_number’ but is specifically tailored for date data types. This strategy enables the categorization of dates into various precision levels such as hour, day, week, or month.

Configuration Syntax

To implement the bucket_date operator, use the following YAML configuration:

mask:
  operator: bucket_date
  bucket_date:
    precision: ${date-precision} # Options: hour, day, week, month

Example

mask:
  operator: bucket_date
  bucket_date:
    precision: month

In this example, the precision field designates the granularity of the date bucketing to be at a 'month' level.
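
One way to picture date bucketing is as truncation to the start of the chosen period. The Python sketch below is an illustrative model under that assumption; the source does not state which value represents a bucket, and treating Monday as the start of a week is also an assumption. The function name is hypothetical.

```python
# Illustrative model of bucket_date: truncate a timestamp to the start of
# the hour, day, week, or month.
from datetime import datetime, timedelta


def bucket_date(ts, precision):
    """Return ts truncated to the given precision."""
    if precision == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "day":
        return day
    if precision == "week":
        return day - timedelta(days=day.weekday())  # assumption: weeks start on Monday
    if precision == "month":
        return day.replace(day=1)
    raise ValueError(f"unsupported precision: {precision!r}")


ts = datetime(2024, 3, 14, 15, 9, 26)
by_month = bucket_date(ts, "month")  # 2024-03-01 00:00:00
by_week = bucket_date(ts, "week")    # 2024-03-11 00:00:00
```

With `precision: month`, every timestamp in March 2024 collapses to the same value, so individual dates can no longer be distinguished.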

`hash`¶

The hashing method is a powerful data masking technique wherein a specific input consistently produces an identical fixed-size byte string, commonly referred to as a 'hash code'. A notable feature of hashing is its sensitivity to changes in input; even the slightest modification in input can yield a significantly different hash output.

A unique and crucial characteristic of hashing is its irreversibility — once data is hashed, it cannot be converted back to its original state. This property makes hashing a particularly useful technique for masking sensitive textual data, such as passwords, or personally identifiable information (PII), such as names and email addresses.

Hashing involves the use of a specific algorithm that performs the conversion from original data to hashed data. To implement hashing, you will need to specify the hashing algorithm you wish to use. The general syntax structure is as follows:

Configuration Syntax

mask:
  operator: hash
  hash:
    algo: ${algorithm-name}

Example

mask:
  operator: hash
  hash:
    algo: sha256

In this YAML configuration, the operator hash is specified along with the SHA-256 algorithm (algo: sha256). The SHA-256 algorithm is a popular choice due to its strong security properties, but other algorithms could be used as per your requirements.

Remember, the hash operator is only applicable to textual data types. Attempting to use it on non-textual data types may lead to unintended results or errors. Always make sure the data you wish to mask is compatible with the masking operator you choose.
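
The properties described above (determinism, sensitivity to small input changes, fixed output size) can be observed with Python's standard `hashlib` module. This is a sketch of SHA-256 hashing in general, not of the DataOS operator itself; the UTF-8 encoding, the hexadecimal output form, and the function name are assumptions.

```python
# Illustrative sketch of SHA-256 hashing of text values.
import hashlib


def mask_hash(value: str) -> str:
    """Return the SHA-256 digest of a text value as a hex string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


first = mask_hash("jane.doe@example.com")
again = mask_hash("jane.doe@example.com")  # same input, same digest
other = mask_hash("jane.doe@example.org")  # small change, different digest
```

Both digests are 64 hexadecimal characters long regardless of input length, and `first` equals `again` while differing from `other`.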

`redact`¶

Redaction is a data masking strategy that obscures or completely erases portions of data. Its real-world analogy is blacking out sections of a document to prevent sensitive information from being disclosed. When applied to data masking, redaction might involve replacing certain elements in a data field (such as characters in an email address or digits in a Social Security number) with a placeholder string, e.g., "[REDACTED]".

For instance, the gender of every individual could be redacted and substituted with a consistent value, 'REDACTED'. Similarly, an individual's location information (which may include address, zip code, state, or country) could be redacted and replaced with 'REDACTED'.

Configuration Syntax

To implement the redact operator, the following YAML configuration can be utilized:

mask:
  operator: redact
  redact:
    replacement: ${replacement-string}

Example

mask:
  operator: redact
  redact:
    replacement: 'REDACTED'

The replacement field determines the string that will replace the redacted portions of data.

`rand_pattern`¶

Random Pattern Masking involves the substitution of sensitive data with randomly produced equivalents that maintain the original data's format or structure. The fundamental goal is to ensure that the masked data is statistically representative and retains operational utility while safeguarding critical information. For example, it can be used to replace personal names with random strings or transform real addresses into plausible but entirely fictitious ones.

Configuration Syntax

To implement the rand_pattern operator, the following YAML configuration can be utilized:

mask:
  operator: rand_pattern
  rand_pattern:
    pattern: ${random-pattern}

Below is an example illustrating the usage of the rand_pattern operator:

mask:
  operator: rand_pattern
  rand_pattern:
    pattern: '####-####-####'

In this instance, the specified pattern '####-####-####' will generate random numbers in a format similar to a credit card number, preserving the structure of the original data but replacing it with randomly generated information.
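
The effect of such a pattern can be sketched in Python. This is an illustrative model only: it assumes that each `#` stands for one random digit and that every other character is copied unchanged. The source does not document the full pattern syntax, and the function name is hypothetical.

```python
# Illustrative model of rand_pattern: fill each '#' with a random digit.
import random


def rand_pattern(pattern, rng=random):
    """Return a string shaped like `pattern` with '#' replaced by digits."""
    return "".join(
        str(rng.randint(0, 9)) if ch == "#" else ch
        for ch in pattern
    )


masked = rand_pattern("####-####-####")
```

The output differs on every call, but its shape is fixed: three groups of four digits separated by hyphens.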

Format Preserving Encryption (FPE): As the name suggests, this method encrypts data in a way that the output has the same format as the input. For example, if a 16-digit credit card number is encrypted using FPE, the result is another 16-digit number. This maintains functional realism, allowing systems to operate normally with masked data.

`regex_replace`¶

The Regular Expression (Regex) Replacement strategy utilizes regular expressions to discern and mask identifiable patterns within the data. The identified patterns can be substituted with a predetermined value or random character(s). This strategy is particularly advantageous for masking data that follows a predictable pattern, such as email addresses, phone numbers, or credit card information.

The regex_replace operator requires a pattern and a replacement field in its configuration. The pattern field expects a regular expression as its value, while the replacement field expects the desired replacement string. The general syntax structure is as follows:

Configuration Syntax

mask:
  operator: regex_replace
  regex_replace:
    pattern: ${regex-pattern}
    replacement: ${replacement-pattern}

Example

mask:
  operator: regex_replace
  regex_replace:
    pattern: .{5}$
    replacement: xxxxx

In the above example, the regular expression .{5}$ matches any five characters at the end of a string. These characters are replaced by 'xxxxx'.

In the following example, the regular expression [0-9] denotes any single digit, which is replaced with '#'.

mask:
  operator: regex_replace
  regex_replace:
    pattern: '[0-9]'
    replacement: '#'

mask:
  operator: regex_replace
  regex_replace:
    pattern: '[0-9](?=.*.{4})'
    replacement: '#'

In this case, the regex pattern [0-9](?=.*.{4}) identifies a digit followed by at least four characters. This digit will be replaced by '#'.
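
The three patterns in this section can be tried out with Python's standard `re` module. This is only an approximation: the regex engine that DataOS uses is not stated in the source and may differ in details, and the input values are made up for illustration.

```python
# Illustrative check of the three regex_replace patterns using Python's re.
import re

# .{5}$ : the last five characters of the string
masked_tail = re.sub(r".{5}$", "xxxxx", "john.doe@example.com")

# [0-9] : every single digit
masked_digits = re.sub(r"[0-9]", "#", "555-867-5309")

# [0-9](?=.*.{4}) : digits followed by at least four more characters,
# which leaves the last four characters untouched
masked_card = re.sub(r"[0-9](?=.*.{4})", "#", "1234567812345678")
```

The results are `john.doe@examplxxxxx`, `###-###-####`, and `############5678` respectively.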

`pass_through`¶

The "Pass Through" strategy is used when certain data elements should not be masked or altered. With this technique, data developers can specify that certain data fields remain unchanged during the masking process. This approach is suitable for data that doesn't contain sensitive information or data that is already anonymized.

To implement the pass_through operator, the following YAML configuration can be utilized:

mask:
  operator: pass_through

`filters`¶

Description: a filtering policy limits the data visible to end users. It removes rows from a query's result set based on comparison operators applied to a column; for example, certain users can be prevented from seeing data from the 'Florida' area.

Data Type	Requirement	Default Value	Possible Value
list of mappings	optional	none	depends on the filter pattern utilized

Example Usage:

filters: 
  - column: ${store_state_code}
    operator: ${not_equals}
    value: ${TN}

`column`¶

Description: column name

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	valid column name

Example Usage:

column: store_state_code

`operator`¶

Description: filter operator name

Data Type	Requirement	Default Value	Possible Value
string	mandatory	none	equals/not_equals

Example Usage:

operator: not_equals

`value`¶

Description: value on which filter is to be applied

Data Type	Requirement	Default Value	Possible Value
depends on `column` data type	mandatory	none	any value within the column

Example Usage:

value: TX
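
The effect of a filter on a result set can be sketched in Python. This is an illustrative model, not DataOS code. It assumes that rows satisfying the filter condition are kept and that multiple filters are combined with AND; neither point is stated in the source. The sample rows and the function name are hypothetical.

```python
# Illustrative model of filters: keep only rows that satisfy every filter.
import operator

OPERATORS = {"equals": operator.eq, "not_equals": operator.ne}


def apply_filters(rows, filters):
    """Return the rows for which all filter conditions hold (assumed AND)."""
    return [
        row for row in rows
        if all(OPERATORS[f["operator"]](row[f["column"]], f["value"]) for f in filters)
    ]


rows = [
    {"store_id": 1, "store_state_code": "TN"},
    {"store_id": 2, "store_state_code": "TX"},
    {"store_id": 3, "store_state_code": "FL"},
]
filters = [{"column": "store_state_code", "operator": "not_equals", "value": "TN"}]
visible = apply_filters(rows, filters)
```

Under these assumptions, the row with state code TN is removed and the TX and FL rows remain visible.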
