TextCollectionAttachment

An array of TextCollectionAttachment objects to be labeled.

Video Support

For a video attachment, content must be a link to the video file. Supported media types are listed on the MDN Web Docs.

HTML Support in TextCollection Attachments

When creating a TextCollection task, you can pass Markdown as the string content. Markdown also allows HTML tags within its syntax.

To keep the platform secure, all HTML tags passed within the Markdown are sanitized using the HTML-sanitize JavaScript package. Any tag that is not in the list under "HTML tags allowed" below is removed from the string during sanitization, so only the allowed tags reach the content displayed to the tasker.

| Parameter | Type | Description |
| --- | --- | --- |
| type* | string | One of pdf, image, text, video, website, or audio. |
| content* | string | Content or link to relevant file. |
| forms | array | Array of field_id strings from FormField. If this value is set, the corresponding attachment is shown only if one of the referenced form fields is active. |
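
As an illustration, an attachments array could be assembled and checked before sending it. This is a minimal sketch: the helper name make_attachment is hypothetical (it is not part of any SDK), the URL is a placeholder, and the only rules enforced are the ones listed in the parameter table above.

```python
# Allowed values for the "type" parameter, taken from the parameter table.
ALLOWED_TYPES = {"pdf", "image", "text", "video", "website", "audio"}


def make_attachment(type_, content, forms=None):
    """Build one attachment dict (hypothetical helper, not part of any SDK)."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unsupported attachment type: {type_!r}")
    if not isinstance(content, str):
        raise TypeError("content must be a string")
    attachment = {"type": type_, "content": content}
    if forms is not None:
        # Optional: field_id strings of the FormField objects this attachment belongs to.
        attachment["forms"] = list(forms)
    return attachment


attachments = [
    make_attachment("text", "Some **Markdown** content"),
    # Placeholder URL; for a video attachment the content is a link.
    make_attachment("video", "https://example.com/clip.mp4", forms=["form_query"]),
]
```

The resulting list is plain data, so it can be serialized with json.dumps and placed in the task payload.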

HTML tags allowed:

Content sectioning

'address', 'article', 'aside', 'footer', 'header', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hgroup', 'main', 'nav', 'section'

Text content

'blockquote', 'dd', 'div', 'dl', 'dt', 'figcaption', 'figure', 'hr', 'li', 'main', 'ol', 'p', 'pre', 'ul'

Inline text semantics

'a', 'abbr', 'b', 'bdi', 'bdo', 'br', 'cite', 'code', 'data', 'dfn', 'em', 'i', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'time', 'u', 'var'

Table content

'caption', 'col', 'colgroup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr'

Additional Tags

'img', 'iframe'
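
To see ahead of time which tags in a piece of content would be removed, the allowlist above can be checked locally. The sketch below is not the platform's sanitizer: it only collects tag names with Python's standard html.parser module and reports those outside the list. The names TagCollector and disallowed_tags are hypothetical.

```python
from html.parser import HTMLParser

# Tag names copied from the "HTML tags allowed" list.
ALLOWED_TAGS = set("""
address article aside footer header h1 h2 h3 h4 h5 h6 hgroup main nav section
blockquote dd div dl dt figcaption figure hr li ol p pre ul
a abbr b bdi bdo br cite code data dfn em i kbd mark q rb rp rt rtc ruby s samp
small span strong sub sup time u var
caption col colgroup table tbody td tfoot th thead tr
img iframe
""".split())


class TagCollector(HTMLParser):
    """Records the name of every opening tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.tags.add(tag)


def disallowed_tags(markup):
    """Return the sorted tag names in markup that are not on the allowlist."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return sorted(collector.tags - ALLOWED_TAGS)


print(disallowed_tags('<p>Hi <b>there</b><script>alert(1)</script><img src="x.png"></p>'))
# prints ['script']
```

This check looks at tag names only; it says nothing about attributes or about how the Markdown itself is rendered.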

UnitField

UnitField objects define simple components for data collection.

Beta: Conditional Fields

Sometimes a field should only be presented if specific choices are selected for other fields. In these cases, you can specify the conditions — the dependent questions and corresponding sets of choices.

The conditions property is an array of objects, each of which defines one set of conditions under which the field is shown. The operators AND ({ }), OR ([ ]), and NOT (not) are supported, so you can specify an arbitrary combination of fields and choices. Each set may contain objects or arrays with the following:

  • Key: the field_id of the dependent field

  • Value: an object specifying the desired choices for the dependent field.

For example conditions, see the example at the end of this section.

Conditions currently only work with dependent fields of type CategoryField. The syntax is accepted on other field types, but it may raise errors or cause undefined behavior.
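
The simplest form of conditions can be reasoned about with a small evaluator. The sketch below is an assumption about the semantics, not the platform's implementation: it handles only an array of objects whose values are plain arrays of choices, treats the array as OR, the keys inside one object as AND, and the choices inside one value as OR. Nested arrays and the not operator are deliberately left out. The function name is_field_shown and the shape of answers (field_id mapped to a list of selected values) are hypothetical.

```python
def is_field_shown(conditions, answers):
    """Evaluate the plain form of `conditions` against the selected choices.

    conditions: list of dicts mapping field_id -> list of accepted choice values.
    answers:    dict mapping field_id -> list of values the tasker selected.
    """
    if not conditions:
        # No conditions at all: the field is always shown.
        return True
    for condition_set in conditions:  # any one set is enough (OR)
        if all(  # every key in the set must match (AND)
            any(choice in answers.get(field_id, []) for choice in accepted)
            for field_id, accepted in condition_set.items()
        ):
            return True
    return False


detail_conditions = [{"occlusion": ["1", "2"]}]
print(is_field_shown(detail_conditions, {"occlusion": ["1"]}))  # prints True
print(is_field_shown(detail_conditions, {"occlusion": ["0"]}))  # prints False
```

Under these assumptions an empty object, as in conditions: [{}], matches everything, because there are no keys left to fail.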

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type* | string | | One of text, boolean, number, datetime, category, select, or time_range. |
| field_id* | string | | A unique identifier for the field, which should not change among tasks within a project. |
| title* | string | | Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. Must not be an empty string. |
| description | string | undefined | A brief description about what the response should be. This may change among tasks within a project. |
| hint | string | undefined | Longer explanation of why the field exists and how it should be used. Renders as a tooltip. |
| required | boolean | false | Determines whether or not a response for this field is required. |
| min_responses_required | integer | 1 | The minimum number of separate annotations allowed for this field. Must be larger than 0. |
| max_responses_required | integer | 1 | The maximum number of separate annotations allowed for this field. Must be larger than or equal to min_responses_required, with an upper bound of 100. |
| conditions | array_object | undefined | A set of conditions that must be satisfied for this field to be shown. |
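
The constraints in the table above can be checked before a task is submitted. The following is a minimal sketch, assuming fields are handled as plain dicts; check_unit_field is a hypothetical helper and covers only the rules stated in the table.

```python
def check_unit_field(field):
    """Return a list of problems found in a UnitField dict (hypothetical helper)."""
    problems = []
    for key in ("type", "field_id", "title"):
        if key not in field:
            problems.append(f"missing required parameter: {key}")
    if field.get("title") == "":
        problems.append("title must not be an empty string")

    # Both parameters default to 1 according to the table.
    min_required = field.get("min_responses_required", 1)
    max_required = field.get("max_responses_required", 1)
    if min_required <= 0:
        problems.append("min_responses_required must be larger than 0")
    if max_required < min_required:
        problems.append("max_responses_required must be >= min_responses_required")
    if max_required > 100:
        problems.append("max_responses_required has an upper bound of 100")
    return problems


field = {
    "type": "text",
    "field_id": "summary",
    "title": "Summary",
    "min_responses_required": 1,
    "max_responses_required": 3,
}
print(check_unit_field(field))  # prints []
```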

Additional Fields

See the TextField, BooleanField, NumberField, DatetimeField, CategoryField, TimerangeField, and SelectField sections.

Example

// Example of UnitField with conditions
{
  type: "category",
  field_id: "occlusion",
  title: "Is there occlusion in the image?",
  choices: [{label: 'None', value: '0' },
            {label: 'A little', value: '1'},
            {label: 'A lot', value: '2'}],
  conditions: [{}],
},
{
  type: "category",
  field_id: "occlusion_detail",
  title: "What is the cause of the occlusion?",
  choices: [{label: 'Rain', value: 'rain'},
            {label: 'Shadow', value: 'shadow'}],
  conditions: [{
    occlusion: ['1', '2'], // show if 1 or 2 are selected
    // equivalently {not: [[], ['0']]}
    // equivalently [{not: []}, {not: ['0']}]
    // equivalently [['1'],['2']]
  }],
},
{
  type: "text",
  field_id: "a_lot_of_shadow",
  title: "Please describe why there is so much shadow.",
  conditions: [{
    // show if 2 and shadow are selected in their respective fields
    occlusion: ['2'], 
    occlusion_detail: ['shadow'],
  }],
},

TextField

A subclass of UnitField that returns a string response.

Example

{
  "type": "text",
  "field_id": "summary",
  "title": "Summary",
  "min_responses_required": 1,
  "max_responses_required": 3,
  "max_characters": 500,
  "required": true
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_characters | integer | undefined | The maximum number of characters allowed in the field. |

BooleanField

A subclass of UnitField that returns a boolean response. It has no additional parameters.

Example

{
  "type": "boolean",
  "field_id": "availability",
  "title": "Item Availability",
  "description": "Choose true if available."
}

NumberField

A subclass of UnitField that returns a string response based on the annotated number.

Example

{
  "type": "number",
  "field_id": "item_price",
  "title": "Item Price",
  "description": "Leave empty if not applicable.",
  "required": false,
  "use_slider": true,
  "min": 0,
  "max": 100
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| use_slider | boolean | false | Set to true to use a slider instead of a textbox. |
| min | float | undefined | Sets the minimum value of the slider. |
| max | float | undefined | Sets the maximum value of the slider. |
| step | float | undefined | Sets the step value of the slider. |

DatetimeField

A subclass of UnitField that returns a DatetimeAnnotation response.

Definition: DatetimeSpec

An enum that consists of year, month, day, hour, and minute.

Definition: DatetimeAnnotation

An interface that contains optional number fields including year, month, day, hour, and minute.

Example

{
  "type": "datetime",
  "field_id": "release_date",
  "title": "Date of Product Release",
  "description": "Leave empty if not applicable.",
  "include": ["year", "month", "day"],
  "defaults": {
    "year": 2021,
    "month": 4,
    "day": 13
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| include* | array | | An array of DatetimeSpec elements. Must contain at least one element. |
| defaults | DatetimeAnnotation | {} | Default value for the return value. |
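
A DatetimeField definition can be checked against the DatetimeSpec and DatetimeAnnotation definitions above. This is a sketch under the assumption that the field is a plain dict; check_datetime_field is a hypothetical name, and the check covers only what the definitions state (include is non-empty and uses known elements, defaults uses known keys with numeric values).

```python
# The five DatetimeSpec elements named in the definition.
DATETIME_SPEC = ("year", "month", "day", "hour", "minute")


def check_datetime_field(field):
    """Return a list of problems found in a DatetimeField dict (hypothetical helper)."""
    problems = []
    include = field.get("include") or []
    if not include:
        problems.append("include must contain at least one element")
    for element in include:
        if element not in DATETIME_SPEC:
            problems.append(f"unknown DatetimeSpec element: {element}")
    for key, value in field.get("defaults", {}).items():
        if key not in DATETIME_SPEC:
            problems.append(f"unknown DatetimeAnnotation field: {key}")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            problems.append(f"defaults.{key} must be a number")
    return problems


release_date = {
    "type": "datetime",
    "field_id": "release_date",
    "title": "Date of Product Release",
    "include": ["year", "month", "day"],
    "defaults": {"year": 2021, "month": 4, "day": 13},
}
print(check_datetime_field(release_date))  # prints []
```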

CategoryField

A subclass of UnitField that returns an array of selected CategoryChoiceValue elements in its response.

CategoryChoice elements with subchoices are only used for navigation. The only selectable CategoryChoice elements are those with no subchoices.

Example

{
  "type": "category",
  "field_id": "genre",
  "title": "Select all genres that apply.",
  "choices": [
    {
      "label": "Hip-Hop/Rap",
      "value": "hip-hop-rap",
      "hint":
        "It consists of a stylized rhythmic music that commonly accompanies rapping, a rhythmic and rhyming speech that is chanted.",
      "subchoices": [
        { "label": "Dirty South", "value": "dirty-south" },
        { "label": "Industrial Hip Hop", "value": "industrial-hip-hop" },
        { "label": "Nerdcore", "value": "nerdcore" },
        { "label": "Rap", "value": "rap" }
      ]
    },
    {
      "label": "R&B/Soul",
      "value": "rb-soul",
      "subchoices": [
        { "label": "Disco", "value": "disco" },
        { "label": "Funk", "value": "funk" },
        { "label": "Motown", "value": "motown" }
      ]
    }
  ],
  "min_choices": 1,
  "max_choices": 5
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| choices* | array | | An array of CategoryChoice elements that define the available choices. |
| min_choices | integer | 1 | Minimum number of choices to select. |
| max_choices | integer | 1 | Maximum number of choices to select. If this value is greater than 1, the form renders checkboxes. Otherwise, it renders radio buttons. |

CategoryChoice

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| label* | string | | The label of the choice field. This description may change among tasks within a project. |
| value* | CategoryChoiceValue | | The value of the choice field. Must be a string, number, or boolean. |
| hint | string | undefined | Additional explanatory text for the choice, as in the example above. |
| subchoices | array | | An array of CategoryChoice elements to define the relevant subchoices. |
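
Because only CategoryChoice elements without subchoices are selectable, it can be handy to list the values a response may actually contain. The sketch below walks a choices tree recursively; selectable_values is a hypothetical helper, and the "Other" choice is added purely for illustration.

```python
def selectable_values(choices):
    """Collect the values of all choices that have no subchoices (the selectable ones)."""
    values = []
    for choice in choices:
        subchoices = choice.get("subchoices")
        if subchoices:
            # A choice with subchoices is navigation only; descend into it.
            values.extend(selectable_values(subchoices))
        else:
            values.append(choice["value"])
    return values


choices = [
    {
        "label": "Hip-Hop/Rap",
        "value": "hip-hop-rap",
        "subchoices": [
            {"label": "Nerdcore", "value": "nerdcore"},
            {"label": "Rap", "value": "rap"},
        ],
    },
    {"label": "Other", "value": "other"},  # illustrative leaf choice
]
print(selectable_values(choices))  # prints ['nerdcore', 'rap', 'other']
```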

TimerangeField

Subclass of UnitField.

Example

{
  "type": "time_range",
  "field_id": "hours",
  "title": "Store Hours",
  "defaults_seconds": [
    28800,
    72000
  ],
  "increment_seconds": 300,
  "max_responses_required": 2, 
  "min_responses_required": 0
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| default_seconds* | array | | Must have length 2, with values in the range [0, 24 * 60 * 60]. |
| increment_seconds | number | | Must be between 1 and 60 * 60. |
| default_from_field | string | | Must be a valid field_id. |
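
The values in a time range are seconds since midnight, so the example's 28800 and 72000 correspond to 08:00 and 20:00. The sketch below converts and checks such values; to_clock and check_time_range are hypothetical helpers, the two-element array is passed in directly, and treating both bounds as inclusive is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def to_clock(seconds):
    """Format seconds since midnight as HH:MM."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"


def check_time_range(seconds_pair, increment_seconds=None):
    """Return a list of problems with the default range and increment (hypothetical helper)."""
    problems = []
    if len(seconds_pair) != 2:
        problems.append("the default range must have length 2")
    if any(not 0 <= value <= SECONDS_PER_DAY for value in seconds_pair):
        problems.append("values must be in range [0, 86400]")
    # Bounds are assumed to be inclusive.
    if increment_seconds is not None and not 1 <= increment_seconds <= 60 * 60:
        problems.append("increment_seconds must be between 1 and 3600")
    return problems


store_hours = [28800, 72000]
print([to_clock(value) for value in store_hours])  # prints ['08:00', '20:00']
print(check_time_range(store_hours, increment_seconds=300))  # prints []
```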

SelectField

Subclass of UnitField.

Example

{
  "type": "select",
  "field_id": "sentiment",
  "title": "Sentiment",
  "description": "Choose a sentiment that best describes this text",
  "required": true,
  "choices_from_field": "Options"
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| choices | array | | An array of selectable options. Not required if choices_from_field is present. |
| choices_from_field | string | | Must be a valid field_id. |

RankingField

RankingField objects let you define a task in which taskers rank the task attachments.

Returns a list response with ordered options.

Example

{
  "type": "ranking_order",
  "field_id": "relevance_ranking",
  "title": "Rank titles based on their relevance to the article",
  "hint": "From the most relevant to the least one",
  "first_label": "Best",
  "last_label": "Worst",
  "num_items_to_rank": 3
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| title | string | undefined | A brief description about what the response should be. This may change among tasks within a project. |
| hint | string | undefined | Guidance shown to taskers on how to rank (in the example, the ranking direction). |
| first_label | string | undefined | Label for the first position in the ranking (Best in the example). |
| last_label | string | undefined | Label for the last position in the ranking (Worst in the example). |
| num_items_to_rank | integer | 3 | The number of options required to rank (can be less than the number of attachments). |
| required | boolean | false | Determines whether or not all num_items_to_rank fields must be filled. |

FormField

FormField objects allow you to create several mini-forms associated with different attachments. These mini-forms will be populated by the object's child fields.

Returns a dictionary response with key-value pairs defined by its child fields.

📘Note

FormField objects can only be located on the top level of the fields task parameter. If one FormField object is used, all the other top-level objects must also be FormField objects.

Example

{
  "type": "form",
  "field_id": "form_query",
  "title": "Query Intention",
  "fields": [
    {
      "type": "text",
      "field_id": "query_intention",
      "title": "Query Intention",
      "hint": "Please investigate the search links."
    }
  ]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| type* | string | | For FormField objects, this should be set to form. |
| field_id* | string | | A unique identifier for the field, which should not change among tasks within a project. |
| title* | string | | Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. |
| description | string | undefined | A brief description about what the response should be. This may change among tasks within a project. |
| fields* | array | | An array of child UnitField and FieldSet objects. Any FieldSet objects here must have inline set to true. |

Text Collection Callback Format

The response object, which is part of the callback POST request and permanently stored as part of the task object, will have an annotations field. The annotations object is a dictionary in which each key is a field_id defined in the task parameters and each value is the respective annotation for that field.

Each annotation will be of the type defined by its field above. If max_responses_required is applicable and greater than 1, the annotation will be an array of that type.

📘

See the Callback section for more details about callbacks.

Example

{
  "response": {
    "annotations": {
      "category_name": "Soup", //TextField
      "category_items": [ //FieldSet with max_responses_required greater than one
        {
          "item_name": "Tom Yum Chicken Soup", //TextField
          "item_price": "11.79" //NumberField
        },
        {
          "item_name": "Tom Yum Beef Soup", //TextField
          "item_price": "11.79" //NumberField
        }
      ],
      "category_metadata": { //FieldSet
        "gluten_friendly": true, //BooleanField
        "labels": [ //TextField with max_responses_required greater than one
          "Free Range", 
          "All Natural"
        ] 
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd",
  "task": {
    // populated task for convenience
  }
}
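
Reading the annotations out of a callback body might look like the sketch below. It assumes the body has already been received as a JSON string (the inline comments in the example above are explanatory and would not be valid JSON), and it reuses the example's field_id values. Since a NumberField returns a string, the prices are converted before any arithmetic.

```python
import json
from decimal import Decimal

# Stand-in for the body of the callback POST request.
raw_body = """
{
  "response": {
    "annotations": {
      "category_name": "Soup",
      "category_items": [
        {"item_name": "Tom Yum Chicken Soup", "item_price": "11.79"},
        {"item_name": "Tom Yum Beef Soup", "item_price": "11.79"}
      ],
      "category_metadata": {
        "gluten_friendly": true,
        "labels": ["Free Range", "All Natural"]
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd"
}
"""

payload = json.loads(raw_body)
annotations = payload["response"]["annotations"]  # keyed by field_id

# NumberField annotations arrive as strings; Decimal avoids float rounding on prices.
prices = [Decimal(item["item_price"]) for item in annotations["category_items"]]
total = sum(prices)

# A field with max_responses_required greater than 1 is an array of its type.
labels = annotations["category_metadata"]["labels"]

print(payload["task_id"], total, labels)
```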

Text Collection Hypothesis

When creating a textcollection task, you can provide pre-labels in the hypothesis field so that workers don't have to annotate from scratch.

Pre-labels go in the hypothesis field of the payload when you create the task, and the schema of the hypothesis object must match the schema of the task response. To add them:

  1. Verify the task response field schema for the desired task type.

  2. Review your project taxonomy (label names, attribute conditions, annotation types, etc).

  3. Generate pre-labels that are formatted to match the aforementioned schema and taxonomy.

  4. Create a task, including a hypothesis field that contains the pre-labels at the same top level as other task fields such as project and instructions.

The hypothesis format largely mirrors Scale's task response format. For this task type, the annotations field is mandatory inside the hypothesis object.

The only difference between the hypothesis and the response format is that every field you want to pre-annotate needs two additional fields:

  • type: the field type (category, select, text, etc.)

  • field_id: the identifier given to this field for tracking (the field name)

You can find both in your task taxonomy.

Note: For text fields, the response format differs from the other types: response is an array containing a single string instead of an array of arrays of strings.

task_payload_with_hypothesis

{
 ...
 "batch": "regular_batch_name",
 "hypothesis": {
   "annotations": {
     "(EXAMPLE) Multiple Choice Question": {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "response": [
         [
           "B"
         ]
       ]
     }
   }
 },
 ...
}

task_taxonomy

{
   "fields": [
     {
       "type": "category",
       "field_id": "(EXAMPLE) Multiple Choice Question",
       "title": "Which option best fits this task?",
       "choices": [
         {
           "label": "A",
           "value": "A"
         },
         {
           "label": "B",
           "value": "B"
         },
         {
           "label": "C",
           "value": "C"
         }
       ],
       "min_choices": 1,
       "max_choices": 1,
       "description": "Select one of the following. "
     }
   ]
 }

task_payload_with_hypothesis_text_field

{
   ...
   "hypothesis": {
       "annotations": {
           "Product Description": {
               "type": "text",
               "field_id": "(EXAMPLE) Text Input Field",
               "response": [
                   "Dolore in dolor occaecat deserunt ex in qui non amet est."
               ]
           }
       }
   }
   ...
}
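
Putting the rules above together, a hypothesis object could be generated from pre-label values and the task taxonomy. This is a sketch under stated assumptions: build_hypothesis is a hypothetical helper, each field carries a single response, and annotations are keyed by field_id as in the first payload example.

```python
def build_hypothesis(prelabels, taxonomy_fields):
    """Build a hypothesis dict from {field_id: [values]} and the taxonomy's fields list."""
    types_by_id = {field["field_id"]: field["type"] for field in taxonomy_fields}
    annotations = {}
    for field_id, values in prelabels.items():
        field_type = types_by_id[field_id]  # raises KeyError for fields not in the taxonomy
        if field_type == "text":
            # Text fields: response is an array holding the string itself.
            response = list(values)
        else:
            # Other types: response is an array of arrays.
            response = [list(values)]
        annotations[field_id] = {
            "type": field_type,
            "field_id": field_id,
            "response": response,
        }
    return {"annotations": annotations}


taxonomy_fields = [
    {"type": "category", "field_id": "(EXAMPLE) Multiple Choice Question"},
    {"type": "text", "field_id": "(EXAMPLE) Text Input Field"},
]
hypothesis = build_hypothesis(
    {
        "(EXAMPLE) Multiple Choice Question": ["B"],
        "(EXAMPLE) Text Input Field": ["A short product description."],
    },
    taxonomy_fields,
)
```

The returned dict is what would sit under the hypothesis key of the task payload, next to fields such as project and instructions.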

NamedEntityRecognitionLabel


NamedEntityRecognitionLabel objects define the taxonomy of labels to use to annotate spans of text.

NamedEntityRecognitionAttribute objects define form fields for individual annotations.

AttributeSelectOption objects define possible values for select attributes.

NamedEntityRecognitionLabel

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name* | string | | A unique identifier for this label. |
| display_name | string | name | An alias for this label to display to taskers. |
| description | string | undefined | A description of what this label should represent. Displayed to taskers to improve quality. |
| children | array_object | undefined | An array of NamedEntityRecognitionLabel objects to group underneath this label. Specifying this field causes this label itself to no longer be used for labeling text spans. |
| attributes (optional) | object | undefined | Attributes for annotations with this label; see NamedEntityRecognitionAttribute below. |

NamedEntityRecognitionAttribute

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Only 'select' for now. |
| options | array_object | List of select option objects. |
| display_name | string | Optional display name. |
| description | string | Optional description. |

AttributeSelectOption

| Parameter | Type | Description |
| --- | --- | --- |
| value | string | The value that will show up in the response if this option is selected. |
| display_name | string | Optional display name if different from the value. |

NamedEntityRecognitionRelationshipDefinition


NamedEntityRecognitionRelationshipDefinition objects specify the types of relationship that can exist between two text spans.

A relationship can either be named or unnamed. A named relationship is useful if you need to distinguish between multiple types of relationship that could exist between the same two text spans. For instance, if you're annotating a description of someone's family history, you might want to distinguish a "child of" relationship from a "sibling of" relationship.

A task can only specify one type of relationship. Either all the relationships in a task must be named, or all must be unnamed.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | | A unique identifier for this type of relationship. Required for named relationships; disallowed for unnamed relationships. |
| display_name | string | | A description for this relationship to display to taskers. Should be able to be used to construct a short phrase describing the relationship. For example, a relationship between two text spans "A" and "B" with display_name "is parent of" would be rendered to taskers as "A is parent of B". Required for named relationships; disallowed for unnamed relationships. |
| is_directed | boolean | false | A field indicating whether the directionality of this relationship matters. For example, a "is parent of" relationship would likely be directed, whereas a "is sibling of" relationship would likely not be directed. Optional for named relationships; disallowed for unnamed relationships. |
| source_label | string | | A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the source text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships. |
| target_label | string | | A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the target text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships. |
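
The named/unnamed rules above lend themselves to a quick consistency check before task creation. The sketch below assumes relationship definitions are plain dicts and that a definition counts as named when it has a name key; check_relationship_definitions is a hypothetical helper.

```python
def check_relationship_definitions(definitions):
    """Return a list of problems with a task's relationship definitions (hypothetical helper)."""
    problems = []
    named_flags = set()
    for index, definition in enumerate(definitions):
        is_named = "name" in definition
        named_flags.add(is_named)
        if is_named:
            if "display_name" not in definition:
                problems.append(f"definition {index}: display_name is required for named relationships")
        else:
            for key in ("display_name", "is_directed"):
                if key in definition:
                    problems.append(f"definition {index}: {key} is disallowed for unnamed relationships")
    if len(named_flags) > 1:
        # A task may use named or unnamed relationships, not both.
        problems.append("named and unnamed relationships cannot be mixed in one task")
    return problems


definitions = [
    {"name": "child_of", "display_name": "is child of", "is_directed": True},
    {"name": "sibling_of", "display_name": "is sibling of"},
]
print(check_relationship_definitions(definitions))  # prints []
```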

Named Entity Recognition Callback Format

The response object is part of the callback POST request and is permanently stored as part of the task object.

NamedEntityRecognitionResponse

The structure of a response object for named entity recognition consists of two arrays: one for entity annotations and another for relationships between these entities.

NamedEntityRecognitionAnnotation

The format for an individual entity annotation within the named entity recognition response, detailing the unique identifier, position, and content of the recognized text span, as well as its label and any optional attributes.

NamedEntityRecognitionRelationship

In tasks with undirected relationships, the source_ref and target_ref fields are interchangeable. In tasks with links that do not have relationship names, the name field will be left blank.

Example

{
  "annotations": [
    {
      "id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "start": 10,
      "end": 17,
      "text": "Alex Wang",
      "label": "person"
    },
    {
      "id": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "start": 22,
      "end": 31,
      "text": "Transform",
      "label": "conference"
    }
  ],
  "relationships": [
    {
      "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
      "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "name": "speaker_at"
    }
  ]
}
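
Since relationships refer to annotations only by id, consumers of the response usually index the annotations first. The sketch below does that for the example response; describe is a hypothetical helper, and the fallback phrase used for unnamed relationships is an illustrative choice.

```python
response = {
    "annotations": [
        {"id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b", "text": "Alex Wang", "label": "person"},
        {"id": "a76da53e-4ebd-4466-aed7-80db6fb98329", "text": "Transform", "label": "conference"},
    ],
    "relationships": [
        {
            "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
            "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
            "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
            "name": "speaker_at",
        }
    ],
}

# Index annotations by id so source_ref and target_ref can be resolved directly.
spans_by_id = {annotation["id"]: annotation for annotation in response["annotations"]}


def describe(relationship):
    """Render one relationship as readable text (hypothetical helper)."""
    source = spans_by_id[relationship["source_ref"]]
    target = spans_by_id[relationship["target_ref"]]
    # Unnamed relationships have no usable name; fall back to a generic phrase.
    name = relationship.get("name") or "is linked to"
    return f'{source["text"]} ({source["label"]}) {name} {target["text"]} ({target["label"]})'


descriptions = [describe(relationship) for relationship in response["relationships"]]
print(descriptions)  # prints ['Alex Wang (person) speaker_at Transform (conference)']
```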

NamedEntityRecognitionResponse

| Field | Type | Description |
| --- | --- | --- |
| annotations | object array | List of NamedEntityRecognitionAnnotation objects. |
| relationships | object array | List of NamedEntityRecognitionRelationship objects. |

NamedEntityRecognitionAnnotation

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier. |
| start | number | Start index of the text span. |
| end | number | End index of the text span. |
| text | string | Text of the text span. |
| label | string | References the name field of a label in the task params. |
| attributes (optional) | object | The keys of the object reference keys of the attributes object for the corresponding label in the task params. |

NamedEntityRecognitionRelationship

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier. |
| source_ref | string | References the id of the annotation that is the source of the directed relationship. |
| target_ref | string | References the id of the annotation that is the target of the directed relationship. |
| name (optional) | string | References the name of a relationship definition in the task params. |
