Gobierto Data / CSV Schema

When importing a dataset using the Gobierto Data API, the schema of the data being imported is informed with a schema file.

The schema file is a JSON containing information of the type of each column in the CSV.

If no schema is provided, all columns are considered text.

Example

Suppose that the CSV has this form:

DATE,TIMESTAMP,WEEK DAY, DISTRICT NUMBER,AGE,PRICE,STOCK
31/12/2018,2018-12-31 16:30,MONDAY,3,34,27.56,YES
1/12/2018,2018-12-1 07:15,SATURDAY,5,27,200.4,NO

If this file is sent to create a dataset automatically the following schema is generated:

{
    "date": {
        "original_name": "DATE",
        "type": "text"
    },
    "timestamp": {
        "original_name": "TIMESTAMP",
        "type": "text"
    },
    "week_day": {
        "original_name": "WEEK DAY",
        "type": "text"
    },
    "district_number": {
        "original_name": "DISTRICT NUMBER",
        "type": "text"
    },
    "age": {
        "original_name": "AGE",
        "type": "text"
    },
    "price": {
        "original_name": "PRICE",
        "type": "text"
    },
    "stock": {
        "original_name": "STOCK",
        "type": "text"
    }
}

Each column of the csv has an entry in the schema:

  • The key represent the column name in the destination database table
  • The value is an object with different attributes:
    • original_name: Mandatory - The name of the column in the original CSV file
    • type: Mandatory - The type of the column in the CSV file. A type has associated a data type in the database and possibly a set of transformation functions that are performed on the load process. Currently there are 6 types defined:
      • integer: Destination data type is integer (from -2147483648 to +2147483647).
      • bigint: Destination data type is bigint (from -9223372036854775808 to 9223372036854775807).
      • numeric: Destination data type is numeric.
      • numeric_with_custom_decimal_separator: Destination type is numeric.
      • text: Destination type is text.
      • date: Destination type is date.
      • time: Destination type is time.
      • timestamp: Destination type is timestamp without timezone. The default expected format is "YYY-MM-DD HH24:MI:SS"
      • timestamptz: Destination type is timestamp with timezone. The timezone used is the timezone configured for database. The default expected format is "YYY-MM-DD HH24:MI:SS"
      • boolean: Destination type is boolean.
    • optional_params: Each type can have a list of settings. If omitted, default values are used. All types share a null_value option which is "" by default. The value of this attribute will be transformed to NULL on load. There are other specific optional params on some types:
      • numeric_with_custom_decimal_separator:
        • decimal_separator: String. Default: ",". Character which represents the decimal separator.
      • date, timestamp and timestamptz
        • date_format: String. Default: "YYY-MM-DD HH24:MI:SS". String which represents the date format, according to the to_date function documentation
      • time
        • time_format: String. Default: "HH24:MI". String which represents the time format, according to the to_timestamp function documentation
      • boolean:
        • false_value: String. Default: "0". String which has to be interpreted as false.
        • true_value: String. Default: "1". String which has to be interpreted as true.

For our example this could be a good schema JSON:

{
    "date": {
        "original_name": "DATE",
        "type": "date",
        "optional_params": { "date_format": "DD/MM/YYYY" }
    },
    "timestamp": {
        "original_name": "TIMESTAMP",
        "type": "timestamptz",
        "optional_params": { "date_format": "YYY-MM-DD HH24:MM" }
    },
    "week_day": {
        "original_name": "WEEK DAY",
        "type": "text"
    },
    "district_number": {
        "original_name": "DISTRICT NUMBER",
        "type": "integer"
    },
    "age": {
        "original_name": "AGE",
        "type": "integer"
    },
    "price": {
        "original_name": "PRICE",
        "type": "numeric"
    },
    "stock": {
        "original_name": "STOCK",
        "type": "boolean",
        "optional_params": { "false_value": "NO", "true_value": "YES" }
    }
}