# Gobierto Data / Examples
## Filters for resources with custom fields
Resources with custom fields defined accept a `filter` parameter in the index action. This applies both to `api/v1/data/datasets` and `gobierto_investments/api/v1/projects`.
The structure of a filter parameter is:

```
filter[CUSTOM_FIELD_UID][ [OPERATOR] ]=VALUE[ ,VALUE ... ][ &filter[CUSTOM_FIELD_UID][ [OPERATOR] ]=VALUE[ & ... ] ]
```

The spaced square brackets denote optional parts: the operator, additional comma-separated values, and additional `&`-joined filters.
- When no operator is included, the `eq` operator is assumed.
- Depending on the type of the custom field, the value must be:
  - For custom fields of type vocabulary, the value is expected to be a term id.
  - For custom fields of type date, the value must have `YYYY-MM-DD` format.
- The available operators are:
  - `eq`: Custom field value must equal the value.
  - `in`: The values section can be a list of values separated by commas. The custom field value must be in the values list.
  - `gt`: Custom field value must be greater than the value.
  - `gteq`: Custom field value must be greater than or equal to the value.
  - `lteq`: Custom field value must be less than or equal to the value.
  - `lt`: Custom field value must be less than the value.
  - `like`: Custom field value is searched using the SQL `ILIKE` condition (case-insensitive).
### Filter Examples
- Datasets with a category (vocabulary type) associated to the term with id 1 and a cost of 750000:

  ```
  GET /api/v1/data/datasets?filter[category]=1&filter[cost]=750000
  ```

- Datasets with a category associated to the term with id 1:

  ```
  GET /api/v1/data/datasets?filter[category][eq]=1
  ```

- Datasets with a start date between 2019-01-01 (included) and 2020-01-01 (not included):

  ```
  GET /api/v1/data/datasets?filter[start-date][gteq]=2019-01-01&filter[start-date][lt]=2020-01-01
  ```

- Datasets with a cost of either 100000 or 250000:

  ```
  GET /api/v1/data/datasets?filter[cost][in]=100000,250000
  ```

- Datasets with a description including the term culture:

  ```
  GET /api/v1/data/datasets?filter[description][like]=%culture%
  ```
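A complete request combining two of the operators above might look like the sketch below. The `--globoff` flag stops curl from treating the square brackets in the filter syntax as URL globs; the host `https://···` and the Bearer token are placeholders as in the dataset examples later in this guide, and whether the `Authorization` header is required is an assumption that depends on how the datasets are exposed.

```
# Sketch: list datasets with category term 1 and cost greater than 100000.
# --globoff keeps curl from interpreting the [] filter syntax as URL globs.
# Host and token are placeholders; the auth header may be optional for
# publicly visible datasets (an assumption).
curl --globoff --location --request GET \
  'https://···/api/v1/data/datasets?filter[category]=1&filter[cost][gt]=100000' \
  --header 'Authorization: Bearer XXXXXXXXXX'
```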
## Create datasets
There are three ways to create or update a dataset loading data from a CSV, depending on how the CSV is obtained:

- The CSV is available remotely at a URL
- The CSV is available in a filesystem the API has access to
- The CSV is on the client side and must be uploaded with the API request

The first two cases use an `application/json` body, as many of the other API requests do. The schema of the JSON to create a dataset is:
```
{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "name": ···,
      "table_name": ···,
      "slug": ···,
      "data_path": ···,
      "local_data": ···,
      "csv_separator": ···,
      "schema": ···
    }
  }
}
```
The attributes are described in this guide.

Examples:

Suppose the admin has the token `XXXXXXXXXX`.
### From URL
For example, we want to create a dataset loading data from https://gishubdata.nd.gov/sites/default/files/NDHUB.Roads_MileMarkers_1.csv. This CSV uses a comma as separator, so we don't need to provide a `csv_separator` option (the default is `","`):
```
curl --location --request POST 'https://···/api/v1/data/datasets' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--data-raw '{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "name": "Example 1",
      "table_name": "example_1_records",
      "data_path": "https://gishubdata.nd.gov/sites/default/files/NDHUB.Roads_MileMarkers_1.csv",
      "local_data": false
    }
  }
}'
```
### From URL, using `schema`, `csv_separator` and defining a slug
The previous request doesn't use the `schema` or `csv_separator` options. Once we have inspected the content of the CSV, we have a guess of the data types and decide to include a schema for the columns with types other than text:
```
curl --location --request POST 'https://···/api/v1/data/datasets' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--data-raw '{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "name": "Example 2",
      "table_name": "example_2_records",
      "slug": "example-2-slug",
      "data_path": "https://gishubdata.nd.gov/sites/default/files/NDHUB.Roads_MileMarkers_1.csv",
      "local_data": false,
      "csv_separator": ",",
      "schema": {
        "objectid": {
          "original_name": "OBJECTID",
          "type": "integer"
        },
        "hwy": {
          "original_name": "HWY",
          "type": "integer"
        },
        "created_date": {
          "original_name": "CREATED_DATE",
          "type": "date",
          "optional_params": {
            "date_format": "YYYYMMDDHH24MISS"
          }
        },
        "last_edited_date": {
          "original_name": "LAST_EDITED_DATE",
          "type": "date",
          "optional_params": {
            "date_format": "YYYYMMDDHH24MISS"
          }
        },
        "route_id_rims": {
          "original_name": "ROUTE_ID_RIMS",
          "type": "integer"
        },
        "fromdate": {
          "original_name": "FROMDATE",
          "type": "date",
          "optional_params": {
            "date_format": "YYYYMMDD"
          }
        },
        "todate": {
          "original_name": "TODATE",
          "type": "date",
          "optional_params": {
            "date_format": "YYYYMMDD"
          }
        },
        "measure": {
          "original_name": "MEASURE",
          "type": "numeric"
        }
      }
    }
  }
}'
```
### From a local file
Suppose that the API is installed on a server which has a CSV in its filesystem at the path `/home/ubuntu/2008_10k.csv`. We can load the CSV into a new dataset by changing the attribute `local_data` to `true`:
```
curl --location --request POST 'https://···/api/v1/data/datasets' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--data-raw '{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "name": "Example 3",
      "table_name": "example_3_records",
      "data_path": "/home/ubuntu/2008_10k.csv",
      "local_data": true
    }
  }
}'
```
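Once created, the dataset should be retrievable from its show endpoint (the update section below notes that updates use the same endpoint as show). A minimal sketch, assuming the dataset ended up with the slug `example-3`, as the next section does:

```
# Sketch: fetch the dataset created above (assumes its slug is example-3).
curl --location --request GET 'https://···/api/v1/data/datasets/example-3' \
--header 'Authorization: Bearer XXXXXXXXXX'
```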
## Update datasets
### From a local file, replacing all content
Loading new data and regenerating the table is the default behaviour of the update action. The endpoint is the same as the one used for show. If our dataset has the slug `example-3`:
```
curl --location --request PUT 'https://···/api/v1/data/datasets/example-3' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--data-raw '{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "data_path": "/home/ubuntu/2008_10k.csv",
      "local_data": true
    }
  }
}'
```
### From a local file, appending content
This appends all content from the file to the table. You have to set the attribute `append` to `true` (`false` by default). This operation will raise an error if the schema of the CSV is not compatible with the previously existing data.
```
curl --location --request PUT 'https://···/api/v1/data/datasets/example-3' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--data-raw '{
  "data": {
    "type": "gobierto_data-dataset_forms",
    "attributes": {
      "data_path": "/home/ubuntu/2008_100k.csv",
      "local_data": true,
      "append": true
    }
  }
}'
```
## Using multipart/form-data
### Create a dataset uploading a CSV using multipart/form-data
The multipart/form-data format allows us to upload CSV and schema files directly. Suppose that there is a local CSV at the path `/local/path/file.csv`. In this kind of request the same options as in the previous examples can be sent under `dataset[attribute_name]`, with the exception of `data_path` and `local_data`, which don't make sense in this context. The file must be sent with `dataset[data_file]`:
```
curl --location --request POST 'https://···/api/v1/data/datasets' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--header 'Content-Type: multipart/form-data' \
--form 'dataset[data_file]=@/local/path/file.csv' \
--form 'dataset[name]=Example 4' \
--form 'dataset[table_name]=example_4_records' \
--form 'dataset[slug]=example-4-slug' \
--form 'dataset[csv_separator]=,'
```
### Update a dataset uploading a CSV and a schema file using multipart/form-data
Suppose that we have the schema in `/local/path/schema.json` and the data in `/local/path/file.csv`, and we want to append the data to an existing dataset:
```
curl --location --request PUT 'https://mataro.gobierto.test/api/v1/data/datasets/example-5-superslug' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--header 'Content-Type: multipart/form-data' \
--form 'dataset[data_file]=@/local/path/file.csv' \
--form 'dataset[schema_file]=@/local/path/schema.json' \
--form 'dataset[append]=true' \
--form 'dataset[csv_separator]=,'
```
The schema can also be passed as a string containing JSON with `dataset[schema]`. If both `dataset[schema_file]` and `dataset[schema]` are sent, the latter is ignored and the schema is taken from the uploaded file.
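For instance, a minimal sketch passing the schema inline as a form field; it reuses the `objectid` column definition from the schema example earlier in this guide, and the host, token and slug are placeholders taken from the previous examples:

```
# Sketch: schema sent inline as a JSON string via dataset[schema]
# instead of uploading a schema file. Host, token and slug are
# placeholders reused from the earlier examples.
curl --location --request PUT 'https://···/api/v1/data/datasets/example-4-slug' \
--header 'Authorization: Bearer XXXXXXXXXX' \
--header 'Content-Type: multipart/form-data' \
--form 'dataset[data_file]=@/local/path/file.csv' \
--form 'dataset[schema]={"objectid": {"original_name": "OBJECTID", "type": "integer"}}'
```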