
Creating A New Table And Setting The Expiration Date In Bigquery Using Python

This is my code that pulls the realtime database from Firebase, formats it as JSON, uploads it to Cloud Storage, and then loads it into BigQuery. #standardsql import json import boto import gcs_oauth2_b

Solution 1:

If you want to set an expiration time for your table, this might do the trick:

import uuid
from datetime import datetime, timedelta

from google.cloud import bigquery
from google.cloud.bigquery.schema import SchemaField

def load_data_from_gcs(dataset,
                       table_name,
                       table_schema,
                       source,
                       source_format,
                       expiration_time):
    bigquery_client = bigquery.Client()
    dataset = bigquery_client.dataset(dataset)
    table = dataset.table(table_name)
    table.schema = table_schema
    table.expires = expiration_time
    if not table.created:
        table.create()

    job_name = str(uuid.uuid4())
    job = bigquery_client.load_table_from_storage(
        job_name, table, source)
    job.source_format = source_format

    job.begin()
    wait_for_job(job)  # poll the load job until it finishes

dataset = 'FirebaseArchive'
table_name = 'test12'
gcs_source = 'gs://dataworks-356fa-backups/firetobq.json'
source_format = 'NEWLINE_DELIMITED_JSON'
table_schema = [SchemaField(field1), SchemaField(field2), (...)]
expiration_time = datetime.now() + timedelta(seconds=604800)

load_data_from_gcs(dataset,
                   table_name,
                   table_schema,
                   gcs_source,
                   source_format,
                   expiration_time)

Notice that the only difference from a plain load is the line that sets:

table.expires = expiration_time

whose value must be a datetime (here defined as expiration_time = datetime.now() + timedelta(seconds=604800), i.e. seven days from now).
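As a quick sanity check on that offset, the 604800-second delta used above is exactly seven days:

```python
from datetime import datetime, timedelta

# 604800 seconds = 60 * 60 * 24 * 7, i.e. exactly one week
week = timedelta(seconds=604800)
assert week == timedelta(days=7)

# The table will expire one week after this script runs
expiration_time = datetime.now() + week
```

Writing it as timedelta(days=7) says the same thing more readably.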

I'm not sure whether schema auto-detection is possible through the Python API, but you can still send this information using SchemaFields. For instance, if your table has two fields, user_id and job_id, both INTEGER, then the schema would be:

table_schema = [SchemaField('user_id', field_type='INT64'),
                SchemaField('job_id', field_type='INT64')]

You can find more information on how schemas work in the BigQuery documentation.
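For reference, the same two-column schema in BigQuery's JSON representation (the format accepted by the bq CLI and the REST API) would be a sketch like this, reusing the user_id/job_id names from the example above:

```python
import json

# The same schema expressed as BigQuery's JSON field list.
# "mode" defaults to NULLABLE if omitted; shown here for clarity.
table_schema = [
    {"name": "user_id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "job_id", "type": "INTEGER", "mode": "NULLABLE"},
]
print(json.dumps(table_schema, indent=2))
```

This form is handy if you keep schemas in version-controlled .json files rather than building SchemaField objects in code.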

[EDIT]:

Just saw your other question: if you want to truncate the table and then write data to it, you can just set:

job.write_disposition = 'WRITE_TRUNCATE'
job.begin()

in your load_data_from_gcs function. (Note it's the write disposition, not the create disposition, that takes WRITE_TRUNCATE.) This will overwrite the table's existing data with the data from your storage file. You won't have to define a schema for that, as it's already defined on the existing table (therefore this might be a much easier solution for you).
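To make the choice of disposition concrete, here is a toy model of how the three write dispositions treat existing rows. This is not the BigQuery API, just an illustrative sketch of the semantics:

```python
def apply_write_disposition(existing_rows, new_rows, disposition):
    """Toy model of BigQuery's write dispositions (not the real API)."""
    if disposition == 'WRITE_TRUNCATE':
        # Replace whatever the table held before.
        return list(new_rows)
    if disposition == 'WRITE_APPEND':
        # Keep existing rows and add the new ones.
        return list(existing_rows) + list(new_rows)
    if disposition == 'WRITE_EMPTY':
        # Fail unless the table is empty.
        if existing_rows:
            raise ValueError('table is not empty')
        return list(new_rows)
    raise ValueError('unknown disposition: %s' % disposition)

print(apply_write_disposition([1, 2], [3], 'WRITE_TRUNCATE'))  # [3]
```

WRITE_TRUNCATE is what you want for a full refresh of the table on every load; WRITE_APPEND is the right choice for incremental archives.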
