Published on March 3, 2022 by Kevin Graham
The Glue connector is a metadata connector used for querying and creating tables in AWS Glue. When you create an external table with this connector and give it the name of a table that already exists in Glue, the connector looks up the table’s column types, data location and storage format. The Kognitio external table is then created with the appropriate column types, attributes and data connector. Alternatively, you can create a writable Kognitio external table with no corresponding table in Glue; in this case the table is also created in Glue.
See external table connectors for details of whether your deployment option supports this connector.
To create a connector for the Glue data catalog on your cluster, make sure the java module is active and then create a connector:
create module java mode active;
create connector myglue source java target 'class com.kognitio.javad.glueconnector.GlueConnector';
Note: On Kognitio on AWS systems a connector called GLUE connected to your default catalog is created by default.
The create external table command for use with the Glue connector accepts the following specific options – see Create External Table for details of syntax and standard options.
The Glue connector queries the Glue catalog to get the column definitions for the table so they don’t need to be specified in the create table statement.
Example: Creating a connector and external table to read from a Glue table
To run the examples below you need permission to create connectors and external tables, and you must already have built the simple Glue tables and created a “test” schema on Kognitio.
The Glue connector uses the java plugin, so if the plugin is not already loaded and active, run:
create module java mode active;
Then create the connector:
create connector myglueconnector source java target 'class com.kognitio.javad.glueconnector.GlueConnector';
This creates a connector that will read tables in the Glue database on this cluster.
Create the Kognitio external table referencing the Glue table. You do not need to specify the columns as they will be picked up from the Glue catalog:
create external table test.glue_elb_logs from myglueconnector target 'table test.elb_logs';
You can then test this with:
select top 100 * from test.glue_elb_logs;
If you want to run multiple queries against the table, you should create a view image and query that:
create view test.v_glue_elb_logs as select * from test.glue_elb_logs;
create view image test.v_glue_elb_logs;
select elb_name, count(*) from test.v_glue_elb_logs group by 1;
Currently, the data formats recognised are delimited text, ORC, Parquet and Avro.
Most Glue column types are represented by a similar type in Kognitio. The Glue BOOLEAN type maps to a Kognitio TINYINT, using 0 for false and 1 for true. Only the scalar types are supported. Complex types such as ARRAY, MAP and STRUCT are not supported.
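The type rules above can be illustrated with a small Python sketch. It is not the connector's code: the function names `is_supported` and `boolean_to_tinyint` are hypothetical, and the sketch assumes Glue reports complex types using Hive-style names such as `array<string>` or `struct<a:int>`.

```python
# Base names of the complex types the article says are not supported.
COMPLEX_TYPES = {"array", "map", "struct"}

def is_supported(glue_type: str) -> bool:
    """Return True for scalar types, False for ARRAY, MAP and STRUCT.

    Hypothetical helper: it only inspects the type name before any '<'.
    """
    base_name = glue_type.strip().lower().split("<", 1)[0]
    return base_name not in COMPLEX_TYPES

def boolean_to_tinyint(value: bool) -> int:
    """Encode a BOOLEAN the way the article describes: 0 for false, 1 for true."""
    return 1 if value else 0

for glue_type in ["string", "boolean", "array<string>", "struct<a:int>"]:
    print(glue_type, is_supported(glue_type))
```

A check like this could be run over a table's column list before creating the external table, to spot columns that will not map to a Kognitio type.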
If AWS Glue indicates that a table is partitioned, Kognitio expects the data files to be laid out according to Hive’s partitioning scheme. This means a directory with a name of the form colname=value is assumed to contain rows in which the value of the column colname is value, and the column colname should not appear in the file.
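To make the layout concrete, here is a minimal Python sketch of how partition column values can be read back from a Hive-style path. The function name `partition_values`, the example path and the partition columns `year` and `month` are assumptions for illustration only; they are not taken from the connector or from the example tables.

```python
from pathlib import PurePosixPath

def partition_values(path: str) -> dict[str, str]:
    """Return {column: value} for each colname=value directory in the path."""
    values = {}
    # Inspect only the directories; the data file name itself is skipped.
    for part in PurePosixPath(path).parent.parts:
        name, sep, value = part.partition("=")
        if sep and name:
            values[name] = value
    return values

# Hypothetical file belonging to a table partitioned by year and month.
example = "elb_logs/year=2015/month=01/part-00000.parquet"
print(partition_values(example))  # prints {'year': '2015', 'month': '01'}
```

The values come from the directory names alone, which is why the partition columns are expected not to appear inside the data files.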
If the Glue table’s data is in Avro format, and an Avro reader schema is given by Glue for that table, the Glue connector will not use Glue’s list of column types; instead it will pass the reader schema to the data connector and the data connector is expected to infer the column types from that.
As with all metadata connectors, in order to create an external table from the Glue connector, you only need privileges on the Glue connector itself, not on the data connector (e.g. the HDFS or Parquet connector) to which it delegates the table.