
Mastering Hive Table Design

Fill in the blanks

Implement and get drilled on Hive Table design problems.



by Good Sam
1

Practice Problem #1 - Create a simple Hive table:

Create a table named employees with four columns (id, name, age, department). The ROW FORMAT DELIMITED clause specifies how Hive should interpret raw data to fit into this table schema.

Solution:

CREATE TABLE employees (
    id INT,
    name STRING,
    age INT,
    department STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

2

Practice Problem #2 - Design a Hive table:

Let's say you're given a dataset containing user activity logs with fields: timestamp, user_id, activity_type, and activity_details. Design a Hive table to store this data, partitioned by activity_type and optimized for querying by user_id.

Solution:

CREATE TABLE user_activity_logs (
    timestamp BIGINT,
    user_id INT,
    activity_details STRING
)
PARTITIONED BY (activity_type STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/path/to/user/activity/logs';

3

Practice Problem #3:

Given a dataset of product reviews with fields: review_id, product_id, review_text, user_id, rating, and review_date (in YYYY-MM-DD format), design a Hive table to store this data, optimized for querying reviews by product and date. Think about how you would partition and store the table.

Solution:

CREATE EXTERNAL TABLE product_reviews (
    review_id INT,
    review_text STRING,
    user_id INT,
    rating INT
)
PARTITIONED BY (product_id INT, review_date STRING)
STORED AS ORC
LOCATION '/path/to/product/reviews';

4

Practice Problem #4 - Daily Transaction Logs: Design a Hive table for the scenario

Scenario: You have daily transaction logs containing transaction_id, user_id, transaction_amount, and transaction_date.

Solution:

CREATE TABLE daily_transactions (
    transaction_id INT,
    user_id INT,
    transaction_amount DECIMAL(10,2)
)
PARTITIONED BY (transaction_date DATE)
STORED AS PARQUET;

5

Practice Problem #5 - User Login History: Design a Hive table for the scenario

Scenario: Track user login history with login_id, user_id, login_timestamp, and logout_timestamp, optimizing for queries on monthly login activity.

Solution:

-- Staging table creation
CREATE EXTERNAL TABLE login_history_staging (
    login_id INT,
    user_id INT,
    login_timestamp TIMESTAMP,
    logout_timestamp TIMESTAMP
)
STORED AS ORC
LOCATION '/path/to/login/history';

-- Main table creation with partitioning
CREATE TABLE login_history (
    login_id INT,
    user_id INT,
    login_timestamp TIMESTAMP,
    logout_timestamp TIMESTAMP
)
PARTITIONED BY (login_month STRING)
STORED AS ORC;

-- Data insertion from staging to main table
INSERT INTO TABLE login_history PARTITION (login_month)
SELECT
    login_id,
    user_id,
    login_timestamp,
    logout_timestamp,
    date_format(login_timestamp, 'yyyy-MM') AS login_month
FROM login_history_staging;
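Problem #5's INSERT derives the partition value (login_month) from the SELECT rather than from a literal, which is a dynamic partition insert. Hive rejects such inserts under its default strict mode, so the session typically needs these settings first:

```sql
SET hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic (no static partition required)
SET hive.exec.dynamic.partition.mode=nonstrict;
```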

6

Practice Problem #6 - Product Inventory: Design a Hive table for the scenario

Scenario: Store product inventory records including product_id, store_location, inventory_count, and last_update_date, optimized for querying inventory by location.

Solution:

CREATE EXTERNAL TABLE product_inventory (
    product_id INT,
    inventory_count INT,
    last_update_date DATE
)
PARTITIONED BY (store_location STRING)
STORED AS ORC
LOCATION '/path/to/inventory';

7

Practice Problem #7 - Customer Feedback Messages: Design a Hive table for the scenario

Scenario: Manage customer feedback with feedback_id, customer_id, message, category, and received_date, optimized for reviewing feedback by category and date.

Solution:

CREATE TABLE customer_feedback (
    feedback_id INT,
    customer_id INT,
    message STRING
)
PARTITIONED BY (category STRING, received_date DATE)
STORED AS TEXTFILE;

8

Practice Problem #8 - Sales Records with Geography: Design a Hive table for the scenario

Scenario: Analyze sales records with sale_id, product_id, sale_amount, sale_date, and region, needing frequent access by region and specific dates.

Solution:

CREATE TABLE sales_records (
    sale_id INT,
    product_id INT,
    sale_amount DECIMAL(10,2)
)
PARTITIONED BY (region STRING, sale_date DATE)
STORED AS ORC;

9

Problem #9: Financial Transactions (Parquet)

Scenario: You are tasked with managing a dataset of financial transactions that includes transaction_id, account_id, amount, transaction_type, and transaction_date. You need efficient querying by account_id and transaction_date.

Solution:

CREATE TABLE financial_transactions (
    transaction_id INT,
    account_id INT,
    amount DECIMAL(10,2),
    transaction_type STRING
)
PARTITIONED BY (transaction_date DATE)
CLUSTERED BY (account_id) INTO 100 BUCKETS
STORED AS PARQUET;

10

Problem #10: Customer Profiles (Avro)
Scenario: You need to store customer profile data including customer_id, name, email, signup_date, and last_login. The data must support evolving schemas as new fields might be added in the future.

Solution:

CREATE EXTERNAL TABLE customer_profiles (
    customer_id INT,
    name STRING,
    email STRING,
    signup_date DATE
)
PARTITIONED BY (year INT)
STORED AS AVRO
LOCATION '/path/to/customer/profiles';

11

Problem #11: Event Logs (ORC)
Scenario: Design a table to manage web event logs with fields: event_id, user_id, event_type, event_details, and event_date. You expect frequent complex queries involving multiple fields.

Solution:

CREATE TABLE event_logs (
    event_id INT,
    user_id INT,
    event_type STRING,
    event_details STRING
)
PARTITIONED BY (event_date DATE)
STORED AS ORC;

12

Problem #12: Marketing Campaign Data (JSON)
Scenario: Store marketing campaign data including campaign_id, campaign_name, start_date, end_date, and budget. The data is occasionally queried by marketing analysts who prefer a readable format for ad-hoc queries.

Solution:

CREATE EXTERNAL TABLE marketing_campaigns (
    campaign_id INT,
    campaign_name STRING,
    budget DECIMAL(10,2)
)
PARTITIONED BY (start_year INT)
STORED AS JSON
LOCATION '/path/to/marketing/campaigns';

13

Problem #13: Research Data (TEXTFILE)
Scenario: Store research data including record_id, researcher_id, study_field, data, and entry_date. Data is primarily textual and occasionally accessed.

Solution:

CREATE TABLE research_data (
    record_id INT,
    researcher_id INT,
    study_field STRING,
    data STRING
)
PARTITIONED BY (entry_date DATE)
STORED AS TEXTFILE;

14

Problem #14: Implementing Constraints
Scenario: Design a table to store user information with a unique user_id and a reference to a department_id from a departments table.

Solution:

CREATE TABLE departments (
    department_id INT,
    department_name STRING,
    CONSTRAINT pk_dept PRIMARY KEY (department_id)
) STORED AS ORC;

CREATE TABLE users (
    user_id INT,
    user_name STRING,
    department_id INT,
    CONSTRAINT pk_user PRIMARY KEY (user_id),
    CONSTRAINT fk_dept FOREIGN KEY (department_id) REFERENCES departments (department_id)
) STORED AS ORC;
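Worth knowing about Problem #14: Hive does not enforce PRIMARY KEY or FOREIGN KEY constraints; they are informational metadata for users and optimizers, and Hive 2.1+ generally requires declaring them unenforced. A hedged variant of the users table with that qualifier:

```sql
CREATE TABLE users (
    user_id INT,
    user_name STRING,
    department_id INT,
    -- DISABLE NOVALIDATE marks the constraints as informational (not enforced)
    CONSTRAINT pk_user PRIMARY KEY (user_id) DISABLE NOVALIDATE,
    CONSTRAINT fk_dept FOREIGN KEY (department_id)
        REFERENCES departments (department_id) DISABLE NOVALIDATE
) STORED AS ORC;
```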

15

Problem #15: Table Schema Modification
Scenario: You already have a products table and need to add a new column category_id and change the data type of the existing price column.

Solution:

ALTER TABLE products ADD COLUMNS (category_id INT);
ALTER TABLE products CHANGE COLUMN price price DECIMAL(10,2);
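If products were partitioned, note that ALTER TABLE ... CHANGE COLUMN by default updates only the table-level metadata; existing partitions keep the old column type unless the change is cascaded:

```sql
-- CASCADE propagates the type change to all existing partition metadata as well
ALTER TABLE products CHANGE COLUMN price price DECIMAL(10,2) CASCADE;
```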

16

Problem #16: Hive SQL Query
Scenario: Calculate and update the average sales for each product category in a sales_summary table.

Solution:

INSERT OVERWRITE TABLE sales_summary
SELECT category_id, AVG(sales_amount)
FROM sales
GROUP BY category_id;

17

Problem #17: Loading Data into a Hive Table
Scenario: Load data into a transactions table from a CSV file located in HDFS.

Solution:

LOAD DATA INPATH '/path/to/transactions.csv' INTO TABLE transactions;
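Two common variants of the LOAD DATA statement in Problem #17, assuming the same transactions table (the local path shown is illustrative): LOCAL reads from the client's filesystem instead of HDFS, and OVERWRITE replaces existing data rather than appending:

```sql
-- Load from the local filesystem (file is copied into the warehouse)
LOAD DATA LOCAL INPATH '/local/path/to/transactions.csv' INTO TABLE transactions;

-- Replace the table's existing contents instead of appending
LOAD DATA INPATH '/path/to/transactions.csv' OVERWRITE INTO TABLE transactions;
```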

18

Problem #18: Filtering, Aggregation, and Join
Scenario: Retrieve the total sales by department from a sales table and a departments table.

Solution:

SELECT d.department_name, SUM(s.amount) AS total_sales
FROM sales s
JOIN departments d ON s.department_id = d.department_id
GROUP BY d.department_name;

19

Problem #19: Temporary Tables
Scenario: Create a temporary table to hold daily sales data for analysis within a session.

Solution:

CREATE TEMPORARY TABLE temp_daily_sales AS
SELECT transaction_date, SUM(amount) AS daily_total
FROM sales
GROUP BY transaction_date;

20

Problem #20: Creating and Using Views
Scenario: Create a view to simplify access to customer demographics data without exposing sensitive details like personal IDs or payment methods.

Solution:

CREATE VIEW customer_demographics AS
SELECT customer_name, age, region
FROM customers;

21

Problem #21: Configuring Schema Evolution for Avro

The Avro format supports schema evolution out of the box with Hive. When using Avro, the schema is stored with the data, which helps Hive manage changes seamlessly. However, to explicitly enable and manage Avro schema evolution, you can use table properties like the following:

CREATE TABLE avro_table (
    id INT,
    name STRING
)
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url'='hdfs://path/to/schema/file');
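As an alternative to avro.schema.url, the schema can be embedded inline with the avro.schema.literal table property. A sketch of evolving the table above by adding a hypothetical email field (giving it a null default keeps previously written records readable):

```sql
-- email is a hypothetical new field; the ["null","string"] union with a null
-- default lets readers fill it in for rows written under the old schema
ALTER TABLE avro_table SET TBLPROPERTIES (
  'avro.schema.literal'='{"type":"record","name":"avro_table","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"email","type":["null","string"],"default":null}]}'
);
```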

22

Problem #22: Configuring Schema Evolution for ORC

ORC supports schema evolution through its columnar format and metadata storage capabilities. To manage schema changes, you might need to adjust the following Hive configuration settings:

SET hive.exec.orc.split.strategy=ETL;
SET hive.exec.orc.schema.evolution=true;

hive.exec.orc.split.strategy: Setting this to ETL optimizes reading of ORC files that might have evolved schemas.

hive.exec.orc.schema.evolution: Enabling this allows Hive to handle changes in ORC file schemas over time.

Additionally, when creating ORC tables, consider enabling column renames as part of schema evolution:

CREATE TABLE orc_table (
    id INT,
    first_name STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.schema.evolution.case.sensitive'='false', 'orc.column.renames.allowed'='true');

23

Problem #23: Configuring Schema Evolution for Parquet

Parquet also supports schema evolution to a degree, especially the addition of new columns. To use Parquet effectively with schema evolution in Hive, ensure that your Hive version and settings align with Parquet's capabilities:

CREATE TABLE parquet_table (
    id INT,
    name STRING
)
STORED AS PARQUET;

For schema evolution in Parquet, the changes are mostly handled transparently by Hive, but you can ensure better management with configurations like:

SET parquet.enable.dictionary=true;
