Fill in the blanks
Implement and get drilled on Hive table design problems.

Practice Problem #1 - Create a simple Hive Table:
Create a table named employees with four columns (id, name, age, department). The ROW FORMAT DELIMITED clause specifies how Hive should parse raw data to fit it into this table schema.

Solution:

    CREATE TABLE employees (
        id INT,
        name STRING,
        age INT,
        department STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
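To make the mapping concrete: with FIELDS TERMINATED BY ',', each line of the backing text file is split on commas and bound to the columns by position. The sample row and query below are illustrative only:

    -- a line in the underlying file: 1,Alice,30,Engineering
    SELECT name, department
    FROM employees
    WHERE age > 25;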
Practice Problem #2 - Design a Hive Table:
Let's say you're given a dataset containing user activity logs with fields: timestamp, user_id, activity_type, and activity_details. Design a Hive table to store this data, partitioned by activity_type and optimized for querying by user_id.

Solution:

    CREATE TABLE user_activity_logs (
        `timestamp` BIGINT,    -- backticks: timestamp is a reserved keyword in Hive
        user_id INT,
        activity_details STRING
    )
    PARTITIONED BY (activity_type STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/path/to/user/activity/logs';
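This layout serves point lookups well: the activity_type partition prunes directories, and bucketing on user_id limits the scan within a partition. A sketch of the kind of query it is built for (the literal values are invented):

    SELECT `timestamp`, activity_details
    FROM user_activity_logs
    WHERE activity_type = 'login'   -- partition pruning
      AND user_id = 4217;           -- narrowed further by bucketing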
Practice Problem #3:
Given a dataset of product reviews with fields: review_id, product_id, review_text, user_id, rating, and review_date (in YYYY-MM-DD format), design a Hive table to store this data, optimized for querying reviews by product and date. Think about how you would partition and store the table.

Solution:

    CREATE EXTERNAL TABLE product_reviews (
        review_id INT,
        review_text STRING,
        user_id INT,
        rating INT
    )
    PARTITIONED BY (product_id INT, review_date STRING)
    STORED AS ORC
    LOCATION '/path/to/product/reviews';
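Because this is an external partitioned table, files that appear under LOCATION are not visible until their partitions are registered in the metastore. Either statement below works; the partition values are hypothetical:

    MSCK REPAIR TABLE product_reviews;    -- discover all partitions on disk
    ALTER TABLE product_reviews
        ADD PARTITION (product_id = 42, review_date = '2024-01-15');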
Practice Problem #4 - Daily Transaction Logs:
Design a Hive table for the following scenario: you have daily transaction logs containing transaction_id, user_id, transaction_amount, and transaction_date.

Solution:

    CREATE TABLE daily_transactions (
        transaction_id INT,
        user_id INT,
        transaction_amount DECIMAL(10,2)
    )
    PARTITIONED BY (transaction_date DATE)
    STORED AS PARQUET;
Practice Problem #5 - User Login History:
Design a Hive table for the following scenario: track user login history with login_id, user_id, login_timestamp, and logout_timestamp, optimizing for queries on monthly login activity.

Solution:

    -- Staging table creation
    CREATE EXTERNAL TABLE login_history_staging (
        login_id INT,
        user_id INT,
        login_timestamp TIMESTAMP,
        logout_timestamp TIMESTAMP
    )
    STORED AS ORC
    LOCATION '/path/to/login/history';

    -- Main table creation with partitioning
    CREATE TABLE login_history (
        login_id INT,
        user_id INT,
        login_timestamp TIMESTAMP,
        logout_timestamp TIMESTAMP
    )
    PARTITIONED BY (login_month STRING)
    STORED AS ORC;

    -- Data insertion from staging to main table
    INSERT INTO TABLE login_history PARTITION (login_month)
    SELECT
        login_id,
        user_id,
        login_timestamp,
        logout_timestamp,
        date_format(login_timestamp, 'yyyy-MM') AS login_month
    FROM login_history_staging;
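The INSERT above uses dynamic partitioning: login_month is computed per row rather than supplied as a literal, which Hive's strict mode rejects by default. These session settings usually have to precede it:

    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;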
Practice Problem #6 - Product Inventory:
Design a Hive table for the following scenario: store product inventory records including product_id, store_location, inventory_count, and last_update_date, optimized for querying inventory by location.

Solution:

    CREATE EXTERNAL TABLE product_inventory (
        product_id INT,
        inventory_count INT,
        last_update_date DATE
    )
    PARTITIONED BY (store_location STRING)
    STORED AS ORC
    LOCATION '/path/to/inventory';
Practice Problem #7 - Customer Feedback Messages:
Design a Hive table for the following scenario: manage customer feedback with feedback_id, customer_id, message, category, and received_date, optimized for reviewing feedback by category and date.

Solution:

    CREATE TABLE customer_feedback (
        feedback_id INT,
        customer_id INT,
        message STRING
    )
    PARTITIONED BY (category STRING, received_date DATE)
    STORED AS TEXTFILE;
Practice Problem #8 - Sales Records with Geography:
Design a Hive table for the following scenario: analyze sales records with sale_id, product_id, sale_amount, sale_date, and region, needing frequent access by region and specific dates.

Solution:

    CREATE TABLE sales_records (
        sale_id INT,
        product_id INT,
        sale_amount DECIMAL(10,2)
    )
    PARTITIONED BY (region STRING, sale_date DATE)
    STORED AS ORC;
Problem #9: Financial Transactions (Parquet)
Scenario: You are tasked with managing a dataset of financial transactions that includes transaction_id, account_id, amount, transaction_type, and transaction_date. You need efficient querying by account_id and transaction_date.

Solution:

    CREATE TABLE financial_transactions (
        transaction_id INT,
        account_id INT,
        amount DECIMAL(10,2),
        transaction_type STRING
    )
    PARTITIONED BY (transaction_date DATE)
    CLUSTERED BY (account_id) INTO 100 BUCKETS
    STORED AS PARQUET;
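Bucketing on account_id also enables bucket sampling, which is handy for spot-checking large transaction sets; a sketch with invented values:

    SELECT *
    FROM financial_transactions TABLESAMPLE (BUCKET 1 OUT OF 100 ON account_id)
    WHERE transaction_date = '2024-01-15';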
Problem #10: Customer Profiles (Avro)
Scenario: You need to store customer profile data including customer_id, name, email, signup_date, and last_login. The data must support evolving schemas, as new fields might be added in the future.

Solution:

    CREATE EXTERNAL TABLE customer_profiles (
        customer_id INT,
        name STRING,
        email STRING,
        signup_date DATE
    )
    PARTITIONED BY (year INT)
    STORED AS AVRO
    LOCATION '/path/to/customer/profiles';
Problem #11: Event Logs (ORC)
Scenario: Design a table to manage web event logs with fields: event_id, user_id, event_type, event_details, and event_date. You expect frequent complex queries involving multiple fields.

Solution:

    CREATE TABLE event_logs (
        event_id INT,
        user_id INT,
        event_type STRING,
        event_details STRING
    )
    PARTITIONED BY (event_date DATE)
    STORED AS ORC;
Problem #12: Marketing Campaign Data (JSON)
Scenario: Store marketing campaign data including campaign_id, campaign_name, start_date, end_date, and budget. The data is occasionally queried by marketing analysts who prefer a human-readable format for ad-hoc queries.

Solution:

    CREATE EXTERNAL TABLE marketing_campaigns (
        campaign_id INT,
        campaign_name STRING,
        budget DECIMAL(10,2)
    )
    PARTITIONED BY (start_year INT)
    STORED AS JSON    -- Hive 4.x spells this JSONFILE; older versions use the JsonSerDe instead
    LOCATION '/path/to/marketing/campaigns';
Problem #13: Research Data (TEXTFILE)
Scenario: Store research data including record_id, researcher_id, study_field, data, and entry_date. Data is primarily textual and occasionally accessed.

Solution:

    CREATE TABLE research_data (
        record_id INT,
        researcher_id INT,
        study_field STRING,
        data STRING
    )
    PARTITIONED BY (entry_date DATE)
    STORED AS TEXTFILE;
Problem #14: Implementing Constraints
Scenario: Design a table to store user information with a unique user_id and a reference to a department_id from a departments table.

Solution:

    CREATE TABLE departments (
        department_id INT,
        department_name STRING,
        CONSTRAINT pk_dept PRIMARY KEY (department_id)
    )
    STORED AS ORC;

    CREATE TABLE users (
        user_id INT,
        user_name STRING,
        department_id INT,
        CONSTRAINT pk_user PRIMARY KEY (user_id),
        CONSTRAINT fk_dept FOREIGN KEY (department_id) REFERENCES departments (department_id)
    )
    STORED AS ORC;
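Note that Hive does not enforce these constraints; primary and foreign keys are informational metadata for optimizers and BI tools. On most Hive versions the DDL above is only accepted when each constraint is declared unenforced, for example:

    CONSTRAINT pk_user PRIMARY KEY (user_id) DISABLE NOVALIDATE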
Problem #15: Table Schema Modification
Scenario: You already have a products table and need to add a new column category_id and change the data type of the existing price column.

Solution:

    ALTER TABLE products ADD COLUMNS (category_id INT);
    ALTER TABLE products CHANGE COLUMN price price DECIMAL(10,2);
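CHANGE COLUMN takes old_name new_name new_type, which is why price appears twice: the name is kept and only the type changes. The change is metadata-only; existing files are not rewritten. To verify the new schema:

    DESCRIBE products;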
Problem #16: Hive SQL Query
Scenario: Calculate and update the average sales for each product category in a sales_summary table.

Solution:

    INSERT OVERWRITE TABLE sales_summary
    SELECT category_id, AVG(sales_amount)
    FROM sales
    GROUP BY category_id;
Problem #17: Loading Data into a Hive Table
Scenario: Load data into a transactions table from a CSV file located in HDFS.

Solution:

    LOAD DATA INPATH '/path/to/transactions.csv' INTO TABLE transactions;
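Two common variants of the same statement: LOCAL reads from the client's filesystem instead of HDFS, and OVERWRITE replaces the table's current contents instead of appending. The path below is hypothetical:

    LOAD DATA LOCAL INPATH '/local/path/transactions.csv' OVERWRITE INTO TABLE transactions;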
Problem #18: Filtering, Aggregation, and Join
Scenario: Retrieve the total sales by department from a sales table and a departments table.

Solution:

    SELECT d.department_name, SUM(s.amount) AS total_sales
    FROM sales s
    JOIN departments d ON s.department_id = d.department_id
    GROUP BY d.department_name;
Problem #19: Temporary Tables
Scenario: Create a temporary table to hold daily sales data for analysis within a session.

Solution:

    CREATE TEMPORARY TABLE temp_daily_sales AS
    SELECT transaction_date, SUM(amount) AS daily_total
    FROM sales
    GROUP BY transaction_date;
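The temporary table is visible only to the current session and is dropped automatically when the session ends, so it suits intermediate results. Within the session it queries like any other table; an invented follow-up:

    SELECT transaction_date, daily_total
    FROM temp_daily_sales
    WHERE daily_total > 10000;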
Problem #20: Creating and Using Views
Scenario: Create a view to simplify access to customer demographics data without exposing sensitive details like personal IDs or payment methods.

Solution:

    CREATE VIEW customer_demographics AS
    SELECT customer_name, age, region
    FROM customers;
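Consumers query the view exactly like a table, and only the three projected columns are reachable through it; an illustrative aggregate:

    SELECT region, COUNT(*) AS customer_count
    FROM customer_demographics
    GROUP BY region;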
Problem #21: Configuring Schema Evolution for Avro
Avro supports schema evolution out of the box with Hive. When using Avro, the schema is stored with the data, which helps Hive manage changes seamlessly. However, to explicitly enable and manage Avro schema evolution, you can use table properties like the following:

    CREATE TABLE avro_table (
        id INT,
        name STRING
    )
    STORED AS AVRO
    TBLPROPERTIES (
        'avro.schema.url' = 'hdfs://path/to/schema/file'
    );
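When the schema file gains a version with new optional fields (each carrying an Avro default), the table can be repointed without rewriting data; a hedged sketch with a hypothetical path:

    ALTER TABLE avro_table SET TBLPROPERTIES (
        'avro.schema.url' = 'hdfs://path/to/schema/file_v2'
    );

Existing files are then read through the new schema, with missing fields filled from their Avro defaults.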
Problem #22: Configuring Schema Evolution for ORC
ORC supports schema evolution through its columnar format and metadata storage capabilities. To manage schema changes, you might need to adjust the following Hive configuration settings:

    SET hive.exec.orc.split.strategy=ETL;
    SET hive.exec.orc.schema.evolution=true;

hive.exec.orc.split.strategy: setting this to ETL optimizes reading of ORC files that might have evolved schemas.
hive.exec.orc.schema.evolution: enabling this allows Hive to handle changes in the ORC file schemas over time.

Additionally, when creating ORC tables, consider enabling column renaming as part of schema evolution:

    CREATE TABLE orc_table (
        id INT,
        first_name STRING
    )
    STORED AS ORC
    TBLPROPERTIES (
        'orc.schema.evolution.case.sensitive' = 'false',
        'orc.column.renames.allowed' = 'true'
    );
Problem #23: Configuring Schema Evolution for PARQUET
Parquet also supports schema evolution to a degree, especially with additions of new columns. To use Parquet effectively with schema evolution in Hive, ensure that your Hive version and settings align with Parquet's capabilities:

    CREATE TABLE parquet_table (
        id INT,
        name STRING
    )
    STORED AS PARQUET;

For schema evolution in Parquet, the changes are mostly handled transparently by Hive, but you can ensure better management with configurations like:

    SET parquet.enable.dictionary=true;
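Adding columns is the safe evolution case for Parquet in Hive: a column appended with ALTER TABLE simply reads as NULL from files written before the change. A minimal sketch; email is a hypothetical new field:

    ALTER TABLE parquet_table ADD COLUMNS (email STRING);
    -- rows from older Parquet files return NULL for email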