web123456

【Hive】Date from plastic surgery to Date type

When creating a table, we often set the date field to INT type, such as20180601Such a numeric value represents the date, so there is no problem when doing date comparison operations, but to perform certain date calculations, you have to convert it to the date type before you can calculate it. How to convert it?

Data preparation

Next, create a table in Hive, which contains an INT-type date field and insert two rows of data.

create table tb (dt INT);
insert into tb values (20180701);
insert into tb values (20180715);
  • 1
  • 2
  • 3

Convert type

The first method here is toINTConvert date value of type toSTRINGType, built-in with Hiveunix_timestampfunctionConvert to timestamp type, and finally use the timestampfrom_unixtimeConvert toyyyy-MM-dddate type.

The second type is more direct,INTConvert date value of type toSTRINGType, then intercept the string, use-Spliced ​​together.

select dt,
from_unixtime(unix_timestamp(cast(dt as string),'yyyyMMdd'),'yyyy-MM-dd') as a,
concat(substr(cast(dt as string),1,4), '-',substr(cast(dt as string),5,2), '-',substr(cast(dt as string),7,2)) as b
from tb;
  • 1
  • 2
  • 3
  • 4

Running results

dt a b
20180701 2018-07-01 2018-07-01
20180715 2018-07-15 2018-07-15

Of course, it's a bit difficult to write like this every time. You can create UDFs or macros in Hive and just call them during conversion.

Create macro command

Macro commands are simpler and more convenient than UDF, butMacros can only be temporary macros, and are only visible and valid in this session. So you need to put the macro script at the head of the SQL script

DROP TEMPORARY MACRO IF EXISTS date_trans;
CREATE TEMPORARY MACRO date_trans(dt int)
    if(dt is not null and length(dt)=8,
       concat(substr(cast(dt as string),1,4), '-',substr(cast(dt as string),5,2), '-',substr(cast(dt as string),7,2)), 
       null);

--Call
select dt,date_trans(dt) as a from tb;
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8

If a function or macro command of the same function is called multiple times, it will be very convenient to maintain and the statements are much simpler.

More

A list of built-in date functions in Hive

Return Type Name(Signature) Description illustrate
string from_unixtime(bigint unixtime[, string format]) Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of “1970-01-01 00:00:00”. Convert the second value of time into format format (format can be "yyyy-MM-dd hh:mm:ss", "yyyy-MM-dd hh", "yyyy-MM-dd hh:mm", etc.) such as from_unixtime(1250111000, "yyyy-MM-dd") to obtain 2009-03-12
bigint unix_timestamp() Gets current Unix timestamp in seconds. Get the time stamp in the local time zone
bigint unix_timestamp(string date) Converts time string in format?yyyy-MM-dd HH:mm:ss?to Unix timestamp (in seconds), using the default timezone and the default locale, return 0 if fail: unix_timestamp(‘2009-03-20 11:30:01’) = 1237573801 Convert the time string with the format yyyy-MM-dd HH:mm:ss into a timestamp? For example, unix_timestamp('2009-03-20 11:30:01') = 1237573801
bigint unix_timestamp(string date, string pattern) Convert time string with given pattern (see [/javase/tutorial/i18n/format/]) to Unix time stamp (in seconds), return 0 if fail: unix_timestamp(‘2009-03-20’, ‘yyyy-MM-dd’) = 1237532400. Convert the specified time string format string to Unix timestamp, if the format is incorrect, return 0, for example: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400
string to_date(string timestamp) Returns the date part of a timestamp string: to_date(“1970-01-01 00:00:00”) = “1970-01-01”. Returns the date part of the time string
int year(string date) Returns the year part of a date or a timestamp string: year(“1970-01-01 00:00:00”) = 1970, year(“1970-01-01”) = 1970. Returns the year part of the time string
int quarter(date/timestamp/string) Returns the quarter of the year for a date, timestamp, or string in the range 1 to 4 (as of Hive?1.3.0). Example: quarter(‘2015-04-08’) = 2. Return to the current time attribute which quarter is the quarter (‘2015-04-08’) = ​​2
int month(string date) Returns the month part of a date or a timestamp string: month(“1970-11-01 00:00:00”) = 11, month(“1970-11-01”) = 11. Returns the month part of the time string
int day(string date) dayofmonth(date) Returns the day part of a date or a timestamp string: day(“1970-11-01 00:00:00”) = 1, day(“1970-11-01”) = 1. Returns the day of the time string
int hour(string date) Returns the hour of the timestamp: hour(‘2009-07-30 12:58:59’) = 12, hour(‘12:58:59’) = 12. Returns the hour of the time string
int minute(string date) Returns the minute of the timestamp. Returns the minute of the time string
int second(string date) Returns the second of the timestamp. Returns the seconds of the time string
int weekofyear(string date) Returns the week number of a timestamp string: weekofyear(“1970-11-01 00:00:00”) = 44, weekofyear(“1970-11-01”) = 44. Return the time string is within the first week of the year? For example weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44
int datediff(string enddate, string startdate) Returns the number of days from startdate to enddate: datediff(‘2009-03-01’, ‘2009-02-27’) = 2. Calculate the number of days between the start time startdate and enddate
string date_add(string startdate, int days) Adds a number of days to startdate: date_add(‘2008-12-31’, 1) = ‘2009-01-01’. Startdate plus days from start time
string date_sub(string startdate, int days) Subtracts a number of days to startdate: date_sub(‘2008-12-31’, 1) = ‘2008-12-30’. Subtract days from start time startdate
timestamp from_utc_timestamp(timestamp, string timezone) Assumes given timestamp is UTC and converts to given timezone (as of Hive?0.8.0). For example, from_utc_timestamp(‘1970-01-01 08:00:00’,’PST’) returns 1970-01-01 00:00:00. If the given timestamp is not UTC, it is converted to the specified time zone timetamp
timestamp to_utc_timestamp(timestamp, string timezone) Assumes given timestamp is in given timezone and converts to UTC (as of Hive?0.8.0). For example, to_utc_timestamp(‘1970-01-01 00:00:00’,’PST’) returns 1970-01-01 08:00:00. If the given timestamp is specified in the time zone, it is converted into a timetamp under UTC
date current_date Returns the current date at the start of query evaluation (as of Hive?1.2.0). All calls of current_date within the same query return the same value. Return to the current time and date
timestamp current_timestamp Returns the current timestamp at the start of query evaluation?(as of Hive?1.2.0). All calls of current_timestamp within the same query return the same value. Return the current time stamp
string add_months(string start_date, int num_months) Returns the date that is num_months after start_date?(as of Hive?1.1.0). start_date is a string, date or timestamp. num_months is an integer. The time part of start_date is ignored.?If start_date is the last day of the month or if the resulting month has fewer days than the day component of start_date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as start_date. Return to the date of adding num_months months to the current time
string last_day(string date) Returns the last day of the month which the date belongs to?(as of Hive?1.1.0). date is a string in the format ‘yyyy-MM-dd HH:mm:ss’ or ‘yyyy-MM-dd’.?The time part of date is ignored. Return to the date of the last day of the month, ignoring the hour, minute and second part (HH:mm:ss)
string next_day(string start_date, string day_of_week) Returns the first date which is later than start_date and named as day_of_week?(as of Hive1.2.0).?start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or full name of the day of the week (. Mo, tue, FRIDAY). The time part of start_date is ignored. Example: next_day(‘2015-01-14’, ‘TU’) = 2015-01-20. Return the date corresponding to the next week X of the current time. For example: next_day('2015-01-14', 'TU') = 2015-01-20? Take 2015-01-14 as the starting time, and the date corresponding to the next Tuesday is 2015-01-20
string trunc(string date, string format) Returns date truncated to the unit specified by the format?(as of Hive?1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. Example: trunc(‘2015-03-17’, ‘MM’) = 2015-03-01. Return the first year or month of the time ?For example, trunc("2016-06-26", "MM")=2016-06-01 ?trunc("2016-06-26", "YY")=2016-01-01 ? Note that the supported formats are MONTH/MON/MM, YEAR/YYY/YY
double months_between(date1, date2) Returns number of months between dates date1 and date2 (as of Hive?1.2.0). If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format ‘yyyy-MM-dd’ or ‘yyyy-MM-dd HH:mm:ss’. The result is rounded to 8 decimal places. Example: months_between(‘1997-02-28 10:30:00’, ‘1996-10-30’) = 3.94959677 Returns the month where the difference between date1 and date2, such as date1>date2, it returns positive, if date1