Reuse the result of a select expression in the "GROUP BY" clause?
Problem description
In MySQL, I can have a query like this:
select
    cast(from_unixtime(t.time, '%Y-%m-%d %H:00') as datetime) as timeHour
  , ...
from
    some_table t
group by
    timeHour, ...
order by
    timeHour, ...
where timeHour in the GROUP BY is the result of a select expression.
But when I tried a similar query in Spark SQL, I got this error:
Error: org.apache.spark.sql.AnalysisException:
cannot resolve '`timeHour`' given input columns: ...
My query for Spark SQL was this:
select
    cast(t.unixTime as timestamp) as timeHour
  , ...
from
    another_table as t
group by
    timeHour, ...
order by
    timeHour, ...
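For reference, a minimal spark-shell sketch that reproduces the same failure (the table another_table and its unixTime/value columns are made-up stand-ins):

// Register a toy view and run the failing query (sketch only).
import spark.implicits._

Seq((1505520000L, 1.0), (1505523600L, 2.0))
  .toDF("unixTime", "value")
  .createOrReplaceTempView("another_table")

// On Spark versions that resolve GROUP BY only against the FROM-clause columns,
// this fails with: AnalysisException: cannot resolve '`timeHour`' ...
spark.sql("""
  SELECT cast(t.unixTime AS timestamp) AS timeHour, sum(t.value) AS total
  FROM another_table t
  GROUP BY timeHour
  ORDER BY timeHour
""").show()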
Is this construct possible in Spark SQL?
Answer
Is this construct possible in Spark SQL?
Yes, it is. There are two ways to make it work in Spark SQL, so that the new column can be used in the GROUP BY and ORDER BY clauses.
Approach 1, using a subquery:
SELECT timeHour, someThing FROM (
    SELECT
        from_unixtime((starttime/1000)) AS timeHour
      , sum(...)                        AS someThing
      , starttime
    FROM
        some_table)
WHERE
    starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
    AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
    timeHour
ORDER BY
    timeHour
LIMIT 10;
Approach 2, using WITH (the more elegant way):
-- create the alias
WITH table_aliase AS (
    SELECT
        from_unixtime((starttime/1000)) AS timeHour
      , sum(...)                        AS someThing
      , starttime
    FROM
        some_table)
-- use the alias as a table
SELECT timeHour, someThing FROM table_aliase
WHERE
    starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
    AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
    timeHour
ORDER BY
    timeHour
LIMIT 10;
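Either SQL form can be submitted from Scala through spark.sql. Below is a rough sketch of the WITH variant with the elided sum(...) filled in by a hypothetical value column; the aggregation is moved into the outer SELECT here so the sum sits next to its GROUP BY:

// Sketch only: some_table, starttime (epoch millis) and value are assumed names.
// The CTE derives timeHour; the outer query filters, groups and orders on it.
val hourly = spark.sql("""
  WITH table_aliase AS (
    SELECT from_unixtime(starttime / 1000) AS timeHour,
           value,
           starttime
    FROM some_table
  )
  SELECT timeHour, sum(value) AS someThing
  FROM table_aliase
  WHERE starttime >= 1000 * unix_timestamp('2017-09-16 00:00:00')
    AND starttime <= 1000 * unix_timestamp('2017-09-16 04:00:00')
  GROUP BY timeHour
  ORDER BY timeHour
  LIMIT 10
""")
hourly.show()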
An alternative in Scala, using the Spark DataFrame API (without SQL):
// load the actual table as a DataFrame
val df = ....
import org.apache.spark.sql.functions._
import spark.implicits._   // needed for the $"column" syntax

df.withColumn("timeHour", from_unixtime($"starttime"/1000))
  .groupBy($"timeHour")
  .agg(sum("...").as("someThing"))
  .orderBy($"timeHour")
  .show()

// another way, as per eliasah's comment: build the derived column inside groupBy
df.groupBy(from_unixtime($"starttime"/1000).as("timeHour"))
  .agg(sum("...").as("someThing"))
  .orderBy($"timeHour")
  .show()
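For completeness, a self-contained sketch of the DataFrame version with made-up data, assuming starttime holds epoch milliseconds and value is the column being summed (both names are hypothetical):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object GroupByAliasDemo {
  def main(args: Array[String]): Unit = {
    // Local session just for the demo; in a real job the session usually already exists.
    val spark = SparkSession.builder().appName("group-by-alias-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy rows: (starttime in milliseconds, value) -- hypothetical schema.
    val df = Seq(
      (1505520000000L, 1.0),
      (1505521800000L, 2.0),
      (1505523600000L, 3.0)
    ).toDF("starttime", "value")

    // Same pattern as above: derive the column once, then group and order on it.
    df.withColumn("timeHour", from_unixtime($"starttime" / 1000))
      .groupBy($"timeHour")
      .agg(sum($"value").as("someThing"))
      .orderBy($"timeHour")
      .show(false)

    spark.stop()
  }
}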
