Low severity · query

Databricks Error: MISSING_GROUP_BY

What does this error mean?

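A SELECT statement mixes aggregate functions (such as `SUM()` or `COUNT()`) with non-aggregated columns but has no GROUP BY clause listing those non-aggregated columns. An aggregate collapses many rows into one, so Databricks cannot tell which single value to return for each non-aggregated column. It rejects the query rather than guess.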
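The rule behind this error can be sketched as a small check over a simplified SELECT list. This is a hypothetical helper: `SelectItem` and `ungrouped_columns` are illustrative names, not a Databricks API. It assumes you already know which SELECT items are aggregates. It does not parse SQL, and it ignores literals, expressions, and positional GROUP BY.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SelectItem:
    """One entry of a SELECT list (hypothetical, simplified representation)."""
    expr: str            # column name as written, e.g. "region"
    is_aggregate: bool   # True for SUM(...), COUNT(...), AVG(...), ...


def ungrouped_columns(select_items: list[SelectItem], group_by: list[str]) -> list[str]:
    """Return non-aggregated SELECT items that are missing from GROUP BY.

    A query counts as aggregating if it has a GROUP BY or any aggregate in the
    SELECT list; a plain SELECT with neither is always accepted.
    """
    aggregating = bool(group_by) or any(item.is_aggregate for item in select_items)
    if not aggregating:
        return []
    grouped = set(group_by)
    return [item.expr for item in select_items
            if not item.is_aggregate and item.expr not in grouped]


items = [SelectItem("region", False), SelectItem("SUM(amount)", True)]
print(ungrouped_columns(items, []))          # ['region']  (aggregate + bare column, no GROUP BY)
print(ungrouped_columns(items, ["region"]))  # []          (every bare column is grouped)
```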

Common causes

1. Adding an aggregate such as `SUM()` or `COUNT()` to a SELECT that also returns raw dimension columns, without a GROUP BY.
2. Using a window function incorrectly where a GROUP BY aggregate was intended.
3. Migrating SQL from a permissive dialect (MySQL with ONLY_FULL_GROUP_BY disabled) that allowed partial grouping.
4. An ORM or BI tool generating SQL with an incomplete GROUP BY when aggregation columns are added at runtime.
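Cause 3 is easy to reproduce with a permissive engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in. SQLite, like MySQL with ONLY_FULL_GROUP_BY disabled, accepts a bare column next to an aggregate. Databricks raises MISSING_GROUP_BY for the same query. The `sales` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

# Databricks rejects this shape with MISSING_GROUP_BY: `region` is neither
# aggregated nor grouped. SQLite accepts it and returns a single row whose
# `region` value comes from some arbitrary input row.
bad_sql = "SELECT region, SUM(amount) FROM sales"
rows = conn.execute(bad_sql).fetchall()

print(len(rows))     # 1
print(rows[0][1])    # 35, the total over all rows; rows[0][0] is not meaningful
```

This is why SQL that ran without complaint on a permissive engine can fail once it is moved to Databricks.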

How to fix it

1. Add a GROUP BY clause listing every non-aggregated column in the SELECT list.
2. If all rows should be aggregated into one row, remove all non-aggregated columns from the SELECT list.
3. Use a window function with `OVER()` if you need per-row aggregates without collapsing rows.
4. Validate generated SQL from BI tools against ANSI SQL group-by rules before deployment.
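The first three fixes are sketched below on a made-up `sales` table, using Python's `sqlite3` module as a stand-in engine. The queries are standard SQL and should behave the same way in Databricks SQL. They were written against SQLite, though, and window functions there need SQLite 3.25 or later. Treat them as illustrations rather than tested Databricks code.

```python
import sqlite3

# In-memory sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "a", 10), ("east", "b", 20), ("west", "a", 5)],
)

# Fix 1: group by every non-aggregated column -> one row per region.
per_region = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Fix 2: drop the non-aggregated columns -> one row for the whole table.
grand_total = conn.execute("SELECT SUM(amount) AS total FROM sales").fetchall()

# Fix 3: window functions -> every input row is kept, with aggregates attached.
# OVER (PARTITION BY region) sums within the row's region; OVER () sums everything.
with_totals = conn.execute(
    "SELECT region, product, amount, "
    "SUM(amount) OVER (PARTITION BY region) AS region_total, "
    "SUM(amount) OVER () AS grand_total "
    "FROM sales ORDER BY region, product"
).fetchall()

print(per_region)   # [('east', 30), ('west', 5)]
print(grand_total)  # [(35,)]
print(with_totals)  # [('east', 'a', 10, 30, 35), ('east', 'b', 20, 30, 35), ('west', 'a', 5, 5, 35)]
```

Which fix fits depends on the number of rows you want back. GROUP BY returns one row per group, and dropping the dimensions returns one row overall. A window function keeps every input row.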

Frequently asked questions

Can I use GROUP BY column position instead of name?

Yes. Databricks supports positional grouping such as `GROUP BY 1, 2` as a convenience. Explicit column names are still preferred because they are easier to read and avoid bugs when columns are added to or reordered in the SELECT list.
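Both the convenience and the risk show up on a made-up `sales` table. This sketch again uses Python's `sqlite3` module as a stand-in, since SQLite also accepts GROUP BY positions. The "drifted" query is a hypothetical edit in which a column is prepended to the SELECT list while `GROUP BY 1` is left unchanged.

```python
import sqlite3

# In-memory sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "a", 10), ("east", "b", 20), ("west", "a", 5), ("west", "c", 7)],
)

by_name = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
by_position = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY 1 ORDER BY 1"
).fetchall()
assert by_name == by_position  # position 1 is `region`, so the grouping is identical

# Prepending `product` shifts the positions: GROUP BY 1 now means `product`.
# Databricks would reject this query because `region` is no longer grouped;
# SQLite accepts it and silently groups by product instead of region.
drifted = conn.execute(
    "SELECT product, region, SUM(amount) FROM sales GROUP BY 1 ORDER BY 1"
).fetchall()

print(len(by_name), len(drifted))  # 2 3
```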

Source · docs.databricks.com/aws/en/error-messages/error-classes.html