MetricSign
Low severity · query · Databricks

Databricks Error:
MISSING_GROUP_BY

What does this error mean?

The SELECT list combines aggregate functions with non-aggregated columns, but the query has no GROUP BY clause to say how the non-aggregated columns should be grouped.
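As a minimal sketch of the kind of query that triggers this error, assume a hypothetical table `sales` with columns `region` and `amount` (these names are illustrative, not from the Databricks documentation):

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)
-- region is returned as-is while amount is aggregated, and there is
-- no GROUP BY, so this query is expected to fail with MISSING_GROUP_BY.
SELECT region, SUM(amount) AS total_amount
FROM sales;
```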

Common causes

  • Adding an aggregate like `SUM()` or `COUNT()` to a SELECT that also returns raw dimension columns without a GROUP BY
  • Using a window function incorrectly where a GROUP BY aggregate was intended
  • Migrating SQL from a permissive dialect (MySQL with ONLY_FULL_GROUP_BY disabled) that allowed partial grouping
  • An ORM or BI tool generating SQL with an incomplete GROUP BY when aggregation columns are added at runtime
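For the migration case, the sketch below shows one possible way to port a query that relied on MySQL returning an arbitrary value for an ungrouped column. It assumes a hypothetical `sales(region, amount)` table and that the original query really did want a single row; `any_value` is a Databricks SQL aggregate function, but whether it matches the intent of a given legacy query is an assumption to check case by case.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)
-- MySQL with ONLY_FULL_GROUP_BY disabled accepts the following and
-- returns an arbitrary region next to the grand total:
--   SELECT region, SUM(amount) FROM sales;
-- A possible Databricks port that makes the arbitrary pick explicit,
-- so every expression in the SELECT list is an aggregate:
SELECT any_value(region) AS some_region,
       SUM(amount)       AS total_amount
FROM sales;
```

If the legacy query was actually meant to return one row per region, adding `GROUP BY region` is the better port.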

How to fix it

  1. Add a GROUP BY clause listing every non-aggregated column in the SELECT list.
  2. If all rows should be aggregated into one row, remove all non-aggregated columns from the SELECT list.
  3. Use a window function with OVER() if you need per-row aggregates without collapsing rows.
  4. Validate generated SQL from BI tools against ANSI SQL group-by rules before deployment.
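The first three fixes can be sketched as follows, again assuming a hypothetical `sales(region, amount)` table. Which variant is right depends on the result shape you want: one row per group, one row overall, or every original row.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)

-- Fix 1: group by every non-aggregated column (one row per region).
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Fix 2: drop the non-aggregated column (one row for the whole table).
SELECT SUM(amount) AS total_amount
FROM sales;

-- Fix 3: use window functions to keep every row.
-- OVER () attaches the grand total to each row;
-- OVER (PARTITION BY region) attaches the per-region total.
SELECT region,
       amount,
       SUM(amount) OVER ()                    AS grand_total,
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales;
```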

Frequently asked questions

Can I use GROUP BY column position instead of name?

Yes. Databricks supports positional references such as `GROUP BY 1, 2`, where each number is the 1-based position of an expression in the SELECT list. Explicit column names are still preferred for readability and to avoid bugs when the SELECT list changes.
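A short sketch of the two equivalent forms, assuming a hypothetical `sales` table with `region`, `product`, and `amount` columns:

```sql
-- Hypothetical table: sales(region STRING, product STRING, amount DOUBLE)

-- Positional form: 1 and 2 refer to the first and second SELECT expressions.
SELECT region, product, SUM(amount) AS total_amount
FROM sales
GROUP BY 1, 2;

-- Explicit form: same grouping, and it keeps working if the SELECT list is reordered.
SELECT region, product, SUM(amount) AS total_amount
FROM sales
GROUP BY region, product;
```

In the positional form, moving `SUM(amount)` to the front of the SELECT list would make position 1 point at the aggregate, which is the kind of silent breakage that explicit names avoid.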

Source · docs.databricks.com/aws/en/error-messages/error-classes.html
