How Can I Avoid Duplication When Joining the AD and AD_SET Tables?
Question
How can I join the AD and AD_SET tables without creating duplicate rows?
Environment
Connector: Spotify Ads
Answer
We treat the updated_at column as a primary key. As a result, when you join the AD and AD_SET tables on id, you may encounter duplicate rows because multiple versions of the same ad set exist.
To avoid duplicate rows, use a PARTITION BY clause to return only the most recent row for each ad set:
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
FROM FIVETRAN_DB.spotify_ads.ad_set
)
select ad_set.updated_at as adset_updated, ad.updated_at as ad_update ,ad_set.delivery ,ad.delivery, ad.*
from FIVETRAN_DB.spotify_ads.ad ad join cte ad_set
on ad.ad_set_id = ad_set.id
where ad_set.id = '{id}' and rn = 1