Our Azure Blob Storage connector can now sync Parquet files. We support Parquet format 2.4.0. This feature is in Beta.
We have improved how we track which files we have already synced, so that we only pull new or changed data from the source containers. Previously, we re-synced any file whose creation timestamp matched the last observed cursor position. That ensured we never missed files created while we were syncing your data, but it also meant we sometimes synced the same files twice. Now, in addition to the timestamp, we track the names of the files we have already synced, storing up to 1,000 file names. We sync a file created at the last observed cursor timestamp only if it is not in our list of synced files for that timestamp.
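The cursor check described here can be sketched roughly as follows. This is an illustration only, not the connector's actual code: the names `Cursor`, `should_sync`, `record_synced`, and `sync` are hypothetical, and the sketch assumes files are processed in ascending timestamp order. The release note does not say what happens once 1,000 names are stored, so the sketch simply stops recording further names at that point, which is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set, Tuple

MAX_TRACKED_NAMES = 1000  # limit stated in the release note


@dataclass
class Cursor:
    """Hypothetical cursor state: last observed timestamp plus synced names."""
    timestamp: Optional[datetime] = None
    synced_names: Set[str] = field(default_factory=set)


def should_sync(cursor: Cursor, name: str, file_time: datetime) -> bool:
    """Decide whether a file needs syncing given the current cursor."""
    if cursor.timestamp is None or file_time > cursor.timestamp:
        return True  # strictly newer than anything seen so far
    if file_time == cursor.timestamp:
        # Same timestamp as the cursor: sync only if not already recorded.
        return name not in cursor.synced_names
    return False  # older than the cursor, already covered


def record_synced(cursor: Cursor, name: str, file_time: datetime) -> None:
    """Update the cursor after a file has been synced."""
    if cursor.timestamp is None or file_time > cursor.timestamp:
        # Cursor advances, so the name list restarts for the new timestamp.
        cursor.timestamp = file_time
        cursor.synced_names = {name}
    elif file_time == cursor.timestamp:
        # Assumption: past the cap, names are not recorded, so those
        # files could be synced again on a later run.
        if len(cursor.synced_names) < MAX_TRACKED_NAMES:
            cursor.synced_names.add(name)


def sync(cursor: Cursor, listing: List[Tuple[str, datetime]]) -> List[str]:
    """Return the names synced from a listing of (name, timestamp) pairs."""
    synced = []
    for name, file_time in sorted(listing, key=lambda item: item[1]):
        if should_sync(cursor, name, file_time):
            synced.append(name)
            record_synced(cursor, name, file_time)
    return synced
```

In this sketch, a second call to `sync` with the same listing returns an empty list, while a file that appears later with the cursor's exact timestamp is still picked up once.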