Ran the CSV through our deduper first. Categorisation was granular enough to segment by sub-vertical without regex. Saved us hours of manual cleanup.