For sessions that contain only screenshots, we currently try to read the timestamps from the file path, and only after generating the captions. I'd prefer a way to generate an aggregations.jsonl file directly from the screenshots (with screenshot_path and, optionally, timestamp) and then use it to build the final data result (containing caption, screenshot, and timestamp). Currently, that combined result only exists after generating an HF dataset.
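A rough sketch of what the first step could look like, to make the idea concrete. Only the `screenshot_path` and `timestamp` field names come from the description above; the function names (`parse_timestamp`, `build_aggregations`), the `*.png` glob, and the `YYYYMMDD_HHMMSS` filename convention are assumptions for illustration, and the real aggregations.jsonl schema may need more fields.

```python
import json
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed filename convention, e.g. "screenshot_20240131_142501.png".
# The actual naming scheme used by the sessions may differ.
TIMESTAMP_RE = re.compile(r"(\d{8})_(\d{6})")


def parse_timestamp(path: Path) -> Optional[str]:
    """Return an ISO 8601 timestamp parsed from the filename, or None."""
    match = TIMESTAMP_RE.search(path.stem)
    if match is None:
        return None
    try:
        parsed = datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")
    except ValueError:
        # Digits matched the pattern but do not form a valid date/time.
        return None
    return parsed.isoformat()


def build_aggregations(screenshot_dir: Path, output_path: Path) -> int:
    """Write one JSON line per screenshot and return the record count."""
    count = 0
    with output_path.open("w", encoding="utf-8") as out:
        for path in sorted(screenshot_dir.glob("*.png")):
            record = {"screenshot_path": str(path)}
            timestamp = parse_timestamp(path)
            if timestamp is not None:
                # The timestamp is optional, so it is omitted when unknown.
                record["timestamp"] = timestamp
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

With something like this in place, captioning could read aggregations.jsonl and attach a caption to each record, so the caption, screenshot, and timestamp are together before any HF dataset is built.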