Lightning Talks + Closing Remarks DAY 1 | PyData Amsterdam 2024

Key insights from PyData Amsterdam 2024 Day 1 lightning talks: healthcare data bias, timezones, LLMs, NLP trends, clinical trials, data structures & evolving data roles.

Key takeaways
  • Missing or biased data in healthcare and medical research can have fatal consequences, especially for women and underrepresented groups, since most data is collected from male subjects

  • Time zone handling in data pipelines remains challenging - inconsistencies between UTC, local time zones, and daylight saving time can cause data duplication and analysis issues

  • Converting documents (like PDFs) to markdown using multimodal LLMs is becoming a viable alternative to traditional OCR approaches, with better accuracy and structure preservation

  • In NLP, the rapid evolution of models and techniques (BERT, GPT, LoRA, etc.) creates a steep learning curve for newcomers - understanding core concepts is more important than chasing the latest trends

  • Clinical trial data management requires special consideration for security, standardization, and proper handling of sensitive patient information

  • When working with data frames and time series, understanding the underlying data structure and intent is more important than simply choosing pandas by default

  • Data roles continue to evolve and specialize - from data analysts and scientists to ML engineers, research scientists, and data engineers, each with a distinct skill set

  • Test data generation and synthetic data creation remain challenging, especially for complex scenarios and edge cases

  • Code readability and maintainability should prioritize clear intent and documentation over minimal line count

  • Proper data visualization and communication are critical for conveying insights and findings effectively across teams
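
The time zone takeaway above can be made concrete with a small sketch. It is not taken from the talk: the zone (Europe/Amsterdam), the date, and all variable names are illustrative assumptions, and it relies on Python 3.9+ with the IANA time zone database available to `zoneinfo`. It shows how two distinct UTC events collapse into one duplicate key once they are converted to local wall-clock time and the offset is dropped, which is one way the duplication mentioned above could arise.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Illustrative zone; DST ended here on 2024-10-27 (03:00 CEST -> 02:00 CET).
AMSTERDAM = ZoneInfo("Europe/Amsterdam")

# Two distinct events, exactly one hour apart in UTC.
utc_events = [
    datetime(2024, 10, 27, 0, 30, tzinfo=timezone.utc),
    datetime(2024, 10, 27, 1, 30, tzinfo=timezone.utc),
]

# Converting to local time and discarding the offset yields the same
# wall-clock value (02:30) twice, so the events look like duplicates.
naive_local = [e.astimezone(AMSTERDAM).replace(tzinfo=None) for e in utc_events]
has_duplicate_local_keys = len(set(naive_local)) < len(naive_local)

# Keeping the values time-zone aware preserves the differing UTC offsets,
# so converting back to UTC recovers the two original instants.
aware_local = [e.astimezone(AMSTERDAM) for e in utc_events]
utc_offsets = [e.utcoffset() for e in aware_local]
round_trip = [e.astimezone(timezone.utc) for e in aware_local]

print(has_duplicate_local_keys)
print(utc_offsets)
print(round_trip == utc_events)
```

One subtlety worth knowing: Python compares two aware datetimes that share the same `tzinfo` object by wall-clock fields only, so the two local values in `aware_local` still compare equal to each other. Storing and joining on UTC, and converting to local time only for display, is one common way to sidestep this.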