Data Science Intern

Station F, Paris
Full Time
Open

About the Internship

Are you a passionate and curious engineering student looking for an opportunity to kickstart your career in AI? At Neuralk-AI, we're seeking a motivated intern to help us aggregate and structure training datasets by developing innovative web scraping pipelines. This is your chance to dive into the world of cutting-edge AI research while gaining hands-on experience in a dynamic startup environment!

About Neuralk

We are a fast-growing deeptech startup, leading the way in AI innovation. Our mission is to build the technical tools that enable companies to create AI applications capable of interacting seamlessly with their structured data (tabular or graph databases). At the heart of our work is a modern AI embedding platform that transforms structured data into vector representations for applications in classification, regression, clustering, and more.

Backed by significant funding (several million), we combine state-of-the-art academic research with practical business applications to drive real impact. Our culture values simplicity, clear communication, and a constant drive for optimization.

At Neuralk, you’ll join a team of passionate individuals eager to learn, grow, and transform the AI industry. We believe in fostering a diverse, respectful, and inclusive environment and welcome candidates from all backgrounds to apply.

Co-founders: Alexandre Pasquiou (CSO) & Antoine Moissenot (CEO).

Reporting & Job Location

You will report to the CSO of Neuralk and will be located in our Paris offices.

Mission Highlights

As a Data Science Intern, you’ll play a key role in building high-quality training datasets that fuel our AI models. By developing web scraping pipelines and consolidating diverse data sources, you’ll help lay the foundation for groundbreaking advancements in AI for structured data.

Role & responsibilities

  • Web scraping: Design, implement, and maintain efficient web scraping pipelines to collect high-quality data from diverse online sources.
  • Data cleaning and preprocessing: Ensure the scraped data is accurate, structured, and ready for use in training AI models.
  • Dataset consolidation: Aggregate data from multiple sources, standardizing formats and ensuring compatibility with our AI platform.
  • Collaborative work: Partner with our research and engineering teams (~5 people) to identify the most valuable data sources and contribute to our dataset strategy.
  • Exploration: Experiment with innovative approaches to improve data quality and diversity, fueling better model performance.

Qualifications

  • Currently pursuing a degree in Computer Science, Engineering, Data Science, or a related field (Bac+3/Bac+5 or equivalent).
  • Programming skills: Proficiency in Python; experience with web scraping libraries like BeautifulSoup, Scrapy, or Selenium is a big plus.
  • Data processing: Familiarity with data cleaning and preprocessing tools (e.g., Pandas, NumPy, Skrub).
  • Strong interest in AI and machine learning; curiosity about how structured data can be transformed into actionable insights (e.g., with scikit-learn).
  • Self-starter with the ability to work autonomously and solve problems creatively.
  • Good communication skills in English.

Nice to have

  • Experience with large-scale data collection or analysis projects.
  • Interest or experience in deep learning frameworks (e.g., PyTorch, TensorFlow).
  • Familiarity with version control systems like Git.

Interested in the role?

Get in touch and we will get back to you shortly.

Recruitment process

Why you should join us

  • Hands-on learning: Get practical experience in an exciting and rapidly evolving field.
  • Mentorship: Work closely with experienced researchers and engineers who are eager to share their knowledge.
  • Impactful work: Your contributions will directly support the development of cutting-edge AI models and platforms.
  • Dynamic environment: Be part of a fast-growing startup where your ideas and efforts will make a tangible difference.
  • Growth opportunities: Gain exposure to advanced AI concepts and methodologies, positioning yourself for a future career in machine learning.