Data Acquisition Specialist

Station F, Paris
Full Time
Open

About the job offer

Neuralk-AI is looking for a skilled and motivated Data Acquisition Specialist to lead our data acquisition strategy and help structure our datasets for building cutting-edge AI models. You will collaborate closely with our research and engineering teams to fuel our AI platform with high-quality, diverse, and valuable data.

About Neuralk

We are a passionate team leading the way in AI innovation, committed to driving the rapid adoption of transformative AI applications. Our focus is on developing technical tools that allow any company to build AI applications that interact natively with its structured databases (tabular or graph databases). Specifically, we develop a modern AI embedding platform that converts any structured database into a vector store, which can later be combined with classic Machine Learning models for classification, regression, or clustering.

As an early-stage, AI-driven startup backed by significant funding (several million), we base our approach on state-of-the-art academic research to drive practical business solutions. We value clear communication and simplicity in our approaches, and we promote a mindset of continuous optimization.

Join Neuralk to be part of a growing team, eager to learn and adapt, united by the belief that our technology can make a significant positive impact and contribute to transforming the AI industry.

Co-founders: Alexandre Pasquiou (CSO) & Antoine Moissenot (CEO).

Neuralk is dedicated to equal opportunity employment and fosters an environment that is open and respectful of diversity. We encourage you to apply even if you do not meet every requirement. If you are passionate about our mission, learn quickly, and believe you can contribute, we want to hear from you.

Reporting & Job Location

You will report to the CSO of Neuralk and will be located in our Paris offices.

Mission Highlights

As a Data Acquisition Specialist, your role will focus on shaping and executing the strategy to source, acquire, and prepare datasets critical to training our AI models. Your contributions will directly impact the quality and performance of our platform, ensuring data is ready for advanced AI workflows.

Role & responsibilities

  • Lead data acquisition strategy:
    • Collaborate with ML Scientists to understand training data needs and define the acquisition plan.
    • Implement synthetic data generation using statistical methods or learned patterns from real-world data (Python).
    • Source and acquire public datasets, conduct web scraping, or manage freelancers for scraping projects.
    • Support sales teams in client meetings by explaining our technology and securing data access for model training and evaluation.
  • Ensure data quality:
    • Build data validation tools: Python scripts that predict the quality of a dataset from a sample, as well as anonymisation tools.
    • Define and track metrics to assess dataset relevance and value.
    • Organize and maintain datasets to ensure accuracy, reliability, and readiness for AI model use.
  • Dataset consolidation: Design robust pipelines for data storage and access.
  • Collaborative work: Partner with researchers and engineers to identify the most valuable data sources and optimize our data acquisition processes.
  • Exploration: Experiment with innovative approaches to enhance data diversity, coverage, and quality, driving better AI model performance.

Profile

We are looking for a self-starter with a passion for data and its role in AI development. The ideal candidate will be proactive, detail-oriented, and comfortable working in a fast-paced startup environment.

Requirements

  • Proven experience in data acquisition, preparation, or management roles, ideally in AI or tech-driven companies.
  • Strong understanding of database systems (SQL, NoSQL) and data storage frameworks (e.g., Parquet).
  • Expertise in data cleaning, validation, and standardization for large datasets.
  • Solid coding skills in Python and familiarity with data processing libraries (e.g., Pandas, NumPy, skrub, scikit-learn).
  • Experience working with cloud platforms (e.g., AWS, GCP) and containerization technologies (e.g., Docker).
  • Excellent problem-solving skills and an autonomous, proactive work style.

Bonuses

  • Experience in sourcing data for machine learning applications, particularly embedding models.
  • Familiarity with integrating APIs for data acquisition.
  • Knowledge of GDPR and data compliance requirements.
  • Background in working with structured and unstructured data, such as images, text, or tabular data.
Interested in the role?

Get in touch and we will get back to you shortly.

Recruitment process

Compensation & benefits

We are a fast-paced startup, yet we value a healthy work-life balance and attractive compensation. We offer:

  • A competitive salary
  • Equity (BSPCE), to reflect the value you bring to Neuralk and to foster a shared journey
  • Comprehensive health insurance
  • Paid leave and time off in line with French standards
  • A dynamic work setting: we prefer in-person collaboration but are flexible about occasional remote work
  • And more to come as we grow