This is a remote position.
What we do :
We are at the forefront of leveraging NLP technologies to revolutionize data analysis and decision-making
processes across various domains, including pharmaceuticals and research. Our dynamic team is currently
transitioning to advanced platforms such as Sinequa and Amazon SageMaker, enhancing our capabilities in handling
large-scale data with sophisticated language models.
Job Description :
We seek an experienced Senior NLP Engineer / Large Language Model Architect to lead the development and
optimization of our NLP projects. The ideal candidate will be instrumental in refining our large language models,
particularly focusing on areas such as pharmaceutical SOPs and research contexts. This role requires a deep
understanding of NLP principles, practical experience in deploying large-scale models, and a problem-solving
mindset geared towards delivering high-quality, efficient solutions.
Key Responsibilities :
Design, develop, and refine large language models to meet specific project requirements, ensuring high
quality and efficiency.
Lead the migration of NLP workloads from platforms like Solr and Elasticsearch to Sinequa
and Amazon SageMaker, and optimize them on the new platforms.
Develop and implement strategies for prompt engineering, model refinement, and training pipelines to
enhance model performance.
Collaborate with internal teams and stakeholders to understand project needs, providing expert guidance
and NLP architectural advice.
Manage the integration of knowledge graphs into NLP models to improve contextual understanding and
output relevance.
Evaluate and adopt state-of-the-art embedding models and encoding methods to ensure optimal model
performance.
Oversee the concept extraction and entity recognition processes, ensuring integration with existing
knowledge graphs and data pipelines.
Guide the team in expanding and refining taxonomies using large language models, with subsequent
human review to ensure tagging accuracy.
Drive the adoption of best practices in NLP model development, deployment, and maintenance, staying
abreast of the latest industry trends and research.
Required Skills and Experience :
Advanced degree in Computer Science, Artificial Intelligence, Linguistics, or a related field.
Extensive experience in NLP, with a strong portfolio of projects involving large language models.
Proficiency in using platforms such as Hugging Face, Sinequa, and Amazon SageMaker for model
deployment and refinement.
Solid understanding of model training, tuning, and refinement techniques, with proven success in applying
these to large-scale projects.
Experience with knowledge graphs, entity recognition, and concept extraction technologies.
Strong coding skills in relevant programming languages (e.g., Python) and familiarity with NLP libraries and
frameworks.
Demonstrated ability to work effectively in a fast-paced, dynamic environment, managing multiple
projects simultaneously.
Excellent problem-solving skills, with a keen eye for detail and a commitment to high-quality outcomes.
Strong communication and collaboration skills, with the ability to convey complex technical concepts to
non-technical stakeholders.
Desirable Attributes :
Experience in the pharmaceutical or research sectors, with an understanding of SOPs and compliance
requirements.
A pragmatic approach to project management, avoiding magical solutions and focusing on tangible
results.
The flexibility to adapt to changing project needs and the ability to guide teams through technical
challenges.
A team-player attitude, with experience working in or leading multidisciplinary teams.
About the project & the Team :
We have a RAG platform. We are currently migrating from Solr and Elasticsearch on Cloudera to Sinequa and
Amazon SageMaker. This was not our choice, but it is what it is. We need NLP experts to handle the aspects related to
large language models. Deploying the models is relatively straightforward, typically done through Hugging Face,
but that is not the main issue. The challenge lies in the nuances of developing high-quality large language models,
including organizing prompts, refining training pipelines, and understanding when they work or fail. For instance,
over the last three weeks, we attempted refinement training on pharmaceutical standard operating procedures
(SOPs), and it was unsuccessful. Initially, the model forgot its chatbot functionality. After two weeks of retraining
on a local cluster, the SOP refinements were lost again, which points to a fundamental flaw in our approach. Our
NLP team suggests experimental methods based on academic papers, but we do not have the luxury of time for such trials.
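One pragmatic way to see "when they work or fail" is to score every refinement-training checkpoint on two held-out suites: the target domain (SOPs) and the general chat skills the base model already had. The sketch below is hypothetical: the suite names, the 0.0-1.0 scores, and the 0.05 tolerance are assumptions for illustration, not the team's actual evaluation setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    """Mean scores (0.0-1.0) of one checkpoint on two hypothetical held-out suites."""
    general_chat: float  # skills the base model already had (chat, instruction following)
    sop_domain: float    # the refinement-training target, e.g. SOP question answering


def diagnose(before: EvalScores, after: EvalScores, tolerance: float = 0.05) -> list[str]:
    """Compare a refined checkpoint against the model it was trained from.

    Returns a list of findings; an empty list means general ability did not
    drop by more than `tolerance` and the domain score improved by more than it.
    """
    findings = []
    # Catastrophic forgetting: the general suite regresses after domain training.
    if before.general_chat - after.general_chat > tolerance:
        findings.append("forgetting: general chat ability dropped")
    # No useful domain gain: the SOP suite barely moves.
    if after.sop_domain - before.sop_domain <= tolerance:
        findings.append("no gain: SOP domain score did not improve")
    return findings


base = EvalScores(general_chat=0.80, sop_domain=0.40)
print(diagnose(base, EvalScores(general_chat=0.55, sop_domain=0.70)))
print(diagnose(base, EvalScores(general_chat=0.78, sop_domain=0.42)))
```

With these made-up numbers, the first checkpoint is flagged for forgetting and the second for no domain gain, which mirrors the two failure modes described above. Running such a check after each run surfaces the problem in one evaluation pass rather than after weeks of training.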
Therefore, we need experienced individuals who can provide architect-level guidance on managing
multiple large language models, including those specific to SOPs and research contexts. We are developing our own
large language model from our data, which we plan to use. We need a strategy for prompt engineering and
ETL processes, and for integrating our knowledge graph into the prompts. Currently, we use a top-ranked
embedding model from Hugging Face for encoding. This is feasible because our beta environment hosts only 20 million
records, so re-encoding the full corpus takes a manageable amount of time.
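Why 20 million records keeps re-encoding manageable can be shown with simple arithmetic. The sketch below assumes a hypothetical throughput of 500 records per second per worker and linear scaling across workers. Both figures are illustrative placeholders, not measurements from this platform; real throughput depends on the model, record length, and hardware.

```python
def reencode_hours(num_records: int, records_per_second: float, num_workers: int = 1) -> float:
    """Wall-clock hours to re-embed a corpus.

    Assumes throughput scales linearly with the number of workers (an idealization).
    """
    if records_per_second <= 0 or num_workers < 1:
        raise ValueError("throughput and worker count must be positive")
    seconds = num_records / (records_per_second * num_workers)
    return seconds / 3600


# Hypothetical throughput of 500 records/s per worker.
print(f"{reencode_hours(20_000_000, 500):.1f} h on 1 worker")     # 40,000 s
print(f"{reencode_hours(20_000_000, 500, 4):.1f} h on 4 workers")  # 10,000 s
```

Under these assumptions, a full re-encode takes about 11 hours on one worker. Because time grows linearly with corpus size, a corpus 100 times larger would take 100 times longer at the same throughput. That is why swapping embedding models is cheap in the beta environment but needs planning at production scale.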
We aim for high-quality outcomes from our large language models, applicable universally and not only to our own data. We
are seeking someone well-versed in the theoretical and practical aspects of NLP who can guide our
refinement training and estimate its costs and expected quality improvements. The goal is to have mature, pragmatic
discussions rather than rely on uncertain, magical solutions.
We recognize that the field attracts many people who prefer experimenting with technology to
delivering tangible value. We have learned that finding individuals with a comprehensive skill set is unrealistic because of
the multidisciplinary nature of this field, so we have shifted our strategy to engaging specialists for specific phases,
thereby avoiding the inefficiency of prolonged meetings.
Our requirement is straightforward: an expert who can effectively deploy high-quality large language models.
We are exploring various tools and methods, such as Dataiku for concept extraction and SciBite for named entity
recognition (NER), supported by our knowledge graph. Our challenges include managing taxonomies, expanding
synonym lists with large language models, and ensuring human review of tagging accuracy during ETL. We seek a
senior professional who can guide our team, which, despite some cynicism, is capable when given clear
instructions.
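The synonym-expansion step with human review during ETL can be sketched as a gate: LLM suggestions are queued as pending and reach the taxonomy only after a reviewer approves them. This is a minimal, hypothetical sketch. `generate` stands in for whatever LLM call produces candidate synonyms, and `decide` stands in for the human reviewer. Neither corresponds to a Dataiku, SciBite, or Hugging Face API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Candidate:
    concept_id: str
    synonym: str
    status: Status = Status.PENDING


@dataclass
class Taxonomy:
    synonyms: dict[str, set[str]] = field(default_factory=dict)  # concept -> accepted synonyms
    queue: list[Candidate] = field(default_factory=list)          # suggestions awaiting review

    def propose(self, concept_id: str, generate: Callable[[str], list[str]]) -> None:
        """Queue LLM-suggested synonyms (hypothetical `generate`); nothing is accepted yet."""
        known = self.synonyms.setdefault(concept_id, set())
        for raw in generate(concept_id):
            s = raw.strip().lower()  # simple normalization
            already_queued = any(c.concept_id == concept_id and c.synonym == s for c in self.queue)
            if s and s not in known and not already_queued:
                self.queue.append(Candidate(concept_id, s))

    def review(self, decide: Callable[[Candidate], bool]) -> None:
        """A human reviewer (hypothetical `decide`) approves or rejects each pending item."""
        for c in self.queue:
            if c.status is Status.PENDING:
                c.status = Status.APPROVED if decide(c) else Status.REJECTED
                if c.status is Status.APPROVED:
                    self.synonyms.setdefault(c.concept_id, set()).add(c.synonym)


tax = Taxonomy(synonyms={"acetaminophen": {"acetaminophen"}})
tax.propose("acetaminophen", lambda cid: ["Paracetamol", "Tylenol", "acetaminophen"])
tax.review(lambda c: c.synonym != "tylenol")  # reviewer rejects the brand name
print(sorted(tax.synonyms["acetaminophen"]))
```

Keeping the queue separate from the accepted synonyms means the taxonomy used for tagging never contains unreviewed LLM output. Rejected candidates remain in the queue as a record of the reviewer's decisions.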
Our team of around 40-50 members, including 20 developers and 10 people focused on NLP,
recently welcomed a new master's graduate who brings fresh expertise to our algorithmic challenges. However, the
team does not yet cover every capability we need. We prefer a homogeneous team, given the challenges of
managing multinational teams across different time zones.
Our immediate need is not for Amazon or SageMaker experts but for someone to address the core aspects of applying large
language models. We are considering a hybrid model: one MVP person for immediate needs,
supported by a flexible structure that accommodates project-specific specialists. This approach is driven by
our complex infrastructure, which requires a significant onboarding period to understand fully.
In summary, we seek a candidate capable of integrating into our system, understanding our sophisticated
processes, and contributing effectively. This involves not just technical skills but also adapting to our
organizational culture and procedural requirements. The ideal candidate would be evaluated over several months,
allowing for a comprehensive assessment of their fit and contribution to our objectives.
Gamito is a licensed recruitment agency (license number 1820 / 16.12.2014), and its services are free of charge to candidates.
Excellent social package, home office, flexible working hours.
Guidance and onboarding will be provided.
Regular Bulgarian working hours with flexible time and an excellent social package.