Code Data Validation Consultant (Machine Learning & Data Processing)
Company: US Tech Solutions, Inc.
Location: San Jose
Posted on: April 26, 2025
Job Description:
Location: San Jose, CA (onsite in a hybrid model)
Join our team to enable cutting-edge AI/ML
innovation by building robust data pipelines and automation
tools.
You'll work closely with human data operators and generative AI
teams to process, analyze, and optimize high-quality datasets for
training machine learning models.
Your work will directly impact the efficiency and performance of AI
systems, from automating data quality checks to designing
infrastructure that scales with evolving model requirements.
This role is ideal for a problem-solver who thrives in fast-paced
environments and enjoys bridging data engineering with machine
learning.
Responsibilities:
Data Pipeline Development:
Design and implement Python-based automation tools to process,
clean, and transform raw data for ML training.
Build custom scripts to streamline data ingestion and preprocessing
workflows.
Quality Analysis & Reporting:
Conduct manual and automated quality assessments to identify
high/low-impact data for model training.
Generate reports detailing experimental results, data
effectiveness, and recommendations for improvement.
ML Model Integration:
Train and evaluate open-source ML models (e.g., Gemma) to assess
data impact on model performance.
Collaborate with AI teams to refine data selection strategies based
on model feedback.
Infrastructure Optimization:
Develop scalable solutions in Colab/Jupyter Notebooks to automate
data validation and filtering.
Troubleshoot and debug data formatting issues (e.g., code-comment
relevance, dataset consistency).
Required (Mandatory):
2-3+ years of experience (preferred) in data
analysis/validation/engineering, ML engineering, or
automation-focused roles.
Bonus: PhD graduates with hands-on ML/data processing projects.
Desired:
Exposure to Generative AI models (e.g., GPT,
Llama) or large-scale datasets.
Bash/Shell Scripting: Ability to automate repetitive tasks.
Familiarity with APIs for data ingestion/processing.
Experience contributing to open-source projects or public GitHub
repositories.
Knowledge of cloud services.
Skills:
Technical Expertise:
Python: Intermediate to advanced proficiency (scripting, automation,
data processing libraries such as Pandas/NumPy).
Hands-on experience writing, executing, and reviewing code
(preferably in Colab/Jupyter Notebooks).
Data & ML Skills:
Experience training/fine-tuning ML models and analyzing their
performance.
Familiarity with public data platforms (Hugging Face, GitHub) and
data formats (JSON, CSV).
Analytical Skills:
Proven ability to assess data quality and build tools to automate
quality checks.
Why Join This Project:
Impact AI innovation by shaping the data
backbone of advanced ML systems.
Collaborate with senior data engineers and generative AI
experts.
Flexible hybrid work environment with opportunities for growth.
Education: Bachelor's degree in Computer Science, Data Science,
Engineering, or related STEM field.
About US Tech Solutions:
US Tech Solutions is a global staff augmentation firm providing a
wide range of on-demand talent and total workforce solutions. To
learn more about US Tech Solutions, please visit
www.ustechsolutions.com.
US Tech Solutions is an Equal Opportunity Employer. All qualified
applicants will receive consideration for employment without regard
to race, color, religion, sex, sexual orientation, gender identity,
national origin, disability, or status as a protected veteran.