
Protein Engineering over Prompt Engineering and Apple wants to tackle Spanglish

PLUS a PDF parsing demo and a Halloween-themed Angry Birds knock-off

Read time: 3.1 minutes

Welcome to Genuinely Artificial!

This newsletter will augment your reality with AI insights, automation, and antics.

I’ll be operating as your own personal researcher, data scientist, data analyst, and tech journalist all rolled up into one. 🤓 

With your help, we can reach 100% worldwide subscribership within 34 newsletters if we double each time. Seems doable.

It’s the 3rd newsletter release, and I’m happy to say we’ve officially doubled our subscriber base, moving one step closer to worldwide subscribership. 🤝

2 × 2 = 4 subscribers achieved (currently pacing ahead of this 😁)

4 subscribers × 2^31 to go. 🤷‍♂️ 2^33 ≈ 8.6 billion population 🌎️
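For anyone who wants to check the doubling math, here is a quick sketch in Python. It assumes issue 1 started with a single subscriber and that every issue doubles the count; the 8 billion world population figure is a rough assumption of mine, not a number from any source cited here.

```python
def subscribers_at(issue: int) -> int:
    """Subscribers at a given issue, assuming issue 1 has 1 subscriber and each issue doubles."""
    return 2 ** (issue - 1)


def first_issue_reaching(target: int) -> int:
    """First issue number whose subscriber count meets or exceeds the target."""
    issue = 1
    while subscribers_at(issue) < target:
        issue += 1
    return issue


WORLD_POPULATION = 8_000_000_000  # rough assumption, for illustration only

print(subscribers_at(3))                       # 4
print(subscribers_at(34))                      # 8589934592
print(first_issue_reaching(WORLD_POPULATION))  # 34
```

Under those assumptions, issue 34 corresponds to 33 doublings, which is where the roughly 8.6 billion figure comes from.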

Apple is trying to tackle code-switching by throwing more languages at its models, and a Halloween-themed Angry Birds knockoff gets 1 million views in just a few days.

Let’s get into it with…

TODAY’S DOWNLOAD

  • Next Generation of AlphaFold

  • NLP tackles Spanglish detection

  • Research dataset storage

  • Angry Pumpkins

  • Parsing PDFs tutorial

AI NEWS & INSIGHTS

Research Repository

DeepMind’s Next Generation of AlphaFold

Prompt engineering is all the rage, but what about protein engineering?

Google’s DeepMind and Isomorphic Labs continue to advance their AI model that promises to speed up biomedical discoveries.

The model offers a new understanding of:

  • disease pathways

  • drug design mechanisms

  • protein engineering

  • and more

The AlphaFold Protein Structure Database is publicly available for exploration.

NLP finds Spanglish hard, Apple aims to help

Spanglish is an example of what Natural Language Processing calls code-switching (CS). Code-switching is when we mix different languages in a single sentence.

Dime más, please!
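To make the idea concrete, here is a toy sketch of word-level code-switch detection in Python. The word lists are tiny and hand-made by me for illustration, and the function names are hypothetical; this is not Apple's approach, which works on streaming speech rather than on text lookups.

```python
import re
import unicodedata


def _nfc(word: str) -> str:
    """Normalize accents so 'más' compares equal however it was typed."""
    return unicodedata.normalize("NFC", word)


# Tiny hand-made lexicons (an assumption for this demo); real systems use trained models.
SPANISH = {_nfc(w) for w in ("dime", "más", "hola", "gracias", "favor")}
ENGLISH = {_nfc(w) for w in ("please", "hello", "thanks", "tell", "more")}


def tag_tokens(sentence: str) -> list[tuple[str, str]]:
    """Label each word as 'es', 'en', or 'unk' using the toy lexicons."""
    tokens = re.findall(r"\w+", _nfc(sentence).lower())
    tags = []
    for tok in tokens:
        if tok in SPANISH:
            tags.append((tok, "es"))
        elif tok in ENGLISH:
            tags.append((tok, "en"))
        else:
            tags.append((tok, "unk"))
    return tags


def is_code_switched(sentence: str) -> bool:
    """True when words from more than one known language appear in the sentence."""
    langs = {lang for _, lang in tag_tokens(sentence) if lang != "unk"}
    return len(langs) > 1


print(tag_tokens("Dime más, please!"))
print(is_code_switched("Dime más, please!"))
```

A lookup like this falls apart quickly on words shared between languages, which is part of why code-switching is hard for NLP systems.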

Previous studies on CS speech showed promising results for end-to-end speech translation but fell short in two main ways:

  1. limited to offline scenarios

  2. limited to translation to one of the languages present in the source transcription

This new approach focuses on streaming settings and translation to a third language.

AUTOMATION AND ANTICS

AUTOMATION ALLEY

Sharing is caring on the Hugging Face Hub

If you’re a researcher with one of the 360,000 models hosted on the Hugging Face Hub, it might be beneficial to store your datasets there as well.

10 benefits for researchers who share their data on the Hub:

  1. visibility for your work

  2. tools for exploring and working with datasets

  3. tools for loading datasets hosted on the Hugging Face Hub

  4. datasets viewer (similar to Kaggle)

  5. community created tools

  6. spotlight - interactively explore your data with one line of code

  7. support for large datasets

  8. API and client library interaction

  9. gated repositories (access restrictions)

  10. digital object identifiers (register persistent identifier for your dataset)

More info on how to share your datasets with ease on the Hub

ANTICS AVENUE

There’s a safer way to smash pumpkins this year. 

The X poster below built an Angry Birds knock-off with some of the popular AI tools of the day. Code is provided if you want to replicate it or build your own.

Who’s going to build the Thanksgiving Day version? 🦃 

TOOLS, TUTORIALS, & TESTIMONIALS

TOOL & TUTORIAL SPOTLIGHT

Parsing PDFs like a pro

This tutorial goes into depth on extracting text from text-only layered PDFs. It’s a great place to start before jumping to PDFs with more complex structure.

LayoutPDFReader demo code is provided on GitHub covering:

  1. vector search

  2. retrieval augmented generation (RAG) with smart chunking
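If vector search and RAG are new to you, the sketch below shows the general idea in plain Python. It is not the LayoutPDFReader API: the chunks are hand-written stand-ins for what a PDF parser might return, and the bag-of-words "embedding" is a deliberately simple substitute for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks, as if already extracted from a PDF.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "Refunds are processed within 5 business days.",
    "Support is available by email on weekdays.",
]

question = "When are invoices due?"
best = search(question, chunks)[0]

# The retrieved chunk becomes context for a language model prompt (the "RAG" step).
prompt = f"Answer using this context:\n{best}\n\nQuestion: {question}"
print(prompt)
```

The better the chunks line up with the document's real sections and tables, the more useful the retrieved context tends to be, which is the motivation behind smart chunking.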

EPILOGUE

FROM THE READERS

We received our first five-star review and we’re officially star-struck! 🤩 

FEEDBACK WELCOME

How can I make the Genuinely Artificial newsletter more entertaining, educational, and useful for YOU?

I’ll attempt to read your minds but most of my prediction capabilities are still centered around churn models. 🫣

How did you like today's newsletter?

Your feedback helps increase the value we bring to future posts.


Comment below or hit me up on X (formerly Twitter).

I look forward to seeing you in the next newsletter!

Chris
