Jai Jagannath 🙏🏼
Welcome to the 4th Odisha ML monthly newsletter edition.
This has been a busy month with continuous breakthroughs in generative AI, so I have tried to be concise and fit in the most impactful updates.
Community News and discussions
Tools to extract structured data from 1000+ PDF files
tesseract-ocr: Tesseract Open Source OCR Engine
Textract: It wraps many extraction tools in one package. The installation is tricky, but it is good to use.
Fitz: If your PDF has a readable text layer and doesn't produce garbage values when read, use PyMuPDF (also called Fitz).
Tabula: Use Tabula for reading tables, but it often produces a lot of unnamed columns.
PyPDF: A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
LayoutLMv3 (Hugging Face): Use it if you want to segment PDF regions (tables, images, etc.). It can be fine-tuned on custom datasets.
ABBYY: A paid PDF tool with a free trial.
The best way to read PDF tables is to convert the PDFs to docx files, from which the tables can be read as DataFrames. If your PDFs are confidential and unreadable, load all pages using Fitz, convert the pages to images, upscale the images 5 to 10 times, then run OCR. This gets the best out of Tesseract OCR.
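The render-then-OCR steps above can be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes PyMuPDF (imported as fitz), pytesseract, Pillow, and the Tesseract binary are installed locally, and the function names effective_dpi and ocr_pdf are made up for illustration.

```python
import io
from typing import List

PDF_BASE_DPI = 72  # PDF pages are measured in points, 72 per inch


def effective_dpi(zoom: float) -> float:
    """Resolution of a page image rendered at the given zoom factor."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return PDF_BASE_DPI * zoom


def ocr_pdf(path: str, zoom: float = 5.0, lang: str = "eng") -> List[str]:
    """Render each page at `zoom` times its base size, then OCR it.

    Returns one text string per page. Everything runs locally, which
    matters when the PDFs are confidential.
    """
    # Imported lazily so effective_dpi() works without these packages.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    matrix = fitz.Matrix(zoom, zoom)  # scale x and y by the same factor
    texts = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)  # rasterize the page
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            texts.append(pytesseract.image_to_string(image, lang=lang))
    return texts
```

A call such as ocr_pdf("report.pdf", zoom=5) (the file name is hypothetical) renders pages at about 360 DPI. Larger zoom values cost more memory and time, so it is worth trying a few values in the 5 to 10 range on a sample page first. For Odia text, passing lang="ori" should work if that Tesseract language pack is installed.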
Mozilla Common Voice is an initiative to help teach machines how real people speak. Spare 10 minutes to contribute and help build a sizeable public Speech <> Text corpus for the Odia language.
Dr. Shubhabrata Samantaray has received a doctoral degree for inventing a novel Sustainable AI Technology Solution. The invention can autonomously develop Green Energy linked ESG Assets on demand, like ChatGPT. He has become the first Indian geoscientist turned tech founder in the world to receive a Ph.D.
There is a new reference framework, Rebi, discussed in Odia AI ML, for those who know what “Lo Rebi! Lo nian! Lo chuli!” (“O Rebi! O fire! O hearth!”) is. “It can tell Rebati that her story is honored, and convey our regards to Fakir Mohan Senapati, through this Rebi framework in ML!”
Industry News
GPT-4
GPT-4: OpenAI launched GPT-4. This is the company's most advanced system to date. This latest system has a broader general knowledge base and enhanced problem-solving abilities, enabling it to tackle even the most challenging problems more accurately. Moreover, GPT-4 is more collaborative and creative than its predecessors, as it can assist users in generating, editing, and iterating on creative and technical writing tasks, such as song composition, screenplay writing, or adapting to a user's writing style.
GPT-4 outperforms ChatGPT, scoring in higher approximate percentiles among test-takers.
It also outperforms various SOTA models on NLP tasks.
Here is a guide on different ways to access GPT-4.
ChatGPT
The diagram below shows how ChatGPT has come out of nowhere. [source]
LLaMA on your laptop
You can now set up and run Meta’s GPT-3 equivalent (not ChatGPT) open-source AI model, LLaMA, on your local machine.
Check this repo for details: cocktailpeanut/dalai: The simplest way to run LLaMA on your local machine (github.com)
Other new releases
TogetherCompute has released OpenChatKit, a 20-billion-parameter open-source ChatGPT alternative under the Apache-2.0 license, which you can try on Hugging Face.
Google released its Universal Speech Model (USM) for 100+ languages. Unfortunately, Odia was not specified anywhere. However, Odia is likely covered, as they have used the Common Voice dataset.
Microsoft announced Visual ChatGPT, which connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
Anthropic launched Claude, an AI assistant (check the last edition for more details), on 14th March 2023. Claude is behind DuckAssist, Notion AI, and Quora's Poe. Anthropic has currently released two variants, Claude and Claude Instant. Claude is a SOTA high-performance model, while Claude Instant is a lighter, less expensive, and much faster option.
Google has announced the PaLM API and MakerSuite, two new tools that aim to make generative AI more accessible to developers. The PaLM API provides access to Google's LLMs. MakerSuite simplifies the generative AI development workflow by allowing developers to iterate on prompts, augment datasets with synthetic data, and tune custom models.
ViperGPT is a framework that uses code-generation models to compose vision-and-language models into subroutines to produce a result for any query. This approach requires no further training and achieves state-of-the-art results across various complex visual tasks.
AI Tools
OpenPlayground: Compare the output of ChatGPT, Anthropic's Claude, and Cohere's LLM in a single playground.
Talk to Books: Get your favorite passages or quotes from books.
Transvribe: This is a good aid for people learning on YouTube. Paste the YouTube link and ask questions in text format. It will reply based on the video content.
Vocal Remover and Isolation: It splits the audio into vocals and music. Create Karaoke songs from any audio. No sign-up is required. Good to impress your ❤️.
Follow us on social media platforms to get the latest updates. Feedback is always welcome. Goodbye and take care, until the next newsletter. Jai Jagannath 🙏🏼
Interesting articles
The Physics Principle That Inspired Modern AI Art | WIRED: Describes the origins of GANs, DDPMs, and the recent DALL·E 2 and Stable Diffusion models.
Jailbreak ChatGPT: Prompts that force ChatGPT to go beyond its imposed restrictions.
From structured search to learning-to-rank-and-retrieve: How Amazon uses reinforcement learning to improve candidate selection and ranking for search, ad platforms, and recommender systems.
ControlNet in 🧨 Diffusers (Hugging Face): ControlNet allows users to customize the generation process of Stable Diffusion models with depth maps, segmentation maps, scribbles, and keypoints.
Generative AI is overrated, long live old-school AI | Encord: The article argues that while generative AI has captured the public's imagination, predictive AI remains crucial for solving real-world challenges and unleashing AI's true potential. The author suggests that both types of AI must be merged to accelerate the AI revolution and transform our world.