Robin on Linux

Do sinusoidal Positional Embeddings actually work well?

March 12, 2024, 5:52 pm

The GPT part of my Multimodal trials mainly comes from nanoGPT. In nanoGPT, the positional encoding is just a learnable tensor (“wpe” means “weights of positional embedding”): self.transformer =...

View Article
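
For contrast with a learnable table such as “wpe”, here is a minimal sketch of the fixed sinusoidal alternative the title asks about. It is written in plain Python rather than PyTorch, and the function name, the `base` of 10000, and the small sizes are assumptions chosen for illustration, not code from the article.

```python
import math

def sinusoidal_positions(num_positions, dim, base=10000.0):
    """Return a num_positions x dim table of fixed sinusoidal encodings."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = [0.0] * dim
        # Each (even, odd) column pair shares one frequency: sin on even, cos on odd.
        for i in range(0, dim, 2):
            angle = pos / (base ** (i / dim))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table

# Hypothetical sizes; a real model would use its block size and embedding width.
pe = sinusoidal_positions(num_positions=4, dim=8)
```

Unlike a learnable tensor, this table has no trainable parameters, so the comparison in the article comes down to whether learning the positions helps in practice.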

“No space left on device” problem I met on ext4

March 20, 2024, 2:43 am

After downloading the whole CC12M dataset from Huggingface, I wrote a tool to extract all of the image-text-pair files into one directory. But after extracting 17 million (17681010 exactly) files, the...

View Article
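
The excerpt is cut off before it names the cause, so the following is only a first diagnostic sketch: it checks the two usual suspects behind “No space left on device”, free blocks and free inodes, using `os.statvfs` (available on Unix-like systems). The function name and the dictionary keys are made up for this example.

```python
import os

def filesystem_usage(path):
    """Report block and inode counts for the filesystem that contains path."""
    st = os.statvfs(path)
    return {
        "blocks_total": st.f_blocks,
        "blocks_free": st.f_bavail,   # blocks available to unprivileged users
        "inodes_total": st.f_files,
        "inodes_free": st.f_favail,   # inodes available to unprivileged users
    }

usage = filesystem_usage(".")
```

If both blocks and inodes still have room, the error probably comes from somewhere else, for example a per-directory limit, which would be worth investigating when a single directory holds millions of files.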

Multimodal trials: my tiny CLIP implementation (episode 2)

March 26, 2024, 5:03 pm

Three weeks have passed since the previous article. Here are the answers to the previous three questions: Q1: The original paper of CLIP uses L2 normalization on multimodal embeddings. But in my test, the...

View Article
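
As a reminder of what the L2 normalization mentioned in Q1 does, here is a small plain-Python sketch. The two-dimensional vectors and the names `image_emb` and `text_emb` are invented for illustration; it says nothing about the result the article reports.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy embeddings; real CLIP embeddings have hundreds of dimensions.
image_emb = l2_normalize([3.0, 4.0])
text_emb = l2_normalize([4.0, 3.0])

# After normalization the dot product equals the cosine similarity.
similarity = dot(image_emb, text_emb)
```

With normalized embeddings, the similarity depends only on direction and not on vector length, which is the property the contrastive loss relies on.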

Multimodal trials: solve the Masked Language problem about my tiny ALBEF...

April 18, 2024, 9:57 pm

I just wrote my implementation of ALBEF in my own way. But when evaluated with some masked sentences, it failed. I am using this image: When I asked “This is a chocolate <|mask|>”, it generated...

View Article

We are supposed to learn like the Large Language Model

May 30, 2024, 9:25 pm

When I entered junior middle school in a small town in southwest China in 1993, I met my first English book. Yes, it looked exactly like this: Then, the terrible 6 years of...

View Article

Strange problem in Python Client of Vertex AI

June 6, 2024, 2:56 am

To create a pipeline schedule in Vertex AI, we can use the snippet below: from google.cloud import aiplatform pipeline_job = aiplatform.PipelineJob( template_path="COMPILED_PIPELINE_PATH",...

View Article

Augmentation helps ALBEF a lot

June 27, 2024, 10:13 pm

I was trying to implement ALBEF by myself for practice. After finishing all the parts (Vision part, BERT part, including Masked Language Model), I trained the model on...

View Article

Notes and experiences from Audio Classification research

August 8, 2024, 10:42 pm

All the code is here. The baseline from training on the balanced AudioSet data is 0.27 mAP. Using TimeMasking and FrequencyMasking could slightly push it to 0.28 mAP. I tried mixup of raw sounds like AST but...

View Article
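
The idea behind time and frequency masking can be sketched in plain Python as follows. This is not the library transform the article uses: the spectrogram is a toy list of rows (frequency bins by time frames), and `mask_axis` and `random_mask` are hypothetical helper names.

```python
import random

def mask_axis(spec, start, width, axis, fill=0.0):
    """Return a copy of spec with one band of frequency bins or time frames set to fill."""
    out = [row[:] for row in spec]
    for f, row in enumerate(out):
        for t in range(len(row)):
            idx = f if axis == "freq" else t
            if start <= idx < start + width:
                row[t] = fill
    return out

def random_mask(spec, max_width, axis, rng):
    """Mask a randomly placed band of random width (0..max_width) along axis."""
    size = len(spec) if axis == "freq" else len(spec[0])
    width = rng.randint(0, max_width)
    start = rng.randint(0, size - width)
    return mask_axis(spec, start, width, axis)

spec = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins x 6 time frames
time_masked = mask_axis(spec, start=2, width=2, axis="time")
freq_masked = mask_axis(spec, start=1, width=1, axis="freq")
augmented = random_mask(spec, max_width=2, axis="time", rng=random.Random(0))
```

During training the band position and width are drawn at random for every sample, so the model cannot rely on any single time span or frequency range.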

How to unfold two Arrays in BigQuery

September 14, 2024, 6:50 pm

Imagine we have data like this: WITH Sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price, UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS...

View Article
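
Assuming “unfold” here means pairing the two arrays element by element, the target shape can be sketched in plain Python. Only the first row of the sample data is used because the second row is truncated in the excerpt; the function name `unfold` is made up for this example.

```python
rows = [
    {
        "id": 1,
        "prod_type": [0, 1, 1, 2, 3, 5],
        "prod_price": [1.1, 1.2, 2.1, 2.3, 3.3, 3.4],
    },
]

def unfold(rows):
    """Emit one (id, offset, prod_type, prod_price) tuple per array position."""
    out = []
    for row in rows:
        pairs = zip(row["prod_type"], row["prod_price"])
        for offset, (ptype, price) in enumerate(pairs):
            out.append((row["id"], offset, ptype, price))
    return out

flat = unfold(rows)
```

In BigQuery itself, a common way to get this pairing is to UNNEST each array WITH OFFSET and match the two on the offset, which plays the role of `enumerate` and `zip` here.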

An experiment about my stupid idea

November 3, 2024, 7:20 pm

After training both image classification and sound classification deep learning models, I found out that the image training is much slower than the sound training, although the sound dataset is much...

View Article

Fix the launching problem of Android Studio on MacBook

December 5, 2024, 3:59 pm

After I installed two versions of JDK (17 and 21) and uninstalled them, I saw this error when trying to launch my Android Studio. This error is hard to fix. Reinstalling Android Studio won’t fix it....

View Article

Resizing an image is not as easy as you think

December 9, 2024, 12:31 am

I found a very interesting picture: The size of this image is about 8MB although it is blurry. Then I used the Python code below to try resizing it with different interpolation strategies: import cv2 img...

View Article
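
To show why the choice of interpolation matters when shrinking an image, here is a toy sketch in plain Python. It does not use cv2 and is not the article's code: the checkerboard input and both helper functions are assumptions, loosely analogous to nearest-neighbour and area-averaging resizing.

```python
def downsample_nearest(img, factor):
    """Keep every factor-th pixel and discard the rest."""
    return [row[::factor] for row in img[::factor]]

def downsample_area(img, factor):
    """Average each factor x factor block into one output pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - factor + 1, factor):
        out_row = []
        for x in range(0, w - factor + 1, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# 8x8 checkerboard of 0 and 255: the finest detail an image can hold.
checker = [[(x + y) % 2 * 255 for x in range(8)] for y in range(8)]
nearest = downsample_nearest(checker, 2)  # every sampled pixel is 0
area = downsample_area(checker, 2)        # every output pixel is 127.5
```

Sampling every second pixel turns the checkerboard solid black, while block averaging gives a uniform grey, so the same source can look completely different depending on the strategy.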

How to change App name in Google Play Console?

December 16, 2024, 9:05 pm

Just one picture: I don’t know why the editing place of App name is under “Grow users”. But unfortunately, it’s there. After you change the “App name” and click “Save” (You also need to upload a bunch...

View Article

Try to understand Variational Autoencoders

January 28, 2025, 9:17 pm

ELBO as the Loss Function Note: “p(z|x)” means the true posterior, “q(z|x)” means the approximate posterior. What if we only use the first term of the loss? What’s the meaning of “GAN tend to lack full support over...

View Article
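
For reference, the standard form of the ELBO is written out below. This is the textbook statement, not a formula copied from the article, and the prior p(z) is a symbol assumed here because it does not appear in the excerpt.

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
          \;-\; \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)
\qquad
\mathcal{L}(x) \;=\; -\,\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
          \;+\; \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)
```

The loss is the negative of the bound. Its first term rewards reconstructing x from a sampled z, and its second term keeps q(z|x) close to the prior; if the “first term” in the excerpt refers to the reconstruction term, using it alone would drop that regularization.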

Experiments about ‘torchao’

January 30, 2025, 9:32 pm

‘torchao’ is a Python library that supports PyTorch native quantization and sparsity for training and inference. I just finished some experiments/tests with it for my image-classification project,...

View Article

Experiments about ‘accelerate’ library of HuggingFace

February 3, 2025, 4:28 pm

If you want to run your training code with ‘accelerate’ fp8, you need to install ‘transformer_engine’ or ‘MS-AMP’. But these two packages are hard to install because they depend on specific...

View Article