Do sinusoidal Positional Embeddings actually work well?
The GPT part of my Multimodal trials mainly comes from nanoGPT. In nanoGPT, the positional encoding is just a learnable tensor (“wpe” means “weights of positional embedding”): self.transformer =...
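For contrast with a learnable table, here is a minimal sketch of the fixed sinusoidal encoding from the original Transformer paper. The function name `sinusoidal_positions` and the example sizes are hypothetical, chosen only for illustration; this is not nanoGPT's code, and it only shows what would stand in for the learned "wpe" weights.

```python
import math


def sinusoidal_positions(block_size, n_embd):
    """Build a block_size x n_embd table of fixed sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / n_embd)) and the following odd
    column holds the matching cosine, where i is the even column index.
    """
    if n_embd % 2 != 0:
        raise ValueError("n_embd must be even")
    table = []
    for pos in range(block_size):
        row = []
        for i in range(0, n_embd, 2):
            angle = pos / (10000 ** (i / n_embd))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# Hypothetical sizes, only to show the shape of the result.
table = sinusoidal_positions(block_size=8, n_embd=4)
```

Unlike a learned embedding, this table has no trainable parameters and can be computed for any sequence length.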
“No space left on device” problem I met on ext4
After downloading the whole CC12M dataset from Huggingface, I wrote a tool to extract all of the image-text-pair files into one directory. But after extracting 17 million (17681010 exactly) files, the...
Multimodal trials: my tiny CLIP implementation (episode 2)
Three weeks have passed since the previous article. Here are the answers to the previous three questions: Q1: The original paper of CLIP uses L2 normalization on multimodal embeddings. But in my test, the...
Multimodal trials: solve the Masked Language problem about my tiny ALBEF...
I just wrote my implementation of ALBEF in my own way. But when evaluated with some masked sentences, it failed. I am using this image: When I asked “This is a chocolate <|mask|>”, it generated...
We are supposed to learn like the Large Language Model
When I entered junior middle school in a small town in southwest China in 1993, I met my first English book. Yes, it looked exactly like this: Then, the terrible 6 years of...
Strange problem in Python Client of Vertex AI
To create a pipeline schedule in Vertex AI, we can use the snippet below: from google.cloud import aiplatform pipeline_job = aiplatform.PipelineJob( template_path="COMPILED_PIPELINE_PATH",...
Augmentation helps ALBEF a lot
I was trying to implement ALBEF by myself for practice. After finishing all the parts (Vision part, BERT part, including Masked Language Model), I trained the model on...
Notes and experiences from Audio Classification research
All the code is here. The baseline for training on the balanced data of AudioSet is 0.27 mAP. Using TimeMasking and FrequencyMasking could slightly push it to 0.28 mAP. I tried mixup of raw sounds like AST but...
How to unfold two Arrays in BigQuery
Imagine we have data like this: WITH Sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price, UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS...
An experiment about my stupid idea
After training both image classification and sound classification deep learning models, I found that image training is much slower than sound training, although the sound dataset is much...
Fix the launching problem of Android Studio on MacBook
After I installed two versions of JDK (17 and 21) and uninstalled them, I saw this error when trying to launch my Android Studio. This error is hard to fix. Reinstalling Android Studio won’t fix it....
Resizing an image is not as easy as you think
I found a very interesting picture: The size of this image is about 8MB although it’s blurry. Then I used the Python code below to try resizing it with different interpolation strategies: import cv2 img...
How to change App name in Google Play Console?
Just one picture: I don’t know why the editing place of App name is under “Grow users”. But unfortunately, it’s there. After you change the “App name” and click “Save” (You also need to upload a bunch...
Try to understand Variational Autoencoders
ELBO as the Loss Function. Note: “p(z|x)” means the true posterior, “q(z|x)” means the approximate posterior. What if we only use the first term of the loss? What’s the meaning of “GAN tend to lack full support over...
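As a reminder of what the two terms of the loss are, here is the ELBO in its standard form. The symbols $\theta$ (decoder parameters), $\phi$ (encoder parameters), and the prior $p(z)$ are assumptions about notation, not taken from the article itself.

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction term}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{regularization term}}
```

The training loss is the negative of the right-hand side. The gap between the two sides is $D_{\mathrm{KL}}\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)$, the divergence between the approximate and the true posterior. Keeping only the first term leaves a plain reconstruction objective with nothing pulling $q_\phi(z \mid x)$ toward the prior.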
Experiments about ‘torchao’
‘torchao‘ is a Python library that supports PyTorch-native quantization and sparsity for training and inference. I just finished some experiments/tests with it for my image-classification project,...
Experiments about ‘accelerate’ library of HuggingFace
If you want to run your training code with ‘accelerate‘ fp8, you need to install ‘transformer_engine‘ or ‘MS-AMP‘. But these two packages are hard to install because they depend on specific...