Robin on Linux

Use matplotlib to draw multiple pictures

July 23, 2020, 5:51 am

To draw several sample images on one page, I used matplotlib, one of the most popular Python libraries: import cv2 import matplotlib.pyplot as plt img = cv2.imread("bird.jpg") img = cv2.cvtColor(img,...
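
The excerpt is cut off before the plotting code, so the following is only a minimal sketch of one way to put several images on one page with matplotlib. The random arrays are hypothetical stand-ins for images loaded with cv2.imread and converted to RGB, and the in-memory buffer replaces whatever output file the post used.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for RGB images (height x width x 3, values in [0, 1]).
rng = np.random.default_rng(0)
images = [rng.random((32, 32, 3)) for _ in range(3)]

# One row, one subplot per image.
fig, axes = plt.subplots(1, len(images), figsize=(9, 3))
for idx, (ax, img) in enumerate(zip(axes, images)):
    ax.imshow(img)
    ax.set_title(f"sample {idx}")
    ax.axis("off")

# Render the whole page into an in-memory PNG.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

With real photos, note that cv2.imread returns BGR data, which is presumably why the excerpt calls cv2.cvtColor before plotting.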

“Show” the sound of a bird

July 29, 2020, 6:48 pm

librosa seems to be a really popular Python library for audio processing. With librosa, I can show the MFCC of a bird's sound in just a few simple lines of Python: import librosa import...

The Transformer neural network was first introduced in 2017, in the paper Attention Is All You Need. One year later, BERT appeared. And last year I gave a simple presentation in...

Transfer Redshift SQL to BigQuery SQL

August 19, 2020, 9:22 pm

In my recent work, I needed to run some SQL snippets from Redshift on Google’s BigQuery platform. Since different data warehouses use different SQL dialects, the transferring work couldn’t be...

Build a Python module for MFCC’s C++ implementation

September 3, 2020, 6:29 pm

MFCC stands for Mel-frequency cepstral coefficients. It’s a powerful feature representation for sound. Although there are a lot of implementations of MFCC in different programming languages, they give...

Some tips about pandas, again

September 9, 2020, 8:55 pm

pd.merge() may change the names of the original columns: import pandas as pd df1 = pd.DataFrame(data={"name": ["robin", "hood"], "age": [40, 30]}) df2 = pd.DataFrame(data={"name": ["lion", "heart"], "age":...
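
The second frame is truncated in the excerpt, so this small sketch uses a hypothetical `right` frame to show the renaming: when both frames have a non-key column with the same name, pd.merge() appends suffixes to tell them apart.

```python
import pandas as pd

left = pd.DataFrame({"name": ["robin", "hood"], "age": [40, 30]})
# Hypothetical second frame; the one in the post is cut off.
right = pd.DataFrame({"name": ["robin", "hood"], "age": [41, 31]})

# Both frames have an "age" column that is not a join key,
# so pandas renames them with the default suffixes "_x" and "_y".
merged = pd.merge(left, right, on="name")
print(list(merged.columns))  # ['name', 'age_x', 'age_y']

# The suffixes parameter makes the renaming explicit.
named = pd.merge(left, right, on="name", suffixes=("_left", "_right"))
print(list(named.columns))  # ['name', 'age_left', 'age_right']
```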

Compare two tables in BigQuery

September 17, 2020, 2:38 am

According to this answer, the best solution for comparing two tables in BigQuery is: ( SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2 ) UNION ALL ( SELECT * FROM table2 EXCEPT DISTINCT SELECT * FROM...
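
The idea behind that query is a symmetric difference: rows only in table1, plus rows only in table2. The sketch below tries the same idea locally with Python's built-in sqlite3 module as a stand-in for BigQuery. This is an assumption for illustration only: SQLite spells the operator EXCEPT (it already removes duplicates) and needs subqueries instead of parentheses, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'b'), (3, 'c'), (4, 'd');
    """
)

# Rows only in table1, followed by rows only in table2.
query = """
SELECT * FROM (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
UNION ALL
SELECT * FROM (SELECT * FROM table2 EXCEPT SELECT * FROM table1)
"""
diff = sorted(conn.execute(query).fetchall())
print(diff)  # [(1, 'a'), (4, 'd')]
```

An empty result means both tables contain the same set of distinct rows.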

Efficient reading in pandas

September 23, 2020, 9:25 pm

My previous code read all the data and then kept only the one column I needed: import pandas as pd df = pd.read_csv("data.csv")["card_id"] In the test environment, this program used more than 10GB...
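
The excerpt stops before the fix, so the following is not necessarily what the post ended up doing. It is a minimal sketch of one common way to avoid loading every column: the usecols parameter of pd.read_csv. The CSV text is made-up sample data standing in for data.csv.

```python
import io

import pandas as pd

# Made-up sample data standing in for data.csv.
csv_text = "card_id,amount,city\n1001,9.5,Oslo\n1002,3.2,Lima\n"

# usecols tells the parser to keep only the listed columns,
# so the other columns never become part of the DataFrame.
card_ids = pd.read_csv(io.StringIO(csv_text), usecols=["card_id"])["card_id"]
print(card_ids.tolist())  # [1001, 1002]
```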

Some tips for using bash and Jsonnet

September 30, 2020, 9:37 pm

To build an integration test with Argo, I used bash to run scripts and Jsonnet to orchestrate the Argo workflow. As expected, I ran into a bunch of problems with them. problem about base64 [default]...

TabNet: a new neural-network architecture for tabular data

October 8, 2020, 2:36 am

Neural networks seem to be used mostly in Computer Vision and Natural Language Processing scenarios, while tree models like GBDT are mainly used for tabular data. But why? Although this article...

The awesome YOLOv5

October 22, 2020, 4:56 am

I just found the YOLOv5 repository on GitHub. Not only are the models accurate and fast, they are also easy to get and to use. Just download the code and install some dependent libraries. And you can just...

Separate pip environment

October 29, 2020, 5:26 pm

A few days ago, I thought Anaconda could separate different Python environments, so I created a new conda environment to install a new version of a pip package. And no doubt, it failed. The fact is: conda could only...

Attach references on arXiv.org

November 5, 2020, 8:57 pm

About one month ago, my old colleague Jian Mei and I published a paper about ornithology images on arXiv.org. But two days ago, Jian Mei found out that I had written the number of images and...

Export YOLOv5 models for mobile device

November 12, 2020, 4:37 pm

Somebody has finished the work of exporting YOLOv5 models to tflite models. To use it, we only need to: git clone --single-branch --branch tf-export https://github.com/zldrobit/yolov5.git cd yolov5 #...

Get the schema of a parquet file

November 25, 2020, 3:56 pm

Previously I just used this snippet to get all the column names of a parquet file: import pandas as pd df = pd.read_parquet("hello.parquet") print(list(df.columns)) But if the parquet file is very large...

Directly update git commit

December 10, 2020, 3:41 pm

Previously, when I wanted to update my git branch, I just committed new modifications again and again. This made my branch log quite ugly. After my colleague introduced git commit --amend to me, he really helps...

A struggle to keep the accuracy

December 17, 2020, 3:25 pm

This August, we got 0.83 evaluation accuracy on the DIB-10K dataset. But since we updated the dataset last month, the accuracy has only reached 0.82. The first suspect is the...

Run cron jobs in Kubernetes

December 23, 2020, 9:01 pm

In the old days of a single machine, we used /etc/crontab to schedule jobs. But in the new age of distributed systems, especially Kubernetes, what should we use? The first choice in my mind is...

Books I read in the year 2020

December 29, 2020, 4:34 pm

Since beginning my new job on 6th January, I have been learning Kubernetes for the first time. Frankly speaking, Kubernetes is very powerful and also easy to use (from my perspective). And the book...

To solve the problem about pivot() of Pandas

January 7, 2021, 4:58 pm

Below is an example from the official pandas documentation for pivot(): import pandas as pd df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], 'baz':...
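
The frame is truncated in the excerpt, and the specific problem the post goes on to solve is not shown. As a minimal sketch of what pivot() does with this kind of data, the example below fills in the 'baz' column with assumed values.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],  # assumed values; the excerpt cuts off here
})

# Each distinct "foo" becomes a row, each distinct "bar" becomes a column,
# and the cells are filled from "baz".
wide = df.pivot(index="foo", columns="bar", values="baz")
print(wide)
```

pivot() needs each (index, columns) pair to be unique and raises a ValueError otherwise; pivot_table() with an aggregation function is the usual alternative when pairs repeat.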
