Use matplotlib to draw multiple pictures
To draw several sample images on one page, I used matplotlib, one of the most popular Python libraries: import cv2 import matplotlib.pyplot as plt img = cv2.imread("bird.jpg") img = cv2.cvtColor(img,...
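A minimal sketch of the idea, assuming synthetic NumPy arrays stand in for the photos that the post reads with cv2 (the array shapes, the 2x2 grid, and the output file name are illustrative and not taken from the original article):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for real photos: four random RGB images.
rng = np.random.default_rng(0)
images = [rng.random((32, 32, 3)) for _ in range(4)]

# One figure holding a 2x2 grid of axes, one image per cell.
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
for ax, img in zip(axes.flat, images):
    ax.imshow(img)
    ax.axis("off")  # hide ticks so only the picture is visible

fig.tight_layout()
fig.savefig("samples.png")
```

When the images come from cv2.imread, they arrive in BGR channel order while imshow expects RGB, which is why a color conversion is needed before plotting.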
“Show” the sound of a bird
librosa seems to be a really popular Python library for audio processing. With librosa, I can show the MFCC of a bird's sound in just a few simple lines of Python: import librosa import...
Understanding Transformer
The Transformer neural network was first introduced in 2017, in the paper Attention Is All You Need. One year later, BERT appeared. And last year I gave a simple presentation in...
Transfer Redshift SQL to BigQuery SQL
In my recent work, I needed to run some SQL snippets from Redshift on Google’s BigQuery platform. Since different data warehouses use different SQL dialects, the transferring work couldn’t be...
Build a Python module for MFCC’s C++ implementation
MFCC stands for Mel-frequency cepstral coefficients. It’s a powerful feature representation for sound. Although there are a lot of MFCC implementations in different programming languages, they give...
Some tips about pandas, again
pd.merge() may change the names of the original columns: import pandas as pd df1 = pd.DataFrame(data={"name": ["robin", "hood"], "age": [40, 30]}) df2 = pd.DataFrame(data={"name": ["lion", "heart"], "age":...
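The renaming can be sketched as follows, assuming two hypothetical frames that share a non-key column called "age" (the data and the custom suffixes are illustrative, not the article's own example):

```python
import pandas as pd

# Hypothetical frames: both have a non-key column called "age".
left = pd.DataFrame({"name": ["robin", "hood"], "age": [40, 30]})
right = pd.DataFrame({"name": ["robin", "hood"], "age": [41, 31]})

# Default behaviour: overlapping non-key columns get "_x" / "_y" suffixes.
merged = pd.merge(left, right, on="name")
print(list(merged.columns))

# Explicit suffixes make the origin of each column obvious.
labeled = pd.merge(left, right, on="name", suffixes=("_left", "_right"))
print(list(labeled.columns))
```

Passing suffixes explicitly is one way to keep the renamed columns predictable when both inputs share column names.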
Compare two tables in BigQuery
According to this answer, the best solution for comparing two tables in BigQuery is: ( SELECT * FROM table1 EXCEPT DISTINCT SELECT * FROM table2 ) UNION ALL ( SELECT * FROM table2 EXCEPT DISTINCT SELECT * FROM...
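The logic behind this kind of comparison can be illustrated outside SQL. Below is a rough Python sketch, assuming each table is a small in-memory list of row tuples (the tables and the helper `except_distinct` are hypothetical and only mimic the set semantics of EXCEPT DISTINCT):

```python
# Hypothetical tables represented as lists of row tuples.
table1 = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
table2 = [(2, "b"), (3, "c"), (4, "d")]


def except_distinct(left, right):
    """Return the distinct rows of `left` that never appear in `right`."""
    return set(left) - set(right)


only_in_1 = except_distinct(table1, table2)
only_in_2 = except_distinct(table2, table1)

# If both sets are empty, the tables hold the same distinct rows.
differences = sorted(only_in_1) + sorted(only_in_2)
print(differences)
```

Because the comparison works on distinct rows, it does not detect a row that appears a different number of times in each table.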
Efficient reading in pandas
My previous code read all the data and then kept only the one column I needed: import pandas as pd df = pd.read_csv("data.csv")["card_id"] In the test environment, this program used more than 10GB...
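One common way to avoid loading every column, which may or may not be the fix the full article settles on, is the `usecols` parameter of `pd.read_csv`. A minimal sketch, assuming a tiny in-memory CSV stands in for data.csv (the extra columns are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = "card_id,amount,city\nc1,10,Paris\nc2,20,Rome\n"

# usecols tells the parser to keep only the listed columns,
# so the other columns never end up in the DataFrame.
df = pd.read_csv(io.StringIO(csv_text), usecols=["card_id"])
print(df["card_id"].tolist())
```

Selecting the column at parse time means the unwanted columns are never held in memory as part of the DataFrame.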
Some tips for using bash and Jsonnet
To build an integration test with Argo, I used bash to run scripts and Jsonnet to orchestrate the Argo workflow. As expected, I ran into a bunch of problems with them. Problem about base64 [default]...
TabNet: a new neural-network architecture for tabular data
Neural networks seem to be used mostly in Computer Vision and Natural Language Processing scenarios, while tree models like GBDT are mainly used for tabular data. But why? Although this article...
The awesome YOLOv5
I just found the YOLOv5 repository on GitHub. Its models are not only accurate and fast but also easy to get and use. Just download the code and install some dependencies, and you can just...
Separate pip environment
A few days ago, I thought Anaconda could separate different Python environments, so I created a new conda environment to install a new version of a pip package. Unsurprisingly, it failed. The fact is: conda could only...
Attach references on arXiv.org
About one month ago, my former colleague Jian Mei and I published a paper about ornithology images on arXiv.org. But two days ago, Jian Mei found out that I had written the number of images and...
Export YOLOv5 models for mobile devices
Somebody has already done the work of exporting YOLOv5 models to TFLite models. To use it, we only need to: git clone --single-branch --branch tf-export https://github.com/zldrobit/yolov5.git cd yolov5 #...
Get the schema of a parquet file
Previously I just used this snippet to get all the column names of a parquet file: import pandas as pd df = pd.read_parquet("hello.parquet") print(list(df.columns)) But if the parquet file is very large...
Directly update git commit
Previously, when I wanted to update my git branch, I just committed new modifications again and again. This made my branch log quite ugly. After my colleague introduced git commit --amend to me, he really helps...
A struggle to keep the accuracy
This August, we got 0.83 evaluation accuracy on the DIB-10K dataset. But since we updated the dataset last month, the accuracy has only reached 0.82. The first suspect is the...
Run cron jobs in Kubernetes
In the old days of a single machine, we used /etc/crontab to schedule jobs. But in the new age of distributed systems, especially Kubernetes, what should we use? The first choice that comes to my mind is...
Books I read in the year 2020
After beginning my new job on 6th January, I started to learn Kubernetes for the first time. Frankly speaking, Kubernetes is very powerful and also easy to use (from my perspective). And the book...
Solving a problem with pivot() in pandas
Below is an example from the official pandas documentation for pivot(): import pandas as pd df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], 'baz':...