Browsing all 236 articles


Use matplotlib to draw multiple pictures

To draw several sample images on one page, I used matplotlib, one of the most popular Python libraries: import cv2 import matplotlib.pyplot as plt img = cv2.imread("bird.jpg") img = cv2.cvtColor(img,...


“Show” the sound of a bird

librosa seems to be a really popular Python library for audio processing. With librosa, I can show the MFCC of a bird's sound in just a few simple lines of Python: import librosa import...


Understanding Transformer

The Transformer neural network was first introduced in 2017, in the paper Attention Is All You Need. One year later, BERT appeared. And last year I gave a simple presentation in...

Transfer Redshift SQL to BigQuery SQL

In my recent work, I needed to run some SQL snippets from Redshift on Google’s BigQuery platform. Since different data warehouses use different SQL dialects, the transfer work couldn’t be...


Build a Python module for MFCC’s C++ implementation

MFCC stands for Mel-frequency cepstral coefficients. It’s a powerful feature representation for sound. Although there are a lot of implementations of MFCC in different programming languages, they give...


Some tips about pandas, again

pd.merge() may change the names of original columns: import pandas as pd df1 = pd.DataFrame(data={"name": ["robin", "hood"], "age": [40, 30]}) df2 = pd.DataFrame(data={"name": ["lion", "heart"], "age":...
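
The renaming is easy to reproduce. Below is a minimal sketch, not the code from the article: the two frames and the choice of "name" as the merge key are my own assumptions, picked so that both inputs share a non-key column called "age".

```python
import pandas as pd

# Hypothetical frames: both carry a non-key column called "age".
left = pd.DataFrame({"name": ["robin", "hood"], "age": [40, 30]})
right = pd.DataFrame({"name": ["robin", "hood"], "age": [41, 31]})

# Merging on "name" leaves two "age" columns, so pandas renames them
# with its default suffixes "_x" and "_y".
merged = pd.merge(left, right, on="name")
merged_columns = list(merged.columns)

# Passing explicit suffixes makes the origin of each column obvious.
labeled = pd.merge(left, right, on="name", suffixes=("_left", "_right"))
labeled_columns = list(labeled.columns)

print(merged_columns)
print(labeled_columns)
```

Choosing the suffixes explicitly is usually safer than relying on the defaults, because later code that looks up "age" would otherwise fail with a KeyError.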

Compare two tables in BigQuery

According to this answer, the best solution for comparing two tables in BigQuery is: ( SELECT * FROM table1 EXCEPT DISTINCT SELECT * from table2 ) UNION ALL ( SELECT * FROM table2 EXCEPT DISTINCT SELECT * from...
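
The same idea, rows in one table but not the other, taken in both directions, can be tried locally without a BigQuery project. The sketch below swaps in SQLite through Python's built-in sqlite3 module, so it is not BigQuery syntax: SQLite's EXCEPT already removes duplicates, and the compound query is wrapped in subqueries instead of parentheses. The table contents and the extra "side" label column are my own additions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the two warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'robin'), (2, 'hood');
    INSERT INTO table2 VALUES (2, 'hood'), (3, 'lion');
""")

# Rows only in table1, followed by rows only in table2.
query = """
    SELECT 'only_in_table1' AS side, *
    FROM (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION ALL
    SELECT 'only_in_table2' AS side, *
    FROM (SELECT * FROM table2 EXCEPT SELECT * FROM table1)
"""
diff = sorted(conn.execute(query).fetchall())
conn.close()

for row in diff:
    print(row)
```

An empty result means the two tables hold the same set of distinct rows; it does not prove that duplicate counts match.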

Efficient reading in pandas

My previous code read all the data and then kept only the one column I needed: import pandas as pd df = pd.read_csv("data.csv")["card_id"] In the test environment, this program used more than 10GB...
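
One common way to avoid loading every column is the usecols parameter of pd.read_csv. This is a sketch of that general technique and may not be the fix the article goes on to describe; the CSV content here is made up and read from memory so the example runs on its own.

```python
import io

import pandas as pd

# Made-up CSV content; a real run would pass the path to "data.csv" instead.
csv_text = "card_id,amount,city\n1001,12.5,Paris\n1002,7.0,Rome\n"

# usecols tells the parser to keep only the listed columns, so the other
# columns are never materialized in the resulting DataFrame.
card_ids = pd.read_csv(io.StringIO(csv_text), usecols=["card_id"])["card_id"]

print(card_ids.tolist())
```

The whole file is still scanned, but memory use scales with the selected columns instead of the full table.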


Some tips for using bash and Jsonnet

To build an integration test with Argo, I used bash to run scripts and Jsonnet to orchestrate the Argo workflow. As expected, I ran into a bunch of problems with them. problem about base64 [default]...


TabNet: a new neural-network architecture for tabular data

Neural networks seem to be used mostly in Computer Vision and Natural Language Processing scenarios, while tree models like GBDT are mainly used for tabular data. But why? Although this article...


The awesome YOLOv5

I just found the YOLOv5 repository on GitHub. Not only are the models accurate and fast, they are also easy to get and to use. Just download the code and install some dependent libraries. And you can just...

Separate pip environment

A few days ago, I thought Anaconda could separate different Python environments, so I created a new conda environment to install a new version of a pip package. And no doubt, it failed. The fact is: conda could only...


Attach references on arXiv.org

About one month ago, my old colleague Jian Mei and I published a paper about ornithology images on arXiv.org. But two days ago, Jian Mei found out that I had written the number of images and...


Export YOLOv5 models for mobile device

Somebody has finished the work of exporting YOLOv5 models to TFLite models. To use it, we only need to: git clone --single-branch --branch tf-export https://github.com/zldrobit/yolov5.git cd yolov5 #...

Get the schema of a parquet file

Previously I just used this snippet to get all the column names of a parquet file: import pandas as pd df = pd.read_parquet("hello.parquet") print(list(df.columns)) But if the parquet file is very large...


Directly update git commit

Previously, when I wanted to update my git branch, I just committed new modifications again and again. This made my branch log quite ugly. After my colleague introduced git commit --amend to me, he really helps...

A struggle to keep the accuracy

This August, we got 0.83 evaluation accuracy on the DIB-10K dataset. But last month we updated the dataset, and since then the accuracy has only reached 0.82. The first doubtful point is the...


Run cron jobs in Kubernetes

In the old days of a single machine, we used /etc/crontab to schedule jobs. But in the new age of distributed systems, especially Kubernetes, what should we use? The first choice in my mind is...


Books I read in the year 2020

After starting my new job on 6th January, I began to learn Kubernetes for the first time. Frankly speaking, Kubernetes is very powerful and also easy to use (from my perspective). And the book...


To solve the problem about pivot() of Pandas

Below is an example from the official pandas documentation for pivot(): import pandas as pd df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], 'bar': ['A', 'B', 'C', 'A', 'B', 'C'], 'baz':...
