Using the Google bq CLI, the following command retrieves the top PyPI keywords from the bigquery-public-data.pypi.distribution_metadata table:
bq query --use_legacy_sql=false 'SELECT keyword, COUNT(*) as keyword_count FROM `bigquery-public-data.pypi.distribution_metadata`, UNNEST(SPLIT(keywords, ", ")) as keyword GROUP BY keyword ORDER BY keyword_count DESC LIMIT 100'
Result for the top-15 keywords:
python: 128555 appearances
DuckDB Database SQL OLAP: 70739 appearances
ai: 64997 appearances
tensorflow tensor machine learning: 51144 appearances
pulumi: 50076 appearances
api: 47986 appearances
probabilities probabilistic-graphical-models inference diagnosis: 46552 appearances
rust: 45607 appearances
cli: 39512 appearances
OpenAPI: 38814 appearances
sdk: 38060 appearances
llm: 37487 appearances
OpenAPI-Generator: 36734 appearances
database: 35578 appearances
automation: 34393 appearances
Note that this is a very basic query: it does not take into account that some packages have many more versions published on PyPI than others, so the keywords of a frequently released package are counted once per published version.
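To illustrate how repeated versions can inflate the counts, here is a minimal Python sketch that mimics the query's split on ", " over a few made-up rows and compares counting every row with counting each package once per keyword. The sample rows and the helper names (split_keywords, count_per_row, count_per_package) are hypothetical and exist only for this illustration. It also shows why a space-separated string such as "DuckDB Database SQL OLAP" is reported as a single keyword.

```python
from collections import Counter

# Hypothetical sample rows: (package name, version, keywords string).
# They are not taken from the real table.
rows = [
    ("pkg-a", "1.0", "python, cli"),
    ("pkg-a", "1.1", "python, cli"),
    ("pkg-a", "1.2", "python, cli"),
    ("pkg-b", "0.1", "python, api"),
    ("pkg-c", "2.0", "DuckDB Database SQL OLAP"),
]


def split_keywords(keywords):
    """Split on ", " as the query does; a missing value yields no keywords."""
    return [] if keywords is None else keywords.split(", ")


def count_per_row(rows):
    """Count every row, so each published version adds to the total."""
    counts = Counter()
    for _name, _version, keywords in rows:
        counts.update(split_keywords(keywords))
    return counts


def count_per_package(rows):
    """Count each (keyword, package) pair only once."""
    pairs = set()
    for name, _version, keywords in rows:
        for keyword in split_keywords(keywords):
            pairs.add((keyword, name))
    return Counter(keyword for keyword, _name in pairs)


per_row = count_per_row(rows)
per_package = count_per_package(rows)

print(per_row["python"], per_package["python"])  # 4 2
print(per_row["cli"], per_package["cli"])        # 3 1
print("DuckDB Database SQL OLAP" in per_package)  # True
```

In the query itself, a comparable per-package count could probably be obtained by replacing COUNT(*) with COUNT(DISTINCT name), assuming the package name column in the table is called name.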