Using the Google bq CLI, the following command retrieves the top PyPI keywords from the bigquery-public-data.pypi.distribution_metadata table:
bq query --use_legacy_sql=false 'SELECT keyword, COUNT(*) as keyword_count FROM `bigquery-public-data.pypi.distribution_metadata`, UNNEST(SPLIT(keywords, ", ")) as keyword GROUP BY keyword ORDER BY keyword_count DESC LIMIT 100'
Results for the top 15 keywords:
- python: 128555 appearances
- DuckDB Database SQL OLAP: 70739 appearances
- ai: 64997 appearances
- tensorflow tensor machine learning: 51144 appearances
- pulumi: 50076 appearances
- api: 47986 appearances
- probabilities probabilistic-graphical-models inference diagnosis: 46552 appearances
- rust: 45607 appearances
- cli: 39512 appearances
- OpenAPI: 38814 appearances
- sdk: 38060 appearances
- llm: 37487 appearances
- OpenAPI-Generator: 36734 appearances
- database: 35578 appearances
- automation: 34393 appearances
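Note that this is a very basic query: it does not take into account that some packages have many more versions published on PyPI than others, so the keywords of frequently released packages are counted many times. It also splits keywords only on ", ", which is why space-separated lists such as "DuckDB Database SQL OLAP" show up as a single keyword.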
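One possible way to reduce the weight of packages with many published versions is to count each package once per keyword instead of counting every row. The sketch below assumes the table has a `name` column holding the package name (an assumption, not verified here). It has not been run against the dataset, so its ranking may differ from the list above.

```shell
# Sketch: count distinct packages per keyword rather than rows.
# Assumption: the table exposes the package name in a column called `name`.
QUERY='SELECT keyword, COUNT(DISTINCT name) AS package_count
FROM `bigquery-public-data.pypi.distribution_metadata`,
UNNEST(SPLIT(keywords, ", ")) AS keyword
GROUP BY keyword
ORDER BY package_count DESC
LIMIT 100'

# Run the query only if the bq CLI is installed.
if command -v bq >/dev/null 2>&1; then
  bq query --use_legacy_sql=false "$QUERY"
else
  echo "bq CLI not found; install the Google Cloud SDK to run the query" >&2
fi
```

This still splits keywords on ", " only, so space-separated keyword lists remain grouped as a single entry.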