PYTHON LIBRARIES

Because scientific computing is performance-critical, Python code in this field often relies on external libraries, typically written in faster languages (such as C, or Fortran for matrix operations).

How to import Python libraries

There are several ways to import a library in Python. To be verified: there should be no performance difference between the different approaches.

  1. import math:

The library functions are called through the library name: math.factorial(number).

import math
print(math.factorial(10))
3628800
  2. import math as m:

m is an alias for math. The library functions are called through the alias: m.factorial(number) instead of math.factorial(number).

import math as m
print(m.factorial(10))
3628800
  3. from math import *:

The entire library namespace is imported: you can call factorial() directly, without referring to math.

from math import *
print(factorial(10))
3628800
  4. from math import factorial:

Only the named function is imported: you can call factorial() directly, but the rest of the library is not available.

from math import factorial
print(factorial(10))
3628800

Google's Python style guide recommends the first approach (import math), because each call then shows where the function comes from (math.function_name() instead of function_name()).

Moreover, importing a function directly into the global namespace risks name collisions if two different libraries define a function with the same name.
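A minimal sketch of such a collision, using the standard math and cmath libraries, which both define sqrt. The variable names are chosen here for illustration only.

```python
import cmath
import math

from math import sqrt
from cmath import sqrt  # silently replaces the sqrt imported from math

# The bare name now refers to cmath.sqrt, so the result is complex
shadowed = sqrt(4)

# Qualified names stay unambiguous
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)

print(type(shadowed).__name__, real_root, complex_root)
```

With the qualified form, the reader can tell at the call site which library is doing the work.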

Scientific and Data Analysis Libraries

from https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/

The four main libraries used for data science computing are NumPy, SciPy, Matplotlib and Pandas.

LIBRARY FEATURES

NumPy (Numerical Python)

  • numerical computing with a powerful N-dimensional array object
  • useful linear algebra, Fourier transform, and random number capabilities
  • tools for integrating C/C++ and Fortran code
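As a rough sketch of the array, linear algebra, Fourier transform and random number features listed above (the matrix values and the seed are arbitrary choices for illustration):

```python
import numpy as np

# N-dimensional array object: a 2x2 matrix and a vector
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system a @ x = b
x = np.linalg.solve(a, b)

# Fourier transform of a constant signal: everything lands in bin 0
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator (seed chosen arbitrarily)
rng = np.random.default_rng(0)
samples = rng.normal(size=3)

print(x)
print(spectrum)
print(samples.shape)
```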

SciPy (Scientific Python)

  • built on NumPy: uses the multidimensional arrays provided by the NumPy module
  • efficient numerical routines: optimization, regression, interpolation
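The three kinds of routines named above could be tried out as follows. This is only a sketch: the data points are made up and noise-free, and the helper function line is a hypothetical name introduced for the fit.

```python
import numpy as np
from scipy import interpolate, optimize

# Optimization: minimize (x - 2)^2, whose minimum is at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Regression: least-squares fit of a straight line to made-up points
def line(x, slope, intercept):
    return slope * x + intercept

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0
params, _ = optimize.curve_fit(line, xs, ys)

# Interpolation: linear interpolation between the known points
f = interpolate.interp1d(xs, ys)
mid = float(f(1.5))

print(result.x, params, mid)
```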

Matplotlib

  • 2-D visualization, “publication-ready” plots: histograms, line plots, heat plots
  • similar to MATLAB; allows LaTeX commands to add math to your plot
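A small sketch of a line plot and a histogram with LaTeX-style math in the labels. The choice of the non-interactive Agg backend and of an in-memory buffer is an assumption made here so that the example runs without a display or a file on disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with LaTeX-style math in the label and title
ax_line.plot(x, np.sin(x), label=r"$\sin(x)$")
ax_line.set_title(r"$y = \sin(x)$")
ax_line.legend()

# Histogram of the same values
ax_hist.hist(np.sin(x), bins=10)
ax_hist.set_xlabel(r"$\sin(x)$")

# Save to an in-memory buffer; pass a file name instead to write a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```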

Pandas

  • powerful and flexible open-source data analysis tool
  • for data munging and preparation
  • for structured data operations and manipulations
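For instance, data preparation followed by a structured operation might look like the sketch below. The table contents are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame(
    {
        "city": ["Paris", "Paris", "Lyon", "Lyon"],
        "temperature": [20.0, None, 18.0, 22.0],
    }
)

# Data munging: fill the missing value with the column mean
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Structured operation: average temperature per city
mean_by_city = df.groupby("city")["temperature"].mean()
print(mean_by_city)
```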
OTHER LIBRARIES FEATURES

Scikit Learn

  • for machine learning
  • built on NumPy, SciPy and Matplotlib
  • statistical modeling, including classification, regression, clustering and dimensionality reduction
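A classification task could be sketched as follows. The toy data is invented for illustration, and logistic regression is just one of many classifiers the library offers.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical, clearly separated toy data: one feature, two classes
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# Classify one point near each cluster
predictions = model.predict([[1.5], [9.5]])
print(predictions)
```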

Statsmodels

  • for statistical modeling
  • to explore data
  • to estimate statistical models

Seaborn

  • for making attractive and informative statistical graphics
  • based on matplotlib

Bokeh

  • for creating interactive plots
  • dashboards and data applications in modern web browsers (like D3.js)
  • high-performance interactivity over very large or streaming datasets

Blaze

  • to extend the capability of NumPy and Pandas to distributed and streaming datasets
  • used to access data from a multitude of sources (Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc.)
  • to create effective visualizations and dashboards on huge chunks of data with Bokeh

Scrapy

  • for web crawling
  • useful framework for getting specific patterns of data
  • to dig through web-pages within a website to gather information

SymPy (symbolic computation)

  • wide-ranging capabilities, from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics
  • the capability of formatting the result of the computations as LaTeX code
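A short sketch of symbolic calculus, algebra and LaTeX output. The expressions are arbitrary examples.

```python
import sympy as sp

x = sp.symbols("x")
expr = x**2 * sp.sin(x)

derivative = sp.diff(expr, x)       # calculus: differentiate
integral = sp.integrate(2 * x, x)   # calculus: integrate
roots = sp.solve(x**2 - 4, x)       # algebra: solve x^2 - 4 = 0

# Format a result as LaTeX code
latex_code = sp.latex(sp.sqrt(x) / 2)
print(derivative, integral, roots, latex_code)
```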

Requests

  • for accessing the web; similar to the standard Python library urllib2 (urllib.request in Python 3) but easier to use

os

  • for operating system and file operations
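A few typical file operations, sketched inside a temporary directory so that no real files are touched. The file name notes.txt is hypothetical.

```python
import os
import tempfile

# Work inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.txt")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("hello")

    listing = os.listdir(workdir)      # names of the files in the directory
    size = os.path.getsize(path)       # file size in bytes
    os.remove(path)                    # delete the file
    still_there = os.path.exists(path)

print(listing, size, still_there)
```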

networkx and igraph

  • for graph-based data manipulation
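With networkx, for example, a small graph can be built and queried as in this sketch. The node names and edges are made up.

```python
import networkx as nx

# Undirected graph with made-up nodes and edges
graph = nx.Graph()
graph.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])

# Shortest path between two nodes, and the degree of every node
path = nx.shortest_path(graph, source="A", target="D")
degrees = dict(graph.degree())

print(path, degrees)
```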

re (regular expressions)

  • to find patterns in text data
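A minimal sketch with the standard re module. The sample text is invented, and the e-mail pattern is deliberately simplistic rather than a complete validator.

```python
import re

text = "Contact alice@example.com or bob@example.org before 2024-01-15."

# Find every substring that looks like an e-mail address
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)

# Capture the year, month and day of the first ISO-style date
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()

print(emails, date_parts)
```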

BeautifulSoup

  • for web scraping; extracts information from just a single web page per run
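Extracting information from a single page might look like the sketch below. The HTML is an inline sample rather than a downloaded page, and the parser used is the one bundled with Python.

```python
from bs4 import BeautifulSoup

# Inline sample page; a real script would fetch the HTML first
html = """
<html><body>
  <h1>Libraries</h1>
  <a href="https://numpy.org">NumPy</a>
  <a href="https://scipy.org">SciPy</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.h1.get_text()                         # text of the first <h1>
links = [a["href"] for a in soup.find_all("a")]    # target of every link

print(title, links)
```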