Python Data types

Python offers the usual data types found in programming languages, such as integers, real (floating point) numbers, booleans and strings, but it also provides more advanced data structures such as collections (lists, tuples, sets) and structured objects (dictionaries, classes).

Python variables do not need an explicit declaration to reserve memory space. The declaration happens automatically when a value is assigned to a variable.

The equal sign (=) is used to assign values to variables. The operand to the left of the = operator is the name of the variable and the operand to the right of the = operator is the value stored in the variable.

string_example =  "Hello, how are you ?"
print(string_example)  
Hello, how are you ?

Here the name of the variable is string_example and its value is "Hello, how are you ?"
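As a minimal sketch of what this automatic declaration means in practice, the same name can be bound to values of different types, and type() reports the class of whatever the name currently holds. The variable names below are illustrative only.

```python
# `value`, `first_type` and `second_type` are illustrative names.
value = 42                          # bound to an int, no declaration needed
first_type = type(value).__name__
value = "forty-two"                 # the same name is now bound to a str
second_type = type(value).__name__
print(first_type, second_type)      # int str
```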

Strings

Strings can be defined using single ( ' ), double ( " ) or triple ( ''' or """ ) quotes.

Notice that strings enclosed in triple quotes ( ''' ) can span multiple lines and are frequently used in docstrings (Python's way of documenting functions):

string_example = '''Hello! 
How are you?'''
print(string_example) 
Hello! 
How are you?
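To illustrate the docstring use mentioned above, here is a small sketch. The function greet is a hypothetical example, not part of any library.

```python
def greet(name):
    """Return a greeting for name.

    The triple-quoted string placed right after the def line
    is stored as the function's docstring.
    """
    return "Hello, " + name + "!"

print(greet("Inuk"))     # Hello, Inuk!
print(greet.__doc__)     # prints the docstring text
```

The built-in help(greet) displays the same text.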

Accessing characters in strings

string_example = "Hello!"
print(string_example[0]) 
print(string_example[2]) 
H
l

Indexing starts at 0, so string_example[0] returns the 1st character of the string Hello!, which is 'H', and string_example[2] returns the 3rd character, which is 'l'.

Python strings are immutable, so you cannot change part of strings:

string_example="hello"
string_example[2]='j'
# Returns an error
Traceback (most recent call last):
  File "<string>", line 2, in <module>
TypeError: 'str' object does not support item assignment

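We cannot assign another value to string_example[2] because strings are immutable.

Negative indices count from the end of the string: -1 is the last character and, for a five-character string, -5 is the first one.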

string_example="hello"
print(string_example[-1])
print(string_example[-5])
o
h
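Indexing and immutability can be combined as in the sketch below: a slice returns a new string, so "changing" a character really means building a new string from pieces of the old one. The name changed is illustrative.

```python
string_example = "hello"

# Slicing returns a new string; the end index is excluded.
print(string_example[1:4])     # ell
print(string_example[-3:])     # llo

# Build a new string instead of assigning to string_example[2].
changed = string_example[:2] + "j" + string_example[3:]
print(changed)                 # hejlo
print(string_example)          # hello (the original is untouched)
```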

Special characters

string2='Today is a beautiful day. \n The sky is blue.'
print(string2)
Today is a beautiful day. 
 The sky is blue.

Here \n is interpreted as a new line. To print it literally, the backslash itself must be escaped, because \ is the escape character:

print('\\n is escaped to appear in its literal form otherwise it will add a new line: \n New line!')
\n is escaped to appear in its literal form otherwise it will add a new line: 
 New line!

By the way, to print the literal form of the \ character in Python you need to escape it: print('\\')
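The following sketch compares an escaped backslash with a raw string (the r prefix), which is another common way to keep backslashes literal. The variable names are illustrative.

```python
escaped = '\\n'      # escaped backslash: two characters, \ and n
raw = r'\n'          # raw string: the same two characters
newline = '\n'       # a single newline character

print(escaped)                       # \n
print(escaped == raw)                # True
print(len(escaped), len(newline))    # 2 1
print('\\')                          # a single backslash
```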

Numbers

Integers, floating point numbers and complex numbers fall under the Python numbers category. They are defined as the int, float and complex classes in Python.

The type() function returns the class of a variable. The isinstance() function checks if an object belongs to a particular class.
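A short sketch of both functions on the three numeric classes (the variable names are illustrative):

```python
count = 3
ratio = 0.5
z = 2 + 3j

print(type(count))      # <class 'int'>
print(type(ratio))      # <class 'float'>
print(type(z))          # <class 'complex'>

print(isinstance(count, int))             # True
print(isinstance(ratio, int))             # False
# isinstance also accepts a tuple of classes
print(isinstance(ratio, (int, float)))    # True
```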

int (signed integers)

int is an integer of unlimited size. Before version 3.0, this type was called long, and the int type was a 32- or 64-bit integer. Even then, automatic conversion to long avoided any overflow.

long (long integers, Python 2 only; in Python 3 they are merged into int. Integers can also be written in octal and hexadecimal)

float (floating point real values)

float is equivalent to the double type of C, which on platforms that follow IEEE 754 covers values between about -1.8 × 10^308 and 1.8 × 10^308.

complex (complex numbers)
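The sketch below illustrates the numeric types described above in Python 3, assuming a platform with IEEE 754 doubles. The variable names are illustrative.

```python
import sys

big = 2 ** 100                   # ints grow as needed, no overflow
print(big)                       # 1267650600228229401496703205376

print(0o17, 0x1F)                # octal and hexadecimal literals: 15 31

print(sys.float_info.max)        # largest finite float, about 1.8e308
print(0.1 + 0.2 == 0.3)          # False: floats are binary approximations

z = complex(3, 4)                # same value as 3 + 4j
print(z.real, z.imag, abs(z))    # 3.0 4.0 5.0
```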

## Digital objects

Data collections

List datatype

You can define a list by writing comma-separated values in square brackets. Lists are mutable (dynamic arrays): they automatically extend their size when needed.

fruits_list = ['apple', 'banana', 'orange']
print(fruits_list)
['apple', 'banana', 'orange']

Lists may contain items of different types, for example:

mixt_list = [1, 'banana', 45.7]
print(mixt_list)
[1, 'banana', 45.7]

Python lists are mutable and individual elements of a list can be changed.

mixt_list = [1, 'banana', 'orange', 'apple', 45.7]
mixt_list[0]='mango'
print(mixt_list)
['mango', 'banana', 'orange', 'apple', 45.7]

A negative index accesses the list from the end.

mixt_list = [1, 'banana', 'orange', 'apple', 45.7]
print(mixt_list[0])
print(mixt_list[-2])
1
apple

A range of list items (a slice) can be accessed with a first index and a last index. The item at the last index is not included.

mixt_list = [1, 'banana', 'orange', 'apple', 45.7]
print(mixt_list[1:3])
['banana', 'orange']
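A few more slicing forms, as a sketch: either index can be omitted, a third value gives the step, and a slice is always a new list. The name copy_of_list is illustrative.

```python
mixt_list = [1, 'banana', 'orange', 'apple', 45.7]

print(mixt_list[:2])      # [1, 'banana']
print(mixt_list[2:])      # ['orange', 'apple', 45.7]
print(mixt_list[::2])     # [1, 'orange', 45.7]
print(mixt_list[::-1])    # [45.7, 'apple', 'orange', 'banana', 1]

# A slice is a new list, so changing it leaves the original alone.
copy_of_list = mixt_list[:]
copy_of_list[0] = 'mango'
print(mixt_list[0])       # 1
```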

Useful Methods for list datatype variables

From https://docs.python.org/3.1/tutorial/datastructures.html

Method Action
list.append(x) adds an item to the end of the list.
list.extend(L) extends the list by appending all the items of the given list L.
list.insert(i, x) inserts the item x before the element whose index is i.
list.remove(x) removes the first element of the list whose value is x. It is an error if there is no such item.
list.pop(i) removes the item at the given position i in the list and returns it.
list.pop(0) removes the first item in the list and returns it. To implement a first-in, first-out queue efficiently, use collections.deque and its popleft() method, since lists have no popleft().
list.pop() removes and returns the last item in the list.
list.index(x) returns the index of the first item whose value is x. It is an error if there is no such item.
list.count(x) returns the number of times x appears in the list.
list.sort() sorts the items of the list, in place.
list.reverse() reverses the elements of the list, in place.
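The methods that are not covered by the examples further down can be sketched as follows. Note that popleft() belongs to collections.deque rather than to list; the queue contents are illustrative.

```python
from collections import deque

fruits_list = ['apple', 'banana', 'orange', 'banana']

fruits_list.remove('banana')          # removes the first 'banana' only
print(fruits_list)                    # ['apple', 'orange', 'banana']
print(fruits_list.index('orange'))    # 1
print(fruits_list.count('banana'))    # 1

last = fruits_list.pop()              # removes and returns the last item
print(last, fruits_list)              # banana ['apple', 'orange']

fruits_list.extend(['kiwi', 'cherry'])
fruits_list.sort()                    # sorts in place and returns None
print(fruits_list)                    # ['apple', 'cherry', 'kiwi', 'orange']
fruits_list.reverse()
print(fruits_list)                    # ['orange', 'kiwi', 'cherry', 'apple']

# First-in, first-out queue: deque.popleft() avoids shifting every item.
queue = deque(['first', 'second', 'third'])
print(queue.popleft())                # first
```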

List methods examples

Let's use the append method to add an element to a list object:

fruits_list = ['apple', 'banana', 'orange']
fruits_list.append('tangerine')
print(fruits_list)
['apple', 'banana', 'orange', 'tangerine']

Another way to add an element:

fruits_list = ['apple', 'banana', 'orange']
# len(fruits_list) returns the size of fruits_list, here 3.
# Index 3 would be the fourth element (index 0 is the first)
fruits_list[len(fruits_list):]=['tangerine']
print(fruits_list)
['apple', 'banana', 'orange', 'tangerine']

Let's use the insert method to insert an element into the list:

fruits_list = ['apple', 'banana', 'orange']
fruits_list.insert(1,'tangerine')
# inserts an element before 'banana', whose index is 1
print(fruits_list)
['apple', 'tangerine', 'banana', 'orange']

fruits_list.insert(0, 'pear') inserts 'pear' at the front of the list:

fruits_list = ['apple', 'banana', 'orange']
fruits_list.insert(0, 'pear')
print(fruits_list)
['pear', 'apple', 'banana', 'orange']

fruits_list.insert(len(fruits_list), 'pear') is equivalent to fruits_list.append('pear').

fruits_list = ['apple', 'banana', 'orange']
fruits_list.insert(len(fruits_list), 'pear')
print(fruits_list)
['apple', 'banana', 'orange', 'pear']

Let's use the extend method to add the items of another list to the current one:

fruits_list = ['apple', 'banana', 'orange']
mixt_list = [1, 'banana', 'orange', 'apple', 45.7]
fruits_list.extend(mixt_list)
print(fruits_list)
['apple', 'banana', 'orange', 1, 'banana', 'orange', 'apple', 45.7]
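A common source of confusion is the difference between extend and append when the argument is itself a list. The sketch below contrasts the two; the names nested and flat are illustrative.

```python
nested = ['apple', 'banana']
nested.append(['kiwi', 'pear'])    # the whole list becomes one item
print(nested)                      # ['apple', 'banana', ['kiwi', 'pear']]
print(len(nested))                 # 3

flat = ['apple', 'banana']
flat.extend(['kiwi', 'pear'])      # each item is added separately
print(flat)                        # ['apple', 'banana', 'kiwi', 'pear']
print(len(flat))                   # 4
```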

List comprehension

Python supports a concept called "list comprehensions". A list comprehension builds a list from an expression and a for clause. List comprehensions simplify the code and make it more readable, faster to write and easier to maintain.

To execute a function on each item of the list :

new_list = [function(x) for x in list]

Let's display multiples of 2:

multiple = [x*2 for x in range(10)]
print(multiple)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

This is shorter than, and equivalent to:

multiple = []
for x in range(10):
    multiple.append(x*2)
print(multiple)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

To filter the elements of a list

Filter syntax:

new_list = [function(x) for x in list if condition(x)]

We want to filter the numbers of a list and only keep the even ones:

list1 = [1,4,2,7,1,9,0,3,4,6,6,6,8,3]
list2=[x for x in list1 if (x%2) == 0]
print(list2)
[4, 2, 0, 4, 6, 6, 6, 8]

This is equivalent to:

list1 = [1,4,2,7,1,9,0,3,4,6,6,6,8,3]
list2=[]
for x in list1:
    if (x%2) == 0:
        list2.append(x)
print(list2)
[4, 2, 0, 4, 6, 6, 6, 8]
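The expression and the condition can be combined in a single comprehension. As a sketch, this squares only the odd numbers of the same list (odd_squares and labels are illustrative names):

```python
list1 = [1, 4, 2, 7, 1, 9, 0, 3, 4, 6, 6, 6, 8, 3]

# Transform (x ** 2) and filter (x is odd) in one pass
odd_squares = [x ** 2 for x in list1 if x % 2 == 1]
print(odd_squares)     # [1, 49, 1, 81, 9, 9]

# A conditional expression keeps every item but maps it differently
labels = ['even' if x % 2 == 0 else 'odd' for x in list1[:4]]
print(labels)          # ['odd', 'even', 'even', 'odd']
```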

Tuple datatype

A tuple is an ordered sequence of items, like a list, but it is immutable: once created, it cannot be modified. Like a list, it can contain items of different types.

Tuples are usually used to write-protect data.

You can define a tuple by writing a number of values separated by commas.

tuple1= 2, 4, 6, 8, 10
print(tuple1)
(2, 4, 6, 8, 10)

The output is always surrounded by parentheses so that nested tuples are displayed correctly:

tuple1= 2, 4, 6, 8, 10
tuple2= tuple1, 'hello', '44.6'
print(tuple2)
((2, 4, 6, 8, 10), 'hello', '44.6')
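Writing comma-separated values is called packing, and the reverse operation, unpacking, assigns each item to its own name. The sketch below also shows the trailing comma needed for a one-item tuple. All names are illustrative.

```python
point = 3, 4            # packing: the parentheses are optional
x, y = point            # unpacking into two names
print(x, y)             # 3 4

single = (5,)           # a one-item tuple needs a trailing comma
not_a_tuple = (5)       # without the comma this is just the int 5
print(type(single).__name__, type(not_a_tuple).__name__)   # tuple int

x, y = y, x             # swapping values relies on packing and unpacking
print(x, y)             # 4 3
```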

If we try to modify a tuple, we get an error:

tuple1= 2, 4, 6, 8, 10
print(tuple1[0])
print("tuple1[0] cannot be assigned, WE GET AN ERROR: \n")
tuple1[0]="you can't change me"
2
tuple1[0] cannot be assigned, WE GET AN ERROR: 

Traceback (most recent call last):
  File "<string>", line 4, in <module>
TypeError: 'tuple' object does not support item assignment

Since tuples are immutable, Python can store them more compactly, and creating them is often slightly faster than creating lists. If a sequence is unlikely to change, a tuple is usually a better choice than a list.

Set datatype

A set is an unordered collection with NO DUPLICATE elements.

To create an empty set you have to use set(), not {}, which creates an empty dictionary. Sets are typically used to test whether an element is part of a collection and to eliminate duplicate entries:

set_animals = {'dog', 'cat', 'pinguin', 'bear', 'deer', 'mouse'}
# TESTING if 'dog' is part of the set returns True or False:
print('dog' in set_animals) 
print('zebra' in set_animals)
True
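False

Building a set from a string keeps each distinct character only once:

set1 = set('arnaldur')
print(set1)
{'a', 'd', 'l', 'n', 'r', 'u'}

Each letter is an element of the set and is unique. Since sets are unordered, the letters may be displayed in a different order.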

Set mathematical operations

Let's study two sets of characters made from 'arnaldur' and 'erlendur':

set1 = set('arnaldur')
set2 = set('erlendur')
# Let's print letters in set2 but not in set1
print(set2 - set1)
{'e'}
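The other usual set operations work the same way. In this sketch the results are passed through sorted() only to make the printed order predictable:

```python
set1 = set('arnaldur')
set2 = set('erlendur')

print(sorted(set1 | set2))    # union: ['a', 'd', 'e', 'l', 'n', 'r', 'u']
print(sorted(set1 & set2))    # intersection: ['d', 'l', 'n', 'r', 'u']
print(sorted(set1 - set2))    # only in set1: ['a']
print(sorted(set1 ^ set2))    # in exactly one of the sets: ['a', 'e']
```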

Performance test

Difference of two lists, computed with and without sets: examples and explanations from http://stackoverflow.com/questions/3462143/get-difference-between-two-lists

import timeit
init = 'temp1 = list(range(100)); temp2 = [i * 2 for i in range(50)]'

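# set subtraction (ars' answer)
print(timeit.timeit('list(set(temp1) - set(temp2))', init, number=100000))

# one set for lookups plus a list comprehension, which preserves order ("this answer")
print(timeit.timeit('s = set(temp2); [x for x in temp1 if x not in s]', init, number=100000))

# list comprehension with lookups in a list (matt b's answer)
print(timeit.timeit('[item for item in temp1 if item not in temp2]', init, number=100000))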
1.90468907356
1.77592396736
9.1676170826

Results:

4.34620224079 # ars' answer
4.2770634955 # This answer
30.7715615392 # matt b's answer

"The method I presented as well as preserving order is also (slightly) faster than the set subtraction because it doesn't require construction of an unnecessary set. The performance difference would be more noticable if the first list is considerably longer than the second and if hashing is expensive. Here's a second test demonstrating this" :

init = '''
temp1 = [str(i) for i in range(100000)]
temp2 = [str(i * 2) for i in range(50)]
'''

Results:

11.3836875916 # ars' answer
3.63890368748 # this answer (3 times faster!)
37.7445402279 # matt b's answer
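Apart from speed, the two set-based approaches differ in whether they keep the original order. The sketch below shows this on a small illustrative list (the data and variable names are made up for the example, and no timing is measured):

```python
temp1 = ['one', 'two', 'three', 'four', 'five']
temp2 = ['two', 'four']

# Set subtraction: correct items, but their order is not guaranteed
by_subtraction = list(set(temp1) - set(temp2))
print(sorted(by_subtraction))    # ['five', 'one', 'three']

# One set for fast lookups, while keeping the order of temp1
lookup = set(temp2)
in_order = [x for x in temp1 if x not in lookup]
print(in_order)                  # ['one', 'three', 'five']
```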

Dictionary

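A dictionary is a collection of key and value pairs. Keys are unique while values may not be. Since Python 3.7, a dictionary keeps its keys in insertion order; in older versions the order was arbitrary.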

The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.

A pair of braces creates an empty dictionary: {}.

dict_example = {'Name': 'Inuk', 'Age': '10', 'Country': 'Alaska'}

print ("dict_example['Name']: "+ dict_example['Name'])
print ("dict_example['Age']: "+ dict_example['Age'])
print ("dict_example['Country']: "+ dict_example['Country'])
dict_example['Name']: Inuk
dict_example['Age']: 10
dict_example['Country']: Alaska
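Dictionaries are mutable, so entries can be added, updated and removed. The sketch below assumes Python 3.7 or later for the iteration order; the 'Pet' entry and the grid dictionary are illustrative additions.

```python
dict_example = {'Name': 'Inuk', 'Age': '10', 'Country': 'Alaska'}

dict_example['Age'] = '11'       # update an existing key
dict_example['Pet'] = 'dog'      # add a new key (illustrative entry)
del dict_example['Country']      # remove a key

print('Country' in dict_example)                 # False
print(dict_example.get('Country', 'unknown'))    # unknown

for key, value in dict_example.items():
    print(key, '->', value)
# Name -> Inuk
# Age -> 11
# Pet -> dog

# Tuples are immutable, so they can be used as keys
grid = {(0, 0): 'origin', (1, 2): 'marker'}
print(grid[(1, 2)])              # marker
```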

Numpy arrays

The NumPy array is the most important data structure for scientific computing in Python. Many operations on NumPy arrays, such as indexing and slicing, are similar to list operations.

The Numpy array vs the Python list : From http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists

"Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for type pointer, 4 for reference count, 4 for value -- and the memory allocators rounds up to 16)."

"A NumPy array is an array of uniform values -- single-precision numbers takes 4 bytes each, double-precision ones, 8 bytes. Less flexible, but computation is faster than standard Python lists!"

"NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented."

"Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualisation libraries."

"You get a lot built in with NumPy, FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc."

import numpy as np
import timeit as ti

Nelements = 10000
Ntimeits = 10000

x = np.arange(Nelements)
# In Python 3, range() is lazy; list() builds an actual list for the comparison
y = list(range(Nelements))

t_numpy = ti.Timer("x.sum()", "from __main__ import x")
t_list = ti.Timer("sum(y)", "from __main__ import y")
print ("numpy: %.3e" % (t_numpy.timeit(Ntimeits)/Ntimeits,))
print ("list:  %.3e" % (t_list.timeit(Ntimeits)/Ntimeits,))
numpy: 6.524e-05
list:  2.812e-04