ASCII and Unicode are both character sets: each holds a list of characters, and each character is assigned a unique number (its code point). For example, A = 65, B = 66, C = 67, and so on.
ASCII stands for American Standard Code for Information Interchange.
The ASCII character set contains 128 characters. Each number from 0 to 127 represents a character.
These 128 ASCII characters cover the digits (0-9), the upper-case English letters (A-Z), the lower-case English letters (a-z), and other non-alphanumeric characters (~, !, @, #, $, %, ^, &, *, (, ), _, -, <, >, ?, /, . etc.), along with a set of non-printing control characters.
Each of the characters mentioned above has its own decimal value. For example, the capital letters A-Z have the decimal values 65 to 90, and the small letters a-z have the values 97 to 122.
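These values can be checked directly. The following is a minimal Python sketch (Python is my choice here, not something the question specifies) that uses the built-in `ord()` and `chr()` functions to move between characters and their decimal values; the variable names are illustrative only.

```python
import string

# ord() returns the code point of a character; chr() is the inverse.
code_points = {ch: ord(ch) for ch in "ABCaz09"}
print(code_points)

# Rebuild both alphabets from the numeric ranges quoted above.
upper = "".join(chr(n) for n in range(65, 91))    # 65..90 inclusive
lower = "".join(chr(n) for n in range(97, 123))   # 97..122 inclusive

# Compare against the alphabets shipped in the standard library.
print(upper == string.ascii_uppercase, lower == string.ascii_lowercase)
```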
ASCII defines 128 characters, which map to the numbers 0–127. Representing any character in this range requires only 7 bits.
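As a rough illustration of the 7-bit claim, the Python sketch below checks how many bits the highest ASCII value needs and confirms that the top bit of every byte stays clear when text is encoded as ASCII. The sample strings are arbitrary choices of mine.

```python
# The largest ASCII value is 127; int.bit_length() gives the bits needed.
highest_ascii = 127
bits_needed = highest_ascii.bit_length()
print(bits_needed, 2 ** bits_needed)

# Every byte of ASCII-encoded text is below 0x80, so the eighth bit is unused.
ascii_bytes = "Hello, World!".encode("ascii")
top_bit_clear = all(b < 0x80 for b in ascii_bytes)
print(top_bit_clear)

# A character outside 0..127 cannot be encoded as ASCII at all.
try:
    "\u00e9".encode("ascii")  # U+00E9, e with acute accent
    non_ascii_rejected = False
except UnicodeEncodeError:
    non_ascii_rejected = True
print(non_ascii_rejected)
```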
Since 1 byte equals 8 bits, one byte can represent 256 distinct values (0 to 255). All of the ASCII characters fit in 7 bits, which leaves one bit unused. Extended ASCII character sets put this extra bit to use.
Extended ASCII characters occupy the range 128 to 255.
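One caveat worth knowing: there is no single table for the 128 to 255 range. Which character a byte in that range stands for depends on the 8-bit character set (code page) used to decode it. The sketch below decodes the same byte with three codecs that ship with Python; the choice of these three (Latin-1, Windows-1252, and the original IBM PC code page 437) is mine, purely for illustration.

```python
# One byte from the "extended" range, decoded under three different code pages.
raw = bytes([0x80])  # decimal 128

decoded = {name: raw.decode(name) for name in ("latin-1", "cp1252", "cp437")}

for name, ch in decoded.items():
    # Show the resulting Unicode code point for each interpretation.
    print(f"{name:8} -> U+{ord(ch):04X}")

# The lower half (0..127) is the same in all three: plain ASCII.
same_ascii = all(bytes([65]).decode(name) == "A" for name in decoded)
print(same_ascii)
```

The same byte comes out as three different characters, which is why text in an 8-bit encoding is only readable if you know which code page produced it.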
There are far more characters in the world than that: the scripts of languages such as Hindi, Urdu, Chinese, and Arabic, the emoji we use in social networking apps, and many other symbols we might not even be aware of.
Unicode defines a code space whose code points are numbered from 0 to 0x10FFFF, so every code point fits in 21 bits. Not all of these numbers are currently assigned: some are free and some are reserved for future use.
According to Wikipedia, Unicode currently defines 1,114,112 code positions. Well over 100,000 of them have been allocated, and the rest are free or reserved for future use.
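The figure 1,114,112 is not arbitrary: the Unicode code space is organized as 17 planes of 65,536 code points each (a detail not mentioned above, added here for context). A small Python sketch of the arithmetic, with constant names of my own choosing:

```python
import sys

PLANES = 17                       # plane 0 (BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 2 ** 16   # 65,536

code_space_size = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = code_space_size - 1

print(code_space_size, hex(highest_code_point))
print(highest_code_point.bit_length())          # bits needed for the largest code point
print(sys.maxunicode == highest_code_point)     # Python uses the same upper limit

# Anything past the last code point is rejected by chr().
try:
    chr(highest_code_point + 1)
    beyond_range_ok = True
except ValueError:
    beyond_range_ok = False
print(beyond_range_ok)
```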
Although every code point fits in 21 bits, that does not mean a Unicode character is stored in exactly 21 bits. To represent a Unicode character, a computer system uses an encoding, so the size of a character differs from one encoding scheme to another.
- UTF-8 (1 to 4 bytes)
- UTF-16 (2 or 4 bytes)
- UTF-32 (4 bytes)
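The sizes in the list above can be observed by encoding a few characters and counting the bytes. This is a minimal Python sketch; the four sample characters are my own picks, and I use the `-le` (little-endian) codec variants so that Python does not prepend a byte order mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
# Sample characters from different parts of the code space (written as escapes).
samples = {
    "LATIN CAPITAL A": "A",            # U+0041
    "E WITH ACUTE": "\u00e9",          # U+00E9
    "EURO SIGN": "\u20ac",             # U+20AC
    "GRINNING FACE": "\U0001F600",     # U+1F600, outside the first 65,536 code points
}

encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Map each sample name to its encoded size in bytes under every encoding.
sizes = {
    name: {enc: len(ch.encode(enc)) for enc in encodings}
    for name, ch in samples.items()
}

for name, per_encoding in sizes.items():
    print(f"{name:16} {per_encoding}")
```

UTF-8 grows from 1 to 4 bytes as the code point gets larger, UTF-16 switches from 2 to 4 bytes once the character falls outside the first 65,536 code points, and UTF-32 stays at 4 bytes throughout.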