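ARM (in its classic form) is a 32-bit architecture that usually runs little-endian. The CPU shows 16 registers at a time (R0 to R15) plus the status register; further registers exist as banked copies for the different processor modes. Most of them are general purpose (R0 to R12) and some have a predefined use: R13 is the stack pointer (SP), R14 is the link register (LR) and R15 is the program counter (PC).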
To access ROM, RAM and peripheral components the CPU uses its address lines. Because of the 32-bit architecture it can address the range 0x0000 0000 to 0xFFFF FFFF, which is 4 GB in size. This range is sliced into functional blocks. The exact layout depends on the chip, but for example the ROM code of an external NOR flash is mapped to the address range 0x0000 0000 to 0x1FFF FFFF, then SDRAM to 0x8000 0000 to 0xEFFF FFFF, internal SRAM to 0x2000 0000 to 0x2000 FFFF, I/O port registers to 0xFFFF 0000 to 0xFFFF FFFF, and so forth.
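To make the slicing of the address range more tangible, here is a minimal Python sketch of an address decoder. The ranges are copied from the example above and are not those of any particular chip; the names MEMORY_MAP and decode are made up for this illustration.

```python
# Address ranges taken from the example in the text; a real chip will differ.
MEMORY_MAP = [
    (0x00000000, 0x1FFFFFFF, "NOR flash (ROM code)"),
    (0x20000000, 0x2000FFFF, "internal SRAM"),
    (0x80000000, 0xEFFFFFFF, "SDRAM"),
    (0xFFFF0000, 0xFFFFFFFF, "I/O port registers"),
]

# 32 address bits give 2**32 distinct byte addresses (the "4 GB").
ADDRESS_SPACE_BYTES = 2 ** 32


def decode(address):
    """Return the name of the block an address falls into, or 'unmapped'."""
    if not 0 <= address < ADDRESS_SPACE_BYTES:
        raise ValueError("address does not fit into 32 bits")
    for start, end, name in MEMORY_MAP:
        if start <= address <= end:
            return name
    return "unmapped"


if __name__ == "__main__":
    for addr in (0x00000000, 0x20000010, 0x80001000, 0xFFFF0004, 0x30000000):
        print(f"0x{addr:08X} -> {decode(addr)}")
```

In hardware this decision is made by the address decoder or bus matrix, not by software; the loop only mirrors the idea.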
The "program counter" is reflected in the "PC" register. This register holds the memory address of the next command to execute. After a reset it is initialized with "0x0000 0000". The PC increments automatically (well, in fact the CPU does it) after execution of a command. But it can also be changed by software, and this is usually the case if the program branches to a subroutine. The mnemonic "B <addr>" ("B" stands for "branch" and is something like "JMP", which means "jump", in other assembler languages) loads <addr> into the PC register. Unlike other commands, it skips the auto-increment of the PC. The effect is that execution resumes at the named address. If the program needs to return to the command after the branch when the subroutine finishes, it needs to save that address somewhere. Usually this is the job of a "stack", because calls can be nested. The other option is to store the return address in a register. I expect to find a "set PC to content of register X" as the last command in a subroutine, like a "ret" in other assembler languages, and on ARM this is indeed what "MOV PC, LR" or "BX LR" does.
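The behaviour of the PC can be tried out with a toy model in Python. This is only a sketch under some assumptions: every command is 4 bytes long, the program is a plain dictionary, and the command names "NOP", "RET" and "HALT" are invented for the model ("B" and "BL", branch with link, are real ARM mnemonics, but here they are just strings). The function name run is made up as well.

```python
def run(program, max_steps=100):
    """Execute a toy program and return the list of visited addresses."""
    pc = 0x00000000   # value of the PC after reset
    lr = None         # register that holds the return address
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "HALT":
            break
        if op == "B":
            # Plain branch: the PC is overwritten, no auto-increment.
            pc = arg
        elif op == "BL":
            # Branch with link: remember the command after the branch.
            lr = pc + 4
            pc = arg
        elif op == "RET":
            # "Set PC to content of register": return to the saved address.
            pc = lr
        else:
            # Any other command: the PC moves on by one word.
            pc += 4
    return trace


# Main program at 0x00, subroutine at 0x20.
program = {
    0x00: ("NOP", None),
    0x04: ("BL", 0x20),
    0x08: ("HALT", None),
    0x20: ("NOP", None),
    0x24: ("RET", None),
}

if __name__ == "__main__":
    print([hex(address) for address in run(program)])
```

The trace shows the jump from 0x04 to the subroutine at 0x20 and the return to 0x08, the command right after the branch. With only one register for the return address, a second call inside the subroutine would overwrite it, which is why nested calls need a stack.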
Other CPUs push the return address onto a stack, which is addressed through a register called "Stack Pointer" (SP). ARM has an SP as well (R13), but a branch with link ("BL <addr>") does not use it: the return address is stored in a separate register called "LR" (link register, R14). Only when calls are nested does the subroutine have to save LR on the stack itself. Because the "stack" itself is dynamic, it is usually of no interest WHERE in memory the SP points to, but WHAT is stored at this memory location. A stack works upside down, or LIFO (last-in-first-out). By the usual convention it grows towards lower addresses: every time a word (32 bit) is pushed onto the stack, the SP is decremented by 4 bytes. When there is no more space left, this results in a stack overflow, which may be detected by something like an MMU (memory management unit). If a value is popped from the stack, the SP is incremented by 4 bytes, so it points to the value stored before the one retrieved. Of course the SP can also be manipulated by commands directly.
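The push and pop arithmetic described above can be sketched in a few lines of Python. This is a model, not real hardware: the class name Stack, the dictionary used as memory and the start address in the usage part are assumptions made for the illustration. The SP always points at the most recently pushed word.

```python
class Stack:
    """Toy model of a descending stack of 32-bit words."""

    def __init__(self, top, size_bytes):
        self.top = top                  # initial SP, stack is empty here
        self.limit = top - size_bytes   # lowest address the stack may use
        self.sp = top
        self.memory = {}                # address -> stored word

    def push(self, word):
        if self.sp - 4 < self.limit:
            raise OverflowError("stack overflow")
        self.sp -= 4                    # decrement first ...
        self.memory[self.sp] = word & 0xFFFFFFFF  # ... then store

    def pop(self):
        if self.sp == self.top:
            raise IndexError("stack is empty")
        word = self.memory.pop(self.sp)  # read the word SP points to ...
        self.sp += 4                     # ... then increment
        return word


if __name__ == "__main__":
    stack = Stack(top=0x20010000, size_bytes=16)
    stack.push(0x11111111)
    stack.push(0x22222222)
    print(hex(stack.sp))      # SP moved down by 8 bytes
    print(hex(stack.pop()))   # last value pushed comes out first
    print(hex(stack.sp))
```

The overflow check stands in for whatever the real system uses to detect that the stack has run out of space.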
The "CPSR" register is also very important, because it contains the status flags of the CPU. CPSR stands for "current program status register" (read all the nifty details here: http://infocenter.arm.com/help/index.js ... FAEID.html) and holds results of previously executed commands. For example, if there was a compare operation that checks whether the content of a register is equal to a fixed value, the Z flag (zero flag) is set to '1' if they match and to '0' if they don't.
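As a small illustration of the Z flag, the following Python sketch models a compare as a subtraction whose result is thrown away. It only tracks the Z flag (bit 30 of the CPSR) and leaves the other flags alone, which a real compare would update too; the function names compare and z_flag are made up for this example.

```python
Z_BIT = 30  # position of the zero flag in the CPSR


def compare(cpsr, register_value, constant):
    """Model of a compare: subtract, discard the result, update only Z."""
    result = (register_value - constant) & 0xFFFFFFFF
    if result == 0:
        return cpsr | (1 << Z_BIT)              # values match: Z = 1
    return cpsr & ~(1 << Z_BIT) & 0xFFFFFFFF    # values differ: Z = 0


def z_flag(cpsr):
    """Extract the Z flag (0 or 1) from a CPSR value."""
    return (cpsr >> Z_BIT) & 1


if __name__ == "__main__":
    cpsr = 0
    cpsr = compare(cpsr, 5, 5)
    print(z_flag(cpsr))   # register equals the constant
    cpsr = compare(cpsr, 5, 6)
    print(z_flag(cpsr))   # register differs from the constant
```

A conditional branch that follows the compare would then look at this flag to decide whether to jump.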